Ben Simo, former president of the Association for Software Testing, gave a keynote this past week at CAST2014 on his experiences with Healthcare.gov. For about an hour, Ben described very real problems he ran into while trying to find an insurance plan for his granddaughter. To be honest, what he experienced was horrifying. The problems ranged from not being able to create an account to significant security issues. You can find descriptions of that experience here and here on Ben's personal blog.
It is important to note that Ben was not "hacking" the site in any way. He had an authentic healthcare need, and he actively sought to communicate the issues he found to the proper people. He was able to isolate and describe these problems because of the years he spent developing his skills as a software tester.
Here is an interview with Ben about some of his work.
One interesting question came up during the talk:
Was the massive initial struggle with healthcare.gov caused by bad software testing?
I’m not so sure.
Many of the most pressing issues with the website have since been resolved, and people have been successfully using it to sign up for insurance. There are some clear lessons we can take away from this that reflect how most software projects work.
According to Gerry Weinberg, the second law of consulting is: "No matter how it looks at first, it's always a people problem."
Communication is a tough problem. Telling people they need to improve their "communication skills" doesn't work, because the advice is vague and doesn't tell them specifically what needs to improve. From the looks of the schedule above, people were communicating. Whether the message was received and taken seriously is another question. Testers and QA folks sometimes have lower social status in software companies, and repeatedly giving negative feedback to people of higher status is hard. Even when the message is delivered, it might not reach the right person in the form it was intended.
A dirty little secret of the software world is that a certain amount of our workday is spent in queues, waiting. Often we are waiting for someone else to finish a task so we can begin work on something that depends on it. This is even more pronounced in traditional development methods that have rigid barriers between phases (requirements gathering, development, testing, etc.). This inevitably creates a bottleneck when testing begins, or in the case of Healthcare.gov, a collapse. Think of a funnel: you pour a gallon of water into it and only a little trickles out at a time. To some degree, this is how traditional development works. Testing adds scope to existing tasks and occasionally discovers new scope, and the last couple of weeks of a project are not an ideal time to discover it.
Failures are caused by systems
Failures of this scale don't happen because a test group did a poor job testing, because product managers did a bad job gathering requirements and translating them into a document, or because developers didn't communicate scheduling worries up the chain. Technology aside, software is a complicated system of social forces coming together to make a product people will find useful. Blaming a single person or a single group in that system just doesn't make sense.
These problems happen on a lot of software projects; they didn't magically appear for Healthcare.gov. This time, people noticed because the stakes were high enough. Hopefully some of these lessons will be remembered and make their way into the mainstream software body of knowledge.
Healthcare.gov is not an island. I'd be willing to bet that most people who have worked in software for a few years have experienced some or all of the problems behind its troubled launch.