At next month’s Conference of the Association for Software Testing (CAST) in Colorado Springs, Doug Hoffman will call into question one of the most fundamental ideas in software testing: Do tests really pass or fail? I had the opportunity to talk with Hoffman about his conference session, titled “Why tests don’t pass.”
Doug Hoffman has over thirty years of experience in software quality assurance and holds degrees in Computer Science and Electrical Engineering as well as an MBA. He currently works as an independent consultant with Software Quality Methods, LLC. Hoffman is involved in just about every organization having to do with software quality: he’s an ASQ Fellow, a member of the ACM and IEEE, and a Founding Member and a Director of the Association for Software Testing.
When asked to summarize his talk, Hoffman got straight to the point, “The results of running a test aren’t really pass or fail. I think this message will resonate with part of the audience and may inspire others to challenge the idea. CAST is a venue where such discussion is encouraged.”
The idea is expanded on in the summary for his talk:
Most testers think of tests passing or failing. Either they found a bug or they didn’t. Unfortunately, experience repeatedly shows us that passing a test doesn’t really mean there is no bug. It is possible for bugs to exist in the feature being tested in spite of passing the test of that capability. It is also quite possible for a test to surface an error that goes undetected at the time. Passing really only means that we didn’t notice anything interesting.
Likewise, failing a test is no guarantee that a bug is present. There could be a bug in the test itself, a configuration problem, corrupted data, or a host of other explainable reasons that do not mean that there is anything wrong with the software being tested. Failing really only means that something that was noticed warrants further investigation.
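Hoffman’s reframing can be made concrete in a few lines. The Python sketch below is purely illustrative and is not taken from his talk: the `RunResult` class, the `Outcome` values, and the `triage` and `investigation_checklist` helpers are hypothetical names. It records whatever anomalies a tester or oracle noticed and reports only whether further work is indicated, never “bug” or “no bug.”

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels: neither value claims anything about the product's quality.
    NOTHING_NOTICED = "no further work indicated"
    INVESTIGATE = "further investigation warranted"


# Explanations to rule out once something is noticed, per the talk summary.
POSSIBLE_CAUSES = (
    "bug in the software under test",
    "bug in the test itself",
    "configuration problem",
    "corrupted data",
)


@dataclass
class RunResult:
    name: str
    anomalies: list = field(default_factory=list)  # what the tester or oracle noticed

    def outcome(self) -> Outcome:
        # "Passing" only means nothing interesting was noticed; bugs may still exist.
        # "Failing" only means something was noticed that needs a closer look.
        return Outcome.INVESTIGATE if self.anomalies else Outcome.NOTHING_NOTICED


def triage(results):
    """Return the names of results that call for further work."""
    return [r.name for r in results if r.outcome() is Outcome.INVESTIGATE]


def investigation_checklist(result: RunResult) -> list:
    """Causes to rule out before calling a noticed anomaly a product bug."""
    if result.outcome() is Outcome.INVESTIGATE:
        return list(POSSIBLE_CAUSES)
    return []


if __name__ == "__main__":
    runs = [RunResult("login"), RunResult("checkout", ["response took 30 s"])]
    print(triage(runs))  # ['checkout']
```

The verdict stays two-valued, as in the summary, but each value is named after the work it implies rather than after a judgment about the software.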
“I think all we can really conclude from a test is whether or not further work is appropriate,” Hoffman said. “The talk goes into why I think this, and some of the implications of thinking this way.”
When I asked Hoffman what inspired him to question the binary nature of a test, he said: “I was discussing the value (or lack of value) of pass/fail metrics when it occurred to me how bogus the numbers were, and some of the reasons. That led me to think through what ‘pass’ and ‘fail’ mean.”
So where does this leave teams who use pass/fail metrics? What does Hoffman see as a better alternative? Instead of a world of pass/fail, which doesn’t inspire additional work or thinking about the problem, he sees a system where a result might lead you down the road to additional investigation or bug reporting. With each result you have to ask additional questions before you move on. It challenges testers to evaluate when they are really done with something, or whether they’ve gotten all the value they can from an activity.
“Even with exploratory sessions, we conclude whether or not there are problems to report now and further avenues where we think we’ve detected problems, or not. For discrete test cases it is much clearer whether or not further work is indicated. In any case, most people refer to the software as failing or passing based on these indications.”
“The idea of a test passing/failing, indeed the idea of discrete tests, may be foreign to some people who have only known exploratory testing. In those contexts there may be audience members who may challenge that tests don’t pass or fail because the concepts aren’t applicable.”
So for Hoffman, testers doing exploratory testing face this issue all the time and already have methods for dealing with it. “There also could be criticism that I look at test results as being binary,” said Hoffman. “Others may consider there to be more than two outcomes. Again, I think it depends on how pass and fail are defined.”
In the past, Hoffman has done extensive work around test oracles. An oracle is the principle or mechanism by which we recognize a problem (that is, it’s how you can tell the good behavior from the bad). When asked how this talk relates to his work on test oracles, Hoffman replied: “This is one conclusion I’ve drawn from that oracle work. Over the years I stopped talking about passing and failing, but had never consciously realized it.”
For more on the upcoming show, check out the CAST conference website. I also recommend, if you haven’t already, familiarizing yourself with Doug Hoffman’s work, which is available at Software Quality Methods.
Fiona Charles will share the details of her scenario testing method at this year’s Conference of the Association for Software Testing (CAST), which takes place July 13-16 in Colorado Springs. Charles has used this approach on several projects where test scenarios were designed based on models derived from system data. I recently had the opportunity to talk with Charles about her upcoming CAST presentation, titled “Modeling Scenarios with a Framework Based On Data.”
Charles teaches organizations to match their software testing to their business risks and opportunities. With 30 years of experience in software development and integration projects, she has managed or consulted on testing on many projects for clients in retail, banking, financial services, health care, and telecommunications.
When asked where the talk came from, Charles said that she hasn’t seen much written about scenario testing and that she believes it’s an important way to test in certain circumstances.
For the talk I searched online and in the books in my own testing library, but didn’t find much. I’ve cited the two really useful articles that anyone contemplating scenario testing should read. Cem Kaner’s article on scenario testing is an excellent general introduction to the topic, and Hans Buwalda’s Better Software article on soap opera testing describes one way to model a scenario test.
She went on to describe the specific projects that formed the basis for her presentation, saying:
I originally developed this method of designing scenarios for large-scale systems integration tests. The first one I did was for a retail project where we were building a new customer rewards system and integrating it with both the in-store systems and the enterprise corporate systems. My role was to conduct a black-box functional test of the integrated systems, after each team had tested its own system and immediately before we went live with a pilot. It seemed obvious to me that I needed to build the test around the integrated data: the flows and the frequency with which different types of data changed. There were no context diagrams or dataflow diagrams for the integration, so I began by developing one, which I then used to model the integrated test and spec out the scenarios to run it.
The next project was another retail integration, this time integrating a new custom-built e-store and warehouse management system into a large and complicated suite of enterprise systems. I wanted to extend the method I’d used before by introducing more structure so we could construct scenarios from reusable building blocks. Also, I couldn’t be sure my team would be able to complete the transactions planned for a given day, and it was essential to be able to say exactly what should be expected in the downstream systems, where the data outcomes might take several days to appear. So I was especially keen to have an easy way of generating “dynamic” expected results for my core team to communicate to the extended team, which consisted of people evaluating each store-day’s test results in their own systems downstream. I was lucky to have on my team a programmer/tester who understood what I wanted to do and was committed to implementing it.
He and I worked together over the course of several projects, sharing ideas and spurring each other to build on and extend how we tested integrated systems. I’m a modeler by temperament and preference, and over time I began to abstract reusable models to have a way to talk about what we were doing. One of them I think of as a conceptual framework for building scenarios based on system data: the topic for this talk. The first time I applied it to testing a standalone system was for the project I’m using as the example for the talk, testing a customized Point of Sale (POS) system.
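Charles doesn’t publish code for her framework in this interview, so what follows is only a minimal Python sketch of the general idea of reusable building blocks with “dynamic” expected results. Everything in it is an assumption for illustration: the `BuildingBlock` type, the sale and return blocks, their downstream fields, and the amounts are hypothetical. Expected downstream outcomes are derived from the transactions the team actually completed for each store-day rather than from the plan, so a day that falls short still yields correct expectations to hand to the extended team.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildingBlock:
    """A hypothetical reusable scenario step and its expected downstream effects."""
    name: str
    effects: tuple  # pairs of (downstream field, expected change)


# Illustrative blocks; the field names and amounts are invented for this example.
SALE = BuildingBlock("sale", (("gl_sales_cents", 1999), ("reward_points", 20)))
RETURN = BuildingBlock("return", (("gl_sales_cents", -1999), ("reward_points", -20)))


def expected_by_store_day(executed):
    """Aggregate expected downstream results per store-day.

    `executed` is an iterable of (store_day, BuildingBlock) pairs recording the
    transactions that were actually completed, which may be fewer than planned.
    """
    totals = defaultdict(Counter)
    for store_day, block in executed:
        for field_name, delta in block.effects:
            totals[store_day][field_name] += delta
    return {store_day: dict(counts) for store_day, counts in totals.items()}


if __name__ == "__main__":
    completed = [
        ("store-12/day-1", SALE),
        ("store-12/day-1", RETURN),
        ("store-40/day-1", SALE),
    ]
    print(expected_by_store_day(completed))
```

Because scenarios are composed from blocks, adding a new transaction type means defining one more block rather than rewriting the expected results by hand.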
When asked why she based her scenario tests on data, rather than building them around more traditional business requirements or the same source documents the programmers use, Charles described some of the challenges of relying on those sources alone.
I’ve never seen documented requirements that came near adequately describing what a system was intended to do. And even when use cases cover most of the intended interactions, they’re usually much too high-level to build tests from. Paradoxically, I find that on the one hand the sources used by the programmers are too high-level, as in use cases, and on the other hand, they mostly don’t take a large enough view of how the whole system operates, either in itself or in the context of the systems it’s integrated with.
I once managed the testing on a bank teller system where a significant downstream output was to the bank’s General Ledger system. Yet the architect’s context drawing didn’t show the GL—and neither he nor any of the programmers could tell us what we needed to know about it to model the test.
That’s one reason I think we need to build our own models for testing, typically based in some way on how we expect a system to be used and the outcomes we hypothesize. That could be a state transition model, one of many other kinds of models, or a combination. The essential thing is to model a test consciously taking a fresh approach, and not blindly accept what we’re given.
I asked Charles to share a bit about her impression of CAST. This was her second year being involved in the conference, and I was curious why she chose CAST as the venue for her message. “My hope,” said Charles, “is that CAST primarily attracts testers who are open-minded and curious, not wedded to ‘traditional’ (and to me boring and suboptimal) ways of testing systems. Those are the testers I most want to talk to and learn from.”
One of the key aspects of CAST is that after a topic is presented, debate on issues related to the topic is encouraged. The AST community believes that through dialog and struggling with difficult problems you get better solutions. So I asked Fiona Charles what she thought a likely criticism of her presentation might be.
One criticism I might expect is that this is a structured method with built-in expected results, analogous to scripted testing. I can see an audience of mainly exploratory testers possibly having issues with this. My answer would be that this is one kind of testing, appropriate in contexts where we have to have strictly controlled dynamic data in order to evaluate the aggregated outcomes. That’s not a good context for exploratory testing. It doesn’t preclude it entirely, but exploratory tests do have to be back-fed daily into the expected outcomes.
And finally, when I asked what topics currently had her excited, Charles offered the following:
I am very interested in agile and the practical integration of real testing (rather than mere confirmation) into agile projects. But I have a concern that in the rush to small teams and small projects (which I think is mainly a good thing), larger integration issues are being ignored. Those could become important in the next while. I’m also interested at the moment in how we can get our hands around testing for risks that really matter to businesses. I’m exploring that in the tutorial I’m doing at CAST 2009 with Michael Bolton and in tutorials I’m doing elsewhere, and I’m also writing about it.
At some point, most of us have engaged in some level of requirements-based testing (RBT) or development. While RBT is fundamental to software quality, it has limitations and drawbacks, as described in this Software Quality Insights tip. In my professional IT work, I focus primarily on infrastructure and related technologies. But that is not to say that we cannot take a page from the requirements-based disciplines.
One of the most frustrating situations in the new build process of any system or collection of systems is when others have applied technologies without defining requirements. This frequently arises in disaster recovery, system availability or business continuity arenas. While the development and application teams want to provision a highly-available solution, there can be a disconnect between the internal decisions and what infrastructure teams can provide.
The current IT landscape offers infrastructure professionals many options for protection and availability, made possible by virtualization, load-balancing solutions, and a mature backup software market. My approach has been to have the application and development teams provide the availability requirements for a technology solution. This usually involves defining two important terms:
Recovery Time Objective (RTO) – The maximum time allowed, after a system-level failure, to recover the system and make the business process available again.
Recovery Point Objective (RPO) – The point in time to which the business process must be recoverable; in other words, how much recent data the business can afford to lose, measured in time.
The RTO and RPO work together to determine what availability and recovery would be provided to a business process. Infrastructure professionals can play this game quite well now with the current landscape of tools and technologies. Be sure to engage them as appropriate in the system provisioning process.
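As a rough illustration of how the two objectives narrow the choice of technology, here is a minimal Python sketch. The `ProtectionOption` type, the three options, and their timings are hypothetical. It also assumes the simplest worst case for RPO: a failure just before the next recovery point loses up to one full interval of data.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class AvailabilityRequirement:
    rto: timedelta  # maximum time allowed to restore the business process
    rpo: timedelta  # maximum acceptable data loss, measured in time


@dataclass(frozen=True)
class ProtectionOption:
    name: str
    recovery_point_interval: timedelta  # time between backups, snapshots, or replicas
    estimated_recovery_time: timedelta  # time to bring the system back after a failure


def meets(req: AvailabilityRequirement, opt: ProtectionOption) -> bool:
    # Worst case: the failure strikes just before the next recovery point,
    # so up to one full interval of data is lost.
    return (opt.recovery_point_interval <= req.rpo
            and opt.estimated_recovery_time <= req.rto)


def candidates(req, options):
    """Names of the options that satisfy both objectives."""
    return [opt.name for opt in options if meets(req, opt)]


# Hypothetical options with illustrative timings.
OPTIONS = [
    ProtectionOption("nightly backup", timedelta(hours=24), timedelta(hours=8)),
    ProtectionOption("hourly snapshots", timedelta(hours=1), timedelta(hours=2)),
    ProtectionOption("replicated VM failover", timedelta(minutes=5), timedelta(minutes=15)),
]

if __name__ == "__main__":
    requirement = AvailabilityRequirement(rto=timedelta(hours=4), rpo=timedelta(hours=1))
    print(candidates(requirement, OPTIONS))  # ['hourly snapshots', 'replicated VM failover']
```

Tightening either objective shortens the list, which can make the trade-off visible when application teams set their requirements.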
When you hit a key on your keyboard, a delay before the letter appears on your screen is mildly annoying. When you’re on a warship under enemy attack, a delay before new radar information shows up would be deadly. In a nutshell, that’s the difference between general-purpose and real-time programming.
The difference, however, can be less obvious in business settings.
Not knowing when real-time applications are needed is a common mistake companies and software developers make, Eric Bruno and Greg Bollella, authors of Real-time Java Programming with Java RTS, told me recently.
“Some companies have real-time requirements but don’t interpret them as such,” said Greg Bollella, a Sun Microsystems distinguished engineer who leads R&D for real-time Java. An example would be financial companies involved in stock trading. “Often, they try to force a general-purpose system to behave as if it is a real-time system.”
For the most part, those efforts fail. “Response times are too slow, and code can become very fragile,” said Bruno, who has broad experience in software design and architecture for financial trading and data systems and for real-time news delivery.
In this video excerpt of our interview, Bollella and Bruno discuss this common mistake and others made in real-time Java programming:
I recently met and talked with authors Jim Clarke and Eric Bruno about JavaFX. They co-wrote, with Jim Connors, the recently released book on that subject, JavaFX: Developing Rich Internet Applications. Clarke and Bruno explain how JavaFX simplifies and improves the RIA development process in their book and also in this video excerpt from our interview. Their book offers an introduction to JavaFX, and then it heads off into nuts-and-bolts descriptions of how to use its ready-built components and frameworks to build RIAs.
In an interview this week, Atlassian engineer/Nerd Herder Pete Moore told me that he got good news and bad news from software developers and engineers at JavaOne 2009. The good news was that developers had moved beyond Code Review 101; the downside was a lack of cloud adoption and some backward thinking about tool purchasing.
A good number of software engineers at JavaOne 2009 told Moore that they still have to fight with management for approval to buy lightweight development tools. Wait, there’s a punchline: The big surprise is that the managers aren’t rejecting these requests for lack of money, but because the managers “still believe in top-down purchasing of suites and one-size-fits-all,” Moore said. He couldn’t believe that the people holding development purse strings had such an antiquated approach to buying software. Well, actually, he called them “ignorant managers.”
Fortunately, Moore said, Atlassian’s products are priced low enough that “most teams can sidestep the management silliness, because it fits in their discretionary budgets.”
I met Moore at JavaOne, where he showed me an animated 3D tee shirt logo. We talked about Atlassian’s comprehensive Java-based plugin architecture, a subject that drew a lot of interest from attendees in the booth. Here’s an excerpt from our conversation in this video.
Moore spent a lot of time at JavaOne talking about the nuts and bolts of integrating plugins into real life environments. “I think this underlines that engineers still want pragmatic point solutions,” he said when we talked this week. “Best of breed [software] was the catch cry a few years ago, and it’s still what the front lines want, they now just want them to work together!”
This was Moore’s fifth JavaOne, and “it was sensational that I didn’t have anyone who didn’t know what code coverage or peer code review was,” he said. That hasn’t been the case in the past, when he’s had to explain what per-test coverage was, “or worse, the merits of unit testing.”
Two years ago, when Atlassian introduced its Crucible code review tool, “the majority of young developers had never done formal code review, and everyone was talking about pair programming,” he said. “This year, whilst there were still heaps of people who weren’t doing reviews, it seemed that every second person specifically wanted a demo of Crucible.”
Developers haven’t stepped up in another area, though. “I was disappointed not to see more development in the cloud in real life,” Moore said. Engineers like Atlassian’s Bamboo tool, with which they could start build agents and run builds in the cloud. “But almost to a person they said, ‘There’s no way we’d be allowed to use that.’ Here’s hoping that next year the story will be different.”
Next month, the Association for Software Testing (AST) will hold the fourth annual Conference of the Association for Software Testing (CAST) in Colorado Springs, Colorado. This will be the first year I won’t be attending, so I wanted to take the opportunity to catch up with some of the speakers to talk to them about their papers and presentations. The first pair of speakers I was able to catch up with were giving the closing keynote for the conference: Rob Sabourin and Tim Coulter.
Rob Sabourin is presently the President of AmiBug.Com Inc., a frequent guest lecturer at McGill University, the author of a short book entitled “I Am a Bug” (illustrated by his daughter Catherine), a regular author of articles on software engineering topics, and a regular speaker at just about every software testing conference you’ve heard of.
Tim Coulter is a software developer for The Open Planning Project, has participated in over ten software testing peer workshops, and he brings a fresh perspective to the practice of software testing which you can read on his blog at OneOfTheWolves.com.
Both Sabourin and Coulter are regulars at CAST, and this year they are taking on a rather interesting challenge with their closing keynote. Their talk, “Tim Bits: What I Learned About Software Testing at CAST 2009,” will be an attempt to summarize lessons learned from the 2009 talks and will use a mix of improv and group participation to make the lessons specific and relevant.
“We came up with the idea of ‘Tim Bits’ at a peer conference,” said Sabourin. “I think it was at a Workshop on Performance and Reliability in New York City, at which I asked Tim to give us some quick lightning encapsulations of lessons he learned, as a novice, from presentations made by experienced professionals. Tim Bits is also the name of a popular doughnut hole treat at the famous Canadian chain Tim Hortons, and thus the pun began.”
Sabourin, a speaking veteran, has a history of taking on challenging keynote presentations. He’s done light but lesson-filled talks about software testing based on lessons-learned from the Simpsons, Dr. Seuss, and the Looney Tunes Gang. Two of the best talks I’ve seen him give include “A Whodunit? Testing Lessons from the Great Detectives” and “Peanuts and Crackerjacks: What Baseball Taught Me about Metrics.” But given that this talk depends on material presented by others in the two or three days before the closing keynote, I asked Sabourin how they plan to prepare.
“I’ve prepared a number of closing keynote-style presentations at STAR conferences in which I focus on pain points of delegates and how lessons learned from specific conference or tutorial sessions can be applied. So when Tim and I were asked to combine Tim Bits with the ‘Closing Lessons Learned’ to create our closing keynote at CAST, we of course said yes.” Sabourin went on to outline their planning. “Tim and I plan to spend several evenings together in New Jersey the week before CAST preparing our framework. But the actual content will be captured on-the-fly during CAST.”
When I asked if the on-the-fly preparation was at all intimidating, Sabourin responded: “Not at all! We will be well prepared in advance, and spending time dialoguing with delegates to capture real learnings and applications on the fly during the conference will be fun. I feel that our talk at CAST can be a solid practical constructive step to not only making CAST more useful, but also in demonstrating the power of actively participating in the AST community.”
I also asked Coulter how he felt about the talk, and he said: “I’m extremely excited for this talk. This is going to be up there as one of the coolest things I’ve done so far, in testing or otherwise. The AST community has done so much for me since I started college that I’m happy to do anything I can to give back.”
When asked what they would be working on for next year, the two of them listed off several topics.
“I have been working hard on task analysis of session-based exploratory testing implemented in real projects, and especially in frameworks like Scrum,” said Sabourin. “In 2010 I hope to share these experiences. I’m also dedicating a lot of time to visual modeling in test design and testing in turbulent contexts.”
Coulter has been thinking about how to put theory into practice. “I’ve thought testing history would be an exciting thing to research, and if I can get a speech or paper to come out of that I would be more than happy. In total though, I don’t know what’s to come. I envision a talk titled ‘Trying to make it in testing while discovering the (software) world around me,’ but when that’ll come I don’t know.”
For more on the upcoming show, check out the CAST conference website. Another great resource is this site’s info on Rob Sabourin, his book, or his classes. And here’s how to learn more about Tim Coulter, the man behind ‘Tim Bits,’ and his current projects.
JavaFX, a development platform for creating rich Internet applications, was in the spotlight at the recent JavaOne Conference, and the timing was great for Gail and Paul Anderson. The first copies of their book, Essential JavaFX, had just arrived at the conference.
I talked to the Andersons shortly after Oracle CEO Larry Ellison had plugged JavaFX in the JavaOne opening keynote and they’d attended a packed-room JavaFX introductory session.
“We finished this book in record time, which is good considering the high level of interest in JavaFX we see here,” said Gail Anderson. “It was great to hear Larry Ellison praising it and see so many people in the session asking questions and showing so much enthusiasm.” Paul Anderson added: “There are six million Java developers, most of whom can benefit from using JavaFX to create RIAs.” In particular, he said, anyone – not just Java programmers – creating RIAs that are rich in graphics and multimedia content could simplify their projects by using JavaFX.
JavaFX is needed, they said, because other approaches to building RIAs, such as AJAX, are code-based and not standardized. JavaFX, on the other hand, is a graphical tool that focuses more on visualization and a graphical approach to programming than on code. JavaFX is basically a scripting language built upon and fully integrated with the Java Runtime Environment (JRE). JavaFX applications will run on any desktop and browser that runs the JRE, and on mobile phones running Java ME.
Let’s hear more about JavaFX from the Andersons themselves in this video excerpt from our interview.
Everyone wants to finish the application development project he or she is working on. It’s a big deal to complete something. It makes you feel good. You get to cross something off a list or feel the satisfaction of seeing software you worked on make someone’s life better.
Sometimes there’s confusion about what it means to be “done.” Does being “done” mean I’m done? Or that the team’s done? Or that the date we said we’d be done has arrived? Or that the software is finally out the door? Confusion on what it means to be done is a dangerous thing.
Nothing is worse than someone saying they finished something, and sincerely believing they had, only to find out later that you each had a different impression of what it meant to be finished. When it happens, you have to deal with both the work and rework required to get the item completed, and the potential loss of trust on future work.
On my current team, we talk a lot about being done. It’s a big deal. We’ve even developed some bullets around what it means to be done with critical tasks.
During a sprint, a story is “done” when:
- it has been unit tested;
- it has been peer reviewed;
- all alerts have been logged;
- it has been feature tested;
- any defects have been fixed;
- and the product owner accepts the story at sprint review.
A sprint is “done” when:
- all the sprint goals are met;
- all the stories have been accepted by the product owners;
- the team has performed their retrospective;
- and all the work for that sprint has been scheduled and moved to the appropriate release.
Outside of a sprint, a release is “done” when:
- all stories have been merged in;
- all defect fixes have been merged in;
- the merge has been peer reviewed;
- all outstanding test charters for the release have been executed;
- all final regression testing is completed;
- any user acceptance testing is completed;
- all the configuration management work is completed;
- all the configuration management work has been peer reviewed;
- and the release is deployed to production.
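One way to keep a definition of done shared rather than implicit is to write it down as an explicit checklist that can be checked mechanically. The Python sketch below encodes the story-level and sprint-level criteria from the lists above; the constant names, the short criterion labels, and the idea of tracking completed items as a set of strings are assumptions made for illustration only.

```python
# Hypothetical encodings of the story and sprint criteria listed above.
STORY_DONE = (
    "unit tested",
    "peer reviewed",
    "alerts logged",
    "feature tested",
    "defects fixed",
    "accepted by product owner at sprint review",
)

SPRINT_DONE = (
    "sprint goals met",
    "stories accepted by product owners",
    "retrospective held",
    "work scheduled into appropriate release",
)


def missing_criteria(definition, completed):
    """Return the criteria from `definition` not yet in `completed`, in order."""
    return [item for item in definition if item not in completed]


def is_done(definition, completed):
    """True only when every criterion in the definition has been met."""
    return not missing_criteria(definition, completed)


if __name__ == "__main__":
    story_progress = {"unit tested", "peer reviewed", "feature tested"}
    print(missing_criteria(STORY_DONE, story_progress))
    # ['alerts logged', 'defects fixed', 'accepted by product owner at sprint review']
```

The release list could be encoded the same way; keeping the criteria in order means the gaps are reported in the sequence the team works through them.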
So how did our definitions for the word “done” become lists? Each bullet tells a small story about our team and the way we work. Many of the bullet points imply development practices (like peer reviews, unit testing, and exploratory testing) while others are milestones or events (like product owner acceptance or deployment). You get a feel for both our process and our history when you look at what it means for us to be done.
Think about your own team. What does done mean for you? Is that a shared meaning? Ask around. If done means something different for you or your team, consider posting your meaning in a comment below.
Get ready to see the absolute best demo of this year’s JavaOne. No, it’s not a demo of a product that’s for sale. It’s just for fun.
Roaming the exhibit hall at JavaOne here in San Francisco, I met Pete Moore, Nerd Herder at Atlassian. Yes, Nerd Herder is his title. First off, we spent some time talking about Atlassian’s Java powered products. Next, we talked about living in Sydney — a dream of mine. Then, Pete gave me a tee shirt and showed me that tee shirt in action.
What you’ll see is my video of the large screen on which the demo takes place. Remember, the camera isn’t pointed at Pete. Believe me, in person, it’s even better.
After this, I’ll look at tee shirt designs in a completely different way.