Determining testing coverage is about figuring out what you’re going to test in the application. When I start this process, I start with a coverage outline. And while I like to develop coverage outlines in Excel, you can use just about any application you’d like. A lot of people use mind-mapping tools, Word, or a diagramming tool like Visio or OmniGraffle.
I’ll often start by developing a generic list of items to cover while I’m testing. I typically do this by working through the elements of the SFDPO mnemonic to get things started. The SFDPO mnemonic comes from James Bach, and it’s a heuristic to help you figure out what you need to test. If you are not familiar with the SFDPO heuristic, it addresses the following:
- Structure: what the product is
- Function: what the product does
- Data: what it processes
- Platform: what it depends upon
- Operations: how it will be used
Within each of those areas, there are specific factors you can look at. For example, the following list details what’s included in Structure, an often-ignored area of test coverage:
- Code: the code structures that comprise the product, from executables to individual routines.
- Interfaces: points of connection and communication between sub-systems.
- Hardware: any hardware component that is integral to the product.
- Non-executable files: any files other than multimedia or programs, like text files, sample data, or help files.
- Collateral: anything beyond software and hardware that is also part of the product, such as paper documents, web links and content, packaging, license agreements, etc.
Using the SFDPO mnemonic, I’ll cover each area in detail to identify what I believe I should be testing. Once I have my initial list, I put it down and walk away from it. I do this for a couple of reasons. Normally, it’s because I’m tired, but also to give myself time away from the list to see if anything new occurs to me while I keep it in the back of my thoughts.
A second approach I use to identify coverage is to look at what test data I already have. I’ll see if there is any data I have access to that’s ready to use, or could be ready to use with very little work. Is there test data lying around from past projects or production that I can use? What coverage does that give me? Is there test data I can create easily with tools or automation? What coverage does that give me? If I find anything interesting, or if the data I find sparks any ideas, I’ll go back and add that to the coverage outline.
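When the data I need doesn’t already exist, a small script is often enough to create it. As a rough sketch of that kind of tooling (the field names and value ranges below are hypothetical, not from any particular project), a few lines of Python can generate a seeded, reproducible CSV of synthetic records:

```python
import csv
import io
import random

random.seed(42)  # a fixed seed makes the data set reproducible across runs

FIRST_NAMES = ["Ana", "Ben", "Chen", "Dee"]
LAST_NAMES = ["Ives", "Jones", "Kim", "Lopez"]

def generate_customers(n):
    """Yield n synthetic customer records (hypothetical schema)."""
    for i in range(n):
        yield {
            "id": i + 1,
            "name": f"{random.choice(FIRST_NAMES)} {random.choice(LAST_NAMES)}",
            "balance": round(random.uniform(0, 10000), 2),
        }

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "balance"])
writer.writeheader()
writer.writerows(generate_customers(100))
print(buffer.getvalue().splitlines()[0])  # prints the header: id,name,balance
```

Seeding the generator matters more than it looks: if a generated record exposes a bug, you can regenerate the identical data set while isolating it.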
Finally, a third approach is to think about specific risks related to the product I’ll be testing. Sometimes I’ll use bug taxonomies to spark my thinking if I have a hard time getting started. These normally help me with generic risks. The one I reference most is the appendix to Kaner, Falk, and Nguyen’s Testing Computer Software. Once the taxonomy gets me going, I can normally think of some additional risks that are more specific to my application.
Regardless of where the ideas come from and how I develop the outline, once I have it I work to get it reviewed by various project stakeholders. That typically involves dialog and trade-offs. I cut out a bunch of the stuff I wanted to test and add a bunch of stuff I didn’t think of. Over time, the outline evolves as my understanding of the application and of the risks to the project evolves.
For more information on SFDPO, check out Bach’s original article or his methodology handout which details the specific product elements covered with the mnemonic. Also, if you don’t have a copy of Testing Computer Software, you can pick one up here.
Having tested software during many projects, I’ve seen that the most effective testers are the ones who start early — and I don’t mean the ones who start testing early. I’ll explain what I mean with this step-by-step tour of a pattern I’ve noticed multiple times in software testing projects.
1. A project starts.
2. All the testers work very hard to understand the problem they are trying to solve.
3. Eventually, some small amount of “working” — a very subjective term — software gets delivered to a development environment for the developers to do their unit testing and debugging.
4. One group of testers — often those doing exploratory testing or those working in an agile project context — ask the development team if they can get involved and help test that early code. They don’t care if they can’t log “defects,” because they just want to see the product and provide feedback when and if the developers think it would be helpful.
5. Another group of testers (often those doing scripted testing or those working in more “traditional” corporate testing environments) say they want to wait until unit testing is complete before they start their testing.
6. The group that starts early develops an early collaborative relationship with the developers who let them test their code.
7. Eventually, some small amount of formally “working” software gets delivered to a test environment for all the testers to begin their first test cycle.
8. At this point, both groups of testers start their first test cycle, and both find and log issues.
9. The issues found by the first group — those who worked more closely with the developers — tend to get resolved first, not based on priority or severity, but based on the personal relationship of the tester to the developer.
10. If a defect from that first group — those who worked more closely with the developers — can’t be reproduced, the developer comes over to work with the exploratory tester to isolate the problem.
11. If a defect from the second group — those who waited to start testing — can’t be reproduced, the developer sends it back as “can not reproduce” or the ever famous response, “It works on my machine.”
Before someone points out that “it’s not always practical to start testing early,” and that “there are a lot of good reasons to wait,” I get it. I’m not advocating that in all circumstances you should start testing early. There are plenty of situations where that might not be practical. However, I’ve never worked anywhere I would say that’s been the case. And I’ve seen this pattern a lot. It’s not universal, but it’s very common.
The testers who get a jump-start on collaborating with the development team often have a closer relationship, and are thus viewed as an asset. It is these testers who are often pressed to help isolate a defect, even if it wasn’t logged by them. They are also the testers who get invited to design review meetings, because their opinions are highly valued.
Does that mean you have to get involved early to be an asset or to have those relationships? Absolutely not. But, I suspect that if you can, you’ll have a better chance of collaborating with people on your project team before pressures are high, stress is up and your “feedback” is viewed as a call for another late night debugging session.
There’s nothing more intimidating than a blank sheet of paper. Writers know this to be true, but so do test managers. The easy way out is to pull out a template and to start filling in the various “recommended” sections and details. An even easier approach is to pull out a past test plan and to just start changing project names, diagrams, and technologies. However, these approaches miss the point.
Recently, while writing a test plan for a new project, I noticed an odd habit I’ve developed. Ten years ago, when I wrote a test plan I started with a template. Four years ago, if I wrote a test plan I started with a blank sheet of paper. When I write a test plan today, I look at templates, decide not to use them, and then end up pulling in pieces of them anyway.
The planning process isn’t about producing a document. Okay, well it shouldn’t be about producing a document. I recognize that in some companies it is. Instead, it’s about thinking about the problem. Software development problems are difficult and solving these problems requires time spent in research, comparing options, and prototyping. Our planning process, in the early stages, is about exploring those options and elaborating on what we think we’ll need to do (and when we’ll need to do it).
I find templates keep me from thinking about the actual problem. Instead they get me thinking about formatting and populating sections that aren’t yet filled in. When I’m using a template, I’m thinking about polish – not content.
However, there’s value to templates. They’re useful checklists for what types of things you should think about. I forget stuff just like anyone else. I’ve gotten a lot of good ideas from templates. So I’ve developed a habit of using a template to “prime the pump.”
I take an initial look at my templates and past test plans and use that to help get me started on problem solving. I’ll then switch over to a blank sheet of paper and start typing out my ideas and thoughts about what we need to test and how we should test it. Later, when I feel I’ve got most of my content, I’ll go back to a template and start pasting the content into the appropriate sections.
This technique keeps me from focusing on polish at the wrong time. There’s nothing wrong with polish, I just don’t want to be thinking about what font to use when I should be thinking about how I’m going to find or generate test data. This technique keeps me free from distractions when what I really need to be doing is focusing on the problem. This helps me deal with some of the intimidation of the blank page, but also allows me to be focused on the difficult topics when that’s what needs to be done.
Before Windows Vista came along and ruined it all, I used a bug in Windows Notepad to illustrate a problem testers often face. Vista ruined it by fixing the bug. If you have a pre-Vista version of Windows, you can still try this bug out. To reproduce the issue, open Notepad and type “this app can break”. Then save the file, close it, and re-open it; you’ll find that your data has been corrupted.
Spoiler alert: If you want to research the Notepad problem and see if you can figure out what the issue is, then stop reading. I’ll tell you now it’s not an Easter egg, even though it looks like one.
On Windows XP, Notepad calls a function named IsTextUnicode when it opens a file. You can read about it here. The noteworthy text on this page is the following:
“This function uses various statistical and deterministic methods to make its determination […] These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through.”
What that text states is that Notepad uses a heuristic algorithm to decide how to open a file. Like any heuristic, it’s a solution to a problem that works most of the time. That’s why you’ve likely never seen this bug before. Only a small set of conditions will cause it to fail.
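You can see the effect of that wrong guess without digging up a pre-Vista machine. The sketch below uses Python as a stand-in for Notepad’s behavior (the real IsTextUnicode logic is more involved): it decodes the saved ANSI bytes as little-endian UTF-16, which is effectively what Notepad did after misdetecting the file:

```python
# The ANSI bytes Notepad wrote to disk (18 bytes, an even count,
# which is one of the conditions that lets the misdetection happen):
data = b"this app can break"

# IsTextUnicode's statistical guess effectively treats the byte stream as
# little-endian UTF-16, so each byte pair becomes one code unit. Every
# resulting character lands outside the ASCII range, so the reloaded
# file looks like garbage instead of the original sentence.
garbled = data.decode("utf-16-le")

print(len(data), len(garbled))  # 18 bytes collapse into 9 characters
```

The string’s shape (short lowercase words, all with spaces in the right positions) is exactly the kind of “certain amount of variation between low and high bytes” the documentation warns can slip through.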
This bug represents several problems that many testers face every day:
- When the development solution is heuristic, or the number of variables involved makes a deterministic solution to the problem impossible to determine manually, testers have to expect that there are cases they will miss that could expose problems. For Notepad, that’s fine. For a heart monitor, it might not be.
- A method that a developer uses might work perfectly for two uses (Word and WordPad), but might fail when used for a third (perhaps inappropriate) application. We use so many third-party languages and frameworks when we develop today that it’s impossible for a developer to keep all of the code they didn’t write straight.
- As testers, we often need to dig into a problem well past the point of saying: “I noticed what might be a problem here.” If we understand why this is a problem, it helps us refine our models of the applications we’re testing. Now your model of Notepad should have changed to imply that Notepad uses a lot of the same code-base that Word uses. That’s interesting to know when testing because it gives you another oracle for what the correct behavior might be. It can also inform your conjectures for application behavior.
I was first given this testing/debug problem by James Bach a number of years ago (pre-Windows XP I think). I think I spent over an hour testing and researching until I came upon the root cause of the problem. It was a valuable lesson for me. Because of this experience, I now look forward to opportunities to help with issue research and isolation.
Recently, fellow SearchSoftwareQuality.com expert David Christiansen shared his post about experiences with testing ruts that he gets into and what he does to stay out of those ruts.
What resonated with me was his description of how he sometimes doesn’t feel like working to isolate bugs:
“You did x, y, and z and the app crashed, so you filed a bug report and moved on. Does it crash with just x? Are there variants of y and z that don’t make it crash? How do they work together? If you don’t know and don’t care, you need to power up.”
Dave points out what I believe is an important step for software testers. I’ve seen many testers encounter what could be critical issues: they log a defect ticket in passing with a shallow description of the problem and move on. To be fair, I’ve done it too. When this happens, I find there are often two outcomes:
- The issue isn’t looked at immediately, or even fixed, because the description is vague, looks like an edge case, and doesn’t have clear implications past the immediate problem identified in the ticket.
- The tester misses out on a deep and rich opportunity to learn more about the application, how it was developed, and what dependencies it has. I find that some of my most insightful observations about the system, how it works, and how that relates to the testing I’m doing come from isolating defects.
While you don’t need to track down a possible issue to the offending line of code, I think a tester should be able to draw a clear chalk outline around the issue. That means they should be able to say, with some confidence, under what conditions it does and doesn’t occur, and what minimal set of conditions appears to trigger it. If they can, they should talk about potential impact, but only if it’s data-driven analysis and relevant to getting the issue fixed.
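One systematic way to draw that chalk outline is to shrink the triggering conditions mechanically. The sketch below is a simplified single pass over the idea behind delta debugging, not a full ddmin implementation; `still_fails` is a placeholder for whatever check tells you the bug reproduced:

```python
def shrink(failing_input, still_fails):
    """Greedily delete chunks of a failing input while the bug still
    reproduces (a simplified take on delta debugging)."""
    chunk = len(failing_input) // 2
    while chunk >= 1:
        i = 0
        while i < len(failing_input):
            candidate = failing_input[:i] + failing_input[i + chunk:]
            if still_fails(candidate):
                failing_input = candidate  # the removed chunk was irrelevant
            else:
                i += chunk                 # the chunk matters; keep it
        chunk //= 2
    return failing_input

# Pretend the bug only reproduces when both 'a' and 'b' are in the input:
repro = shrink(list("xxaxxbxx"), lambda s: "a" in s and "b" in s)
print(repro)  # ['a', 'b']: the minimal conditions that trigger the "bug"
```

In practice the “input” can be a list of reproduction steps, configuration flags, or data records; the same discard-and-retest loop applies as long as each candidate can be checked.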
To that end, the following tips might be helpful for when you’re working to isolate an issue:
- Take excellent notes and keep all the evidence. This includes test execution notes in a notepad, screenshots or screen-capture recordings, copies of log files, snapshots of disks or virtual images, etc.
- Work to recall what you were doing before you found the problem. Often, if the cause of the problem isn’t obvious, it was something you did five steps earlier that triggered what you saw. If you can find the deviant step, try variants of that activity to see how else the problem manifests itself.
- If the investigation goes for more than a day, find a place to share information about the problem with the rest of the team (a wiki, SharePoint, or a defect tracking tool). I often find it useful to keep lists of the following information:
- a list of the symptoms of the problem
- a list of what variables you’ve looked at already as you try to reproduce the issue
- a list of the variables you haven’t looked at yet but suspect might be related
- a list of who you’ve spoken with about the issue (and any details they provided)
- a list of possible workarounds
- a list of possible tools (or techniques you may not know or be good at) that might help
At some point, it’s important to recognize that with any difficult problem, you’ll need some stopping heuristics. Of course the one we all want to use is, “I found the problem.” However, sometimes that doesn’t happen. Make sure you have a clear idea of how important the problem is and how much time you have to dedicate to it so at the appropriate time you can drop it or shelve it for later.
For more on this topic, and dealing with other testing ruts, be sure to check out Dave’s entire post on testing ruts and how he deals with them.
The topic of cross-platform usability came up in several sessions at this year’s Ajax Experience in Boston, Massachusetts. Ajax is popular in large part because its open standards model is supported by many browsers and platforms. Speakers urged conference attendees to extend that advantage by enhancing cross-platform, cross-browser compatibility and usability in Ajax-based rich Internet applications.
Paul Burnett of Creative Solutions, for example, praised Adobe’s Dreamweaver CS4 for its cross-platform functionality, made possible by huge improvements in its AIR bridge program. “Dreamweaver CS4 boasts new usability features and widgets, which have increased the power of this tool and put that power in its users’ hands,” said Burnett during his session on building standards-based and accessible Ajax sites with Dreamweaver CS4.
Nicole Sullivan, founder of Object Oriented CSS, preached the effectiveness of strong base coding for similarities that are no longer machine-dependent in her session on creating scalable, fast websites. She has created templates that make it possible to have nearly identical web page layouts, no matter what machine, hardware or server they are viewed on. She has done this by manipulating CSS for site templates, predefining, and sometimes redefining, site functions. Her co-speaker, Gonzalo Cordero, demonstrated how some of the form and function of their appearance-driven code would work on a live website.
Developers can now make layouts that will look good no matter who is looking at them and on what platform, she said. “Fixed dimensions limit the ways in which a module can be used and reused,” Sullivan said. “We have defined structure in a separate class.”
Concurrency defects in multi-core or multi-threaded applications are probably the source of more troublesome problems than any other defect type in recent history, Coverity’s Mark Donsky told me recently. These defects are tricky because they are “virtually impossible to reproduce reliably,” he said, and “can take months of painstaking effort to reproduce and fix using traditional testing methodologies.”
Donsky filled me in on some of the most common concurrency defects and spotlighted the three that are currently causing the most problems: race conditions, deadlocks and thread blocks.
- Race conditions describe what happens when multiple threads access shared data without appropriate locks. “When a race condition occurs, one thread may inadvertently overwrite data used by another thread,” he said. “This results in data loss and corruption.”
- A deadlock can occur when two or more threads are each waiting for each other to release a resource. “Some of the most frustrating deadlocks involve three, four or even more threads in a circular dependency,” said Donsky. “As with race conditions, deadlocks are virtually impossible to reproduce and can delay product releases by months as engineering teams scramble to isolate the cause of a deadlock.”
- A thread block happens when a thread invokes a long-running operation while holding a lock that other threads are waiting for. Although the first thread will continue executing, all other threads are blocked until the long-running operation completes. Consequently, Donsky explained, “a thread block can bring many aspects of a multi-threaded application down to a grinding halt.”
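The first of those, the race condition, is easy to sketch. The example below is illustrative Python, not anything from Coverity; the sleep artificially widens the race window so the lost update shows up reliably instead of once in a thousand runs:

```python
import threading
import time

counter = 0
lock = threading.Lock()

def unsafe_increment():
    """Read-modify-write with no lock: a textbook race condition."""
    global counter
    tmp = counter       # every thread reads the old value...
    time.sleep(0.3)     # ...because this sleep widens the race window
    counter = tmp + 1   # each write clobbers the others (a "lost update")

def safe_increment():
    """The same operation, serialized by a lock."""
    global counter
    with lock:
        counter = counter + 1

threads = [threading.Thread(target=unsafe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
unsafe_result = counter  # almost certainly 1, not 5: updates were lost

counter = 0
threads = [threading.Thread(target=safe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(unsafe_result, counter)  # the locked version reliably reaches 5
```

Donsky’s point about reproducibility follows directly: without the artificial sleep, the unlocked version would usually produce the right answer, which is exactly why such defects slip past traditional testing.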
Concurrency defects showed up very often in last year’s Coverity Open Source Report, an analysis of over 50 million lines of open source code, and continue to pose a big problem, said Donsky.
Other common, crash-causing defects include null pointer dereferencing, resource leaks, buffer overruns and unsafe use of returned null values, according to Donsky, director of project management for San Francisco-based Coverity, maker of software integrity products.
Coverity’s open source security Scan site has been catching some headline-making defects recently, including the 0day Local Linux root exploit. “As part of the Scan program, we reported this issue back to key Linux developers so that they could respond to this vulnerability,” said Donsky.
Catching software defects before they go into production, said Donsky, is the best way for your software not to make the wrong kind of front-page news.
Steve Souders, author and respected authority on page performance, issued a call to arms for software users, developers and testers to improve the performance and power of web sites. At The Ajax Experience 2009 in Boston yesterday, he gave a state-of-page-performance overview and tips on boosting performance in a session called “Even Faster Web Sites.”
“We are really in the infancy of page performance,” said Souders, who encouraged new perspectives on the development and maintenance of sites.
Focusing on back-end web site performance is an immature practice that’s not productive, Souders said. “The back end only makes up about 20% of the total load time, and still the largest and arguably most important component, the front end, goes largely ignored,” he said.
Souders recommended the use of tools available online, such as YSlow, a tool that Souders created, as well as Google’s Page Speed. Both can help accelerate load times and overall performance.
Addressing load times is particularly important, he said, noting the adverse effects of slow load speeds on views, traffic, and revenue.
The key to online success, said Souders, lies in attention to detail, user experience, efficiency, speed and, of course, the front end.
Stay tuned in for more from The Ajax Experience 2009.
In college, I knew two guys who tested DVD players. Their job was to watch movies. Even better, they got to pick the movies! What a sweet deal right? Who wouldn’t want that testing gig?
Well, there’s a catch. There’s always a catch. While they could pick any movie, they had to watch the movie hundreds of times. Repeatedly. Again and again. I suspect that the two of them could recite every line of Gladiator in reverse order. Apparently, after you’ve seen a movie, any movie, a couple hundred times, you don’t like it any longer.
Why, you might ask, would they need to watch the same movie over and over? Well, they were testing different aspects of the audio and video drivers in the DVD player, along with the consumer-facing software that ran the device. The only way to do that is to actually watch a movie. And the only way to detect a small and subtle issue with the video rendering is to know the movie by heart and recognize even the smallest detail being out of place.
Talking with these fellows opened my eyes to what testers refer to as the oracle problem. An oracle is a mechanism by which a tester recognizes a problem with the software. When you have an oracle — an expected result, a requirements document, a previous version of the product, etc. — you can determine when things are working and when they aren’t. For most of us, that’s text or a picture in a document that either does or doesn’t match what we see on the screen while we’re testing. For these guys, it was their memory of how the movie should sound, look, and “feel.”
The oracle problem is that all oracles are fallible. For example, requirements specifications are incomplete, contain conflicting information, or are ambiguous. Expected results in a test case detail only a small portion of what’s expected: the tiny portion of the application and functionality the test is designed to expose. Oracles are hard to find, require work to use effectively, and always leave a tester wanting more. That’s the problem.
For the DVD testers, the test case was the movie, and the oracle was their memory of it. Make a configuration change, watch the movie. Get a new version of a driver, watch the movie. Use a feature of the consumer-facing software, keep watching the movie. They were the oracle. They identified the problem. This context helps highlight the role we as testers play in interpreting and exercising the various oracles that we apply.
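The same pattern shows up in automated checks, where a trusted reference plays the oracle. In the sketch below, `dedupe_sort` is a made-up function under test with a deliberately planted bug, and Python’s built-in `sorted` serves as the oracle; note that this oracle is itself partial, since it only speaks to ordering, not to performance, memory use, or anything else:

```python
def dedupe_sort(xs):
    """Hypothetical function under test, with a planted bug: converting
    to a set silently drops duplicate values before sorting."""
    return sorted(set(xs))

def matches_oracle(xs):
    """Python's built-in sorted() plays the trusted oracle here."""
    return dedupe_sort(xs) == sorted(xs)

print(matches_oracle([3, 1, 2]))  # True: no duplicates, so the bug hides
print(matches_oracle([2, 1, 2]))  # False: the oracle exposes the lost duplicate
```

Just like the DVD testers’ memory of the movie, the oracle only catches what it covers: feed it inputs without duplicates and the bug stays invisible.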
Take a second to think about the test oracles you use on a daily basis. Where do they come from? What rules do you use to interpret them? When there are ambiguities, or gaps in information, where do you go for disambiguation? What role do you play in interpreting how the oracles you use are applied, or in determining to what degree a test passes or fails?
And, if you’re not up for reflective questions about the work you do every day, instead just think about what movie you’d choose to watch over and over again — day in and day out. It’s a difficult question, because whatever movie you choose, you’ll never want to see it outside of work again.
Virtualization and virtual lab management systems can cut application testing and QA times significantly, thus speeding development, GlassHouse consultant Rob Zylowski said in our interview at VMworld 2009 in San Francisco. Yet he estimates that only about 10 percent of application development teams are using virtual lab managers like VMware Lab Manager and VMLogix LabManager 3.8.
Most adopters of virtual lab management software are using it in the data center for system troubleshooting. “That’s a good use, but it’s not as powerful as taking virtual lab managers fully into the application development, test and QA departments,” Zylowski said.
The learning curve and developers’ resistance to giving up their in-department servers are two barriers to adoption, Zylowski said. Those barriers are insignificant when compared with the savings in development and testing time and reduction of team conflicts and repetitive work enabled by virtual lab managers.
A key value of virtual lab managers is the ability to take snapshots as developers code and quality assurance (QA) testing is done. “Systems like VMware Lab Manager give incredible power to developers to troubleshoot,” said Zylowski, director of virtualization services for Framingham, Mass.-based GlassHouse Technologies Inc.
Zylowski talks about more issues related to and uses for tools like VMware Lab Manager in this video excerpt from our interview.