Whenever I walk into a movie theater, I remember when I tested a self-service ticket machine. No one was paying me to test the kiosk. I was just killing time, waiting at a theater for someone to join me to watch a movie. The machine looked and functioned much like an ATM. You select your movie, swipe your credit card, and print your tickets. What was great about the opportunity is that it allowed me to practice exploratory testing, usability testing, performance testing and security testing all at once.
I discovered that “playing” with the kiosk nicely illustrated what software testers do every day.
The system would allow you to select up to 10 tickets for each type of ticket you could purchase: adult, child and senior. While testing the limits of ticket selection and the proper calculation of the total amount, I noticed that if you maxed out the number of senior- and child-priced tickets, the system would beep at you each time you tried to select more than ten tickets. However, when you attempted to select more than ten tickets priced for adults, there was no beep. It made me wonder about the beep. Was it a usability feature?
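The ticket-limit checks I was probing are a classic boundary-testing exercise: the interesting values sit at the limit and just past it. Here is a minimal sketch of that idea, using my own hypothetical model of the kiosk's selector (the names and behavior are assumptions for illustration, not the real kiosk software):

```python
# Hypothetical model of the kiosk's ticket selector, used only to
# illustrate boundary testing. MAX_TICKETS and select_ticket are assumptions.
MAX_TICKETS = 10

def select_ticket(counts, ticket_type):
    """Try to add one ticket of the given type; refuse (and beep) at the limit."""
    if counts.get(ticket_type, 0) >= MAX_TICKETS:
        return False  # expected behavior at the boundary: refuse the selection
    counts[ticket_type] = counts.get(ticket_type, 0) + 1
    return True

# Boundary test: drive the count to the limit, then one step past it.
counts = {}
results = [select_ticket(counts, "adult") for _ in range(11)]

assert results[:10] == [True] * 10  # selections 1 through 10 succeed
assert results[10] is False         # the 11th is refused
assert counts["adult"] == 10        # the count never exceeds the limit
```

The bug I suspected at the kiosk lived exactly here: the refusal happened for two ticket types but the accompanying beep did not fire for the third.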
After I finished my functional analysis of the system, I had a chance to do some usability testing by watching people interact with it. I noticed one case in particular that showed what I consider to be a serious defect. A lady using the system selected her movie, entered her credit card information and started waiting as the screen displayed the message: “Please wait while processing your transaction.” I assume that at this point the system was attempting to connect to whatever service it uses to process credit cards.
As luck would have it, at that moment credit card processing for the theater went down. I know this due to the very vocal population of customers at the ticket counter. Unfortunately for the lady making her self-service purchase, the ticket machine seemed to have hung as well. It just sat there saying “Please wait while processing your transaction.” No message saying: “Timed out while connecting to service. Please try again.” No message saying: “Trying your transaction again, please wait.” Nothing. It just sat there.
After about five minutes, the lady finally lost her patience and started pushing the cancel button. She pushed it once. She pushed it a second time – harder. She then pushed it five times in rapid succession. She then put all of her weight into pushing the button and held it down for several seconds. This process continued for some time. I counted as she pushed the button over 40 times. Still the screen read: “Please wait while processing your transaction.” So much for the cancel option! She then left the machine and went to the ticket counter for help.
I found other issues while testing, but what stands out for me when reviewing this experience is not the issues I found, but that the process of finding issues “in the wild” is the same one we use “in the lab.” There was setup and configuration for my testing: show times; my credit card; connectivity to the bank; real users I could observe; and my watch to time transaction response times.
There was interaction with the system: myself and others pushing buttons; the system communicating with the bank; the system communicating with the counter system the clerks used; customers swiping cards; and the system printing tickets and receipts.
There was observation of results: noticing beeps and information on the screen; looking at my receipt and tickets; looking at the time on my watch; listening to customer reactions and the conversations at the counter; and seeing the actions the user took under stress.
I was able to draw conclusions based on those observations: the need for better error messaging in the system; the probability of a bug around the beeping for adult tickets; and the fact that a sticking cancel key could be due to multiple people applying fifty pounds of pressure for extended periods of time.
Does that testing process sound familiar?
I like this memory because it illustrates all the basic mechanics of software testing, regardless of the type. It doesn’t matter if it’s functional testing, usability testing, performance testing, security testing, or even automated testing:
- Testing almost always requires basic setup and system configuration.
- Testing requires that someone operate the test system or interact with it in some way.
- Testing requires that someone observe the results of those interactions.
- Testing requires that someone evaluate those results and draw conclusions.
What’s even better is that I learned something while waiting!
When I lead testing teams, the teams are typically doing session-based exploratory testing. A big part of session-based exploratory testing is the debrief. When testers complete a testing session (a time-boxed testing effort focused on a specific test mission) they debrief with me as the testing manager. That means I might sit down with each tester two or three times a day to do debriefs.
In each debrief the tester walks me through what they tested and what issues they found; we discuss the impact of their testing on project risks and test coverage; and sometimes we review the notes from their testing. There’s a lot that can get covered in a debrief, so I’ve developed a list of questions to help me make sure I’ve covered everything when I’m debriefing someone.
- What was your mission for this session?
- What did you test and what did you find?
- What did you not test (and why)?
- How does your testing affect the remaining testing for the project? Do we need to add new charters or re-prioritize the remaining work?
- Is there anything you could have had that would have made your testing go faster or might have made your job easier?
- How do you feel about your testing?
I don’t use these questions as a template. Instead, I use them to fill in the gaps. I’ll typically open with something generic like, “Tell me about your testing.” Then after the tester is done telling me about their session, I walk through this list in my head and make sure I have answers to each of these questions. If not, I’ll go ahead and ask at that time.
Recently, during a class on exploratory testing where I review this list I was asked why I include the last question, “How do you feel about your testing?” For me, that’s a coaching question. I’m looking for the tester to express something that they might need help with. Often they do. They might say something like, “I wasn’t happy with my testing of X or Y.” Or they might say they didn’t feel prepared for the session. I’ll use this information to help them with their testing.
When you first start debriefs, they might be slow. Some might take five or ten minutes. But fear not: like anything, the more you and your team do it, the easier it gets. Most debriefs take under five minutes, and some can be as quick as 60 seconds. The trick is to make sure you’re not forgetting anything as you quickly move through the information.
Determining testing coverage is about figuring out what you’re going to test in the application. When I start this process, I begin with a coverage outline. And while I like to develop coverage outlines in Excel, you can use just about any application you’d like. A lot of people use mind mapping tools, Word, or a diagramming tool like Visio or OmniGraffle.
I’ll often start by developing a generic list of items to cover while I’m testing. I typically do this by working through the elements of the SFDPO mnemonic to get things started. The SFDPO mnemonic comes from James Bach, and it’s a heuristic to help you figure out what you need to test. If you are not familiar with the SFDPO heuristic, it addresses the following:
- Structure: what the product is
- Function: what the product does
- Data: what it processes
- Platform: what it depends upon
- Operations: how it will be used
Within each of those areas, there are specific factors you can look at. For example, the following list details what’s included in Structure, an often-ignored area of test coverage:
- Code: the code structures that comprise the product, from executables to individual routines.
- Interfaces: points of connection and communication between sub-systems.
- Hardware: any hardware component that is integral to the product.
- Non-executable files: any files other than multimedia or programs, like text files, sample data, or help files.
- Collateral: anything beyond software and hardware that is also part of the product, such as paper documents, web links and content, packaging, license agreements, etc.
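A coverage outline built from SFDPO is, at heart, a simple two-level structure: areas and the items under them. As a minimal sketch of what such an outline might look like in code (the sub-items outside Structure are my own illustrative assumptions, not Bach's actual lists):

```python
# A coverage outline sketched as nested data. The Structure items come from
# the list above; the other areas' items are illustrative assumptions.
outline = {
    "Structure": ["Code", "Interfaces", "Hardware",
                  "Non-executable files", "Collateral"],
    "Function": ["User-visible features", "Error handling", "Startup/shutdown"],
    "Data": ["Input", "Output", "Boundary values"],
    "Platform": ["OS versions", "External libraries", "Network services"],
    "Operations": ["Common use scenarios", "Extreme or atypical use"],
}

# Flatten to one row per coverage item -- the shape a spreadsheet column
# (or a mind map export) naturally takes.
rows = [(area, item) for area, items in outline.items() for item in items]
for area, item in rows:
    print(f"{area}: {item}")
```

From here each row can pick up columns for priority, status, or owner as the outline evolves during stakeholder review.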
Using the SFDPO mnemonic, I’ll cover each area in detail to identify what I believe I should be testing. Once I have my initial list, I put it down and walk away from it. I do this for a couple of reasons. Normally, it’s because I’m tired, but also to give myself time away from the list to see if anything new occurs to me while I keep it in the back of my thoughts.
A second approach I use to identify coverage is to look at what test data I already have. I’ll see if there is any data I have access to that’s ready to use, or could be ready to use with very little work. Is there test data lying around from past projects or production that I can use? What coverage does that give me? Is there test data I can create easily with tools or automation? What coverage does that give me? If I find anything interesting, or if the data I find sparks any ideas, I’ll go back and add that to the coverage outline.
Finally, a third approach is to think about specific risks related to the product I’ll be testing. Sometimes I’ll use bug taxonomies to spark my thinking if I have a hard time getting started. These normally help me with generic risks. The one I reference most is the appendix to Kaner, Falk, and Nguyen’s Testing Computer Software. Once the taxonomy gets me going, I can normally think of some additional risks that are more specific to my application.
Regardless of where the ideas come from and how I develop the outline, once I have it I work to get it reviewed with various project stakeholders. That typically involves dialogue and trade-offs: I cut out a bunch of the stuff I wanted to test and add a bunch of stuff I didn’t think of. Over time, the outline evolves as my understanding of the application and the risks to the project evolves.
For more information on SFDPO, check out Bach’s original article or his methodology handout which details the specific product elements covered with the mnemonic. Also, if you don’t have a copy of Testing Computer Software, you can pick one up here.
Having tested software during many projects, I’ve seen that the most effective testers are the ones who start early — and I don’t mean the ones who start testing early. I’ll explain what I mean with this step-by-step tour of a pattern I’ve noticed multiple times in software testing projects.
1. A project starts.
2. All the testers work very hard to understand the problem they are trying to solve.
3. Eventually, some small amount of “working” — a very subjective term — software gets delivered to a development environment for the developers to do their unit testing and debugging.
4. One group of testers — often those doing exploratory testing or those working in an agile project context — ask the development team if they can get involved and help test that early code. They don’t care if they can’t log “defects,” because they just want to see the product and provide feedback when and if the developers think it would be helpful.
5. Another group of testers (often those doing scripted testing or those working in more “traditional” corporate testing environments) say they want to wait until unit testing is complete before they start their testing.
6. The group that starts early develops an early collaborative relationship with the developers who let them test their code.
7. Eventually, some small amount of formally “working” software gets delivered to a test environment for all the testers to begin their first test cycle.
8. At this point, both groups of testers start their first test cycle, and both find and log issues.
9. The issues found by the first group — those who worked more closely with the developers — tend to get resolved first, not based on priority or severity, but based on the personal relationship of the tester to the developer.
10. If a defect from that first group — those who worked more closely with the developers — can’t be reproduced, the developer comes over to work with the exploratory tester to isolate the problem.
11. If a defect from the second group — those who waited to start testing — can’t be reproduced, the developer sends it back as “can not reproduce” or the ever famous response, “It works on my machine.”
Before someone points out that “it’s not always practical to start testing early” and that “there are a lot of good reasons to wait,” I get it. I’m not advocating that you should start testing early in all circumstances. There are plenty of situations where that might not be practical. However, I’ve never worked anywhere I would say that was the case. And I’ve seen this pattern a lot. It’s not universal, but it’s very common.
The testers who get a jump-start on collaborating with the developers often have a closer relationship, and are thus viewed as an asset. It is these testers who are often pressed to help isolate a defect, even if it wasn’t logged by them. They are also the testers who get invited to design review meetings, because their opinions are highly valued.
Does that mean you have to get involved early to be an asset or to have those relationships? Absolutely not. But, I suspect that if you can, you’ll have a better chance of collaborating with people on your project team before pressures are high, stress is up and your “feedback” is viewed as a call for another late night debugging session.
There’s nothing more intimidating than a blank sheet of paper. Writers know this to be true, but so do test managers. The easy way out is to pull out a template and to start filling in the various “recommended” sections and details. An even easier approach is to pull out a past test plan and to just start changing project names, diagrams, and technologies. However, these approaches miss the point.
Recently, while writing a test plan for a new project, I noticed an odd habit I’ve developed. Ten years ago, when I wrote a test plan I started with a template. Four years ago, if I wrote a test plan I started with a blank sheet of paper. Today, when I write a test plan, I look at templates, decide not to use them, and then end up pulling in pieces of them anyway.
The planning process isn’t about producing a document. Okay, well it shouldn’t be about producing a document. I recognize that in some companies it is. Instead, it’s about thinking about the problem. Software development problems are difficult and solving these problems requires time spent in research, comparing options, and prototyping. Our planning process, in the early stages, is about exploring those options and elaborating on what we think we’ll need to do (and when we’ll need to do it).
I find templates keep me from thinking about the actual problem. Instead they get me thinking about formatting and populating sections that aren’t yet filled in. When I’m using a template, I’m thinking about polish – not content.
However, there’s value to templates. They’re useful checklists for what types of things you should think about. I forget stuff just like anyone else. I’ve gotten a lot of good ideas from templates. So I’ve developed a habit of using a template to “prime the pump.”
I take an initial look at my templates and past test plans and use that to help get me started on problem solving. I’ll then switch over to a blank sheet of paper and start typing out my ideas and thoughts about what we need to test and how we should test it. Later, when I feel I’ve got most of my content, I’ll go back to a template and start pasting the content into the appropriate sections.
This technique keeps me from focusing on polish at the wrong time. There’s nothing wrong with polish, I just don’t want to be thinking about what font to use when I should be thinking about how I’m going to find or generate test data. This technique keeps me free from distractions when what I really need to be doing is focusing on the problem. This helps me deal with some of the intimidation of the blank page, but also allows me to be focused on the difficult topics when that’s what needs to be done.
Before Windows Vista came along and ruined it all, I used a bug in Windows Notepad to illustrate a problem testers often face. Vista ruined it by fixing the bug. If you have a pre-Vista version of Windows, you can still try this bug out. To reproduce the issue, open Notepad, type “this app can break”, and save the file. If you then close the file and re-open it, you’ll find that your data has been corrupted.
Spoiler alert: If you want to research the Notepad problem and see if you can figure out what the issue is, then stop reading. I’ll tell you now it’s not an Easter egg, even though it looks like one.
On Windows XP, Notepad calls a function named IsTextUnicode when it opens a file. You can read about it here. The noteworthy text on this page is the following:
“This function uses various statistical and deterministic methods to make its determination […] These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through.”
What that text states is that Notepad uses a heuristic algorithm to decide how to open a file. Like any heuristic, it’s a solution to a problem that works most of the time. That’s why you’ve likely never seen this bug before. There is only a finite set of conditions that will cause it to fail.
This bug represents several problems that many testers face every day:
- When the development solution is heuristic, or the number of variables involved makes a deterministic solution to the problem impossible to determine manually, testers have to expect that there are cases they will miss that could expose problems. For Notepad, that’s fine. For a heart monitor, it might not be.
- A method that a developer uses might work perfectly for two uses (Word and WordPad), but might fail when used for a third (perhaps inappropriate) application. We use so many third-party languages and frameworks when we develop today that it’s impossible for a developer to keep all of the code they didn’t write straight.
- As testers, we often need to dig into a problem well past the point of saying, “I noticed what might be a problem here.” Understanding why something is a problem helps us refine our models of the applications we’re testing. Your model of Notepad should now have changed to imply that Notepad uses a lot of the same code base that Word uses. That’s interesting to know when testing because it gives you another oracle for what the correct behavior might be. It can also inform your conjectures about application behavior.
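The misdetection itself can be reproduced outside Windows. The 18 ASCII bytes of “this app can break” happen to pair up into plausible UTF-16 code units, which is roughly the conclusion IsTextUnicode's statistical check reached. A minimal Python sketch (this decodes the bytes directly; it does not call the actual Win32 API):

```python
# The classic Notepad bug, reproduced outside Windows: the 18-byte ASCII
# phrase looks statistically like UTF-16LE text.
data = b"this app can break"

assert len(data) % 2 == 0  # an even byte count is one hint toward UTF-16

# Re-interpret the ASCII bytes as UTF-16LE, as Notepad effectively did:
misread = data.decode("utf-16-le")
print(misread)  # nine CJK ideographs instead of the original English text

assert len(misread) == 9
# Every byte pair lands in the CJK Unified Ideographs block (U+4E00-U+9FFF),
# so the "plausible Unicode" heuristic had nothing to object to.
assert all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)
```

Other short ASCII phrases with the same byte-pair statistics trigger the identical failure, which is why the bug looks like an Easter egg but isn't one.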
I was first given this testing/debug problem by James Bach a number of years ago (pre-Windows XP I think). I think I spent over an hour testing and researching until I came upon the root cause of the problem. It was a valuable lesson for me. Because of this experience, I now look forward to opportunities to help with issue research and isolation.
Recently, fellow SearchSoftwareQuality.com expert David Christiansen shared his post about experiences with testing ruts that he gets into and what he does to stay out of those ruts.
What resonated with me was his description of how he sometimes doesn’t feel like working to isolate bugs:
“You did x, y, and z and the app crashed, so you filed a bug report and moved on. Does it crash with just x? Are there variants of y and z that don’t make it crash? How do they work together? If you don’t know and don’t care, you need to power up.”
Dave points out what I believe is an important step for software testers. I’ve seen many testers encounter what could be critical issues, log a defect ticket in passing with a shallow description of the problem, and move on. To be fair, I’ve done it too. When this happens, I find there are often two outcomes:
- The issue isn’t looked at immediately, or even fixed, because the description is vague, looks like an edge case, and doesn’t have clear implications past the immediate problem identified in the ticket.
- The tester misses out on a deep and rich opportunity to learn more about the application, how it was developed, and what dependencies it has. I find that some of my most insightful observations about the system, how it works, and how that relates to the testing I’m doing come from isolating defects.
While you don’t need to track down a possible issue to the offending line of code, I think a tester should be able to draw a clear chalk outline around the issue. That means they should be able to say, with some confidence, under what conditions it does and doesn’t occur, and what minimal set of conditions appears to trigger it. If they can, they should talk about potential impact, but only if it’s data-driven analysis and relevant to getting the issue fixed.
To that end, the following tips might be helpful for when you’re working to isolate an issue:
- Take excellent notes and keep all the evidence. This includes test execution notes in a notepad, screenshots or screen-capture recordings, copies of log files, snapshots of disks or virtual images, etc.
- Work to recall what you were doing before you found the problem. Often, if the cause of the problem isn’t obvious, it was something you did five steps earlier that triggered what you saw. If you can find the deviant step, try variants of that activity to see how else the problem manifests itself.
- If the investigation goes for more than a day, find a place to share information about the problem with the rest of the team (a wiki, SharePoint, or a defect tracking tool). I often find it useful to keep lists of the following information:
- a list of the symptoms of the problem
- a list of what variables you’ve looked at already as you try to reproduce the issue
- a list of the variables you haven’t looked at yet, but suspect might be related
- a list of who you’ve spoken with about the issue (and any details they provided)
- a list of possible workarounds
- a list of possible tools (or techniques you may not know or be good at) that might help
At some point, it’s important to recognize that with any difficult problem, you’ll need some stopping heuristics. Of course the one we all want to use is, “I found the problem.” However, sometimes that doesn’t happen. Make sure you have a clear idea of how important the problem is and how much time you have to dedicate to it so at the appropriate time you can drop it or shelve it for later.
For more on this topic, and on dealing with other testing ruts, be sure to check out Dave’s entire post on testing ruts and how he deals with them.
The topic of cross-platform usability came up in several sessions at this year’s Ajax Experience in Boston, Massachusetts. Ajax is popular in large part because its open-standards model is supported by many browsers and platforms. Speakers urged conference attendees to extend that advantage by enhancing cross-platform, cross-browser compatibility and usability in Ajax-based rich Internet applications.
Paul Burnett of Creative Solutions, for example, praised Adobe’s Dreamweaver CS4 for its cross-platform functionality, made possible by huge improvements in its AIR bridge program. “Dreamweaver CS4 boasts new usability features and widgets, which have increased the power of this tool and put that power in its users’ hands,” said Burnett during his session on building standards-based and accessible Ajax sites with Dreamweaver CS4.
Nicole Sullivan, founder of Object Oriented CSS, preached the effectiveness of strong base coding for similarities that are no longer machine-dependent in her session on creating scalable, fast websites. She has created templates that make it possible to have nearly identical web page layouts no matter what machine, hardware or server they are viewed on. She has done this by manipulating CSS for site templates, predefining, and sometimes redefining, site functions. Her co-speaker, Gonzalo Cordero, demonstrated how some of the form and function of their appearance-driven code would work on a live website.
Developers can now make layouts that will look good no matter who is looking at them and on what platform, she said. “Fixed dimensions limit the ways in which a module can be used and reused,” Sullivan said. “We have defined structure in a separate class.”
Concurrency defects in multi-core or multi-threaded applications are probably the source of more troublesome problems than any other defects in recent history, Coverity’s Mark Donsky told me recently. These defects are tricky because they are “virtually impossible to reproduce reliably,” he said, and “can take months of painstaking effort to reproduce and fix using traditional testing methodologies.”
Donsky filled me in on some of the most common concurrency defects and spotlighted the three that are currently causing the most problems: race conditions, deadlocks and thread blocks.
- Race conditions describe what happens when multiple threads access shared data without appropriate locks. “When a race condition occurs, one thread may inadvertently overwrite data used by another thread,” he said. “This results in data loss and corruption.”
- A deadlock can occur when two or more threads are each waiting for another to release a resource. “Some of the most frustrating deadlocks involve three, four or even more threads in a circular dependency,” said Donsky. “As with race conditions, deadlocks are virtually impossible to reproduce and can delay product releases by months as engineering teams scramble to isolate the cause of a deadlock.”
- A thread block happens when a thread invokes a long-running operation while holding a lock that other threads are waiting for. Although the first thread will continue executing, all other threads are blocked until the long-running operation completes. Consequently, Donsky explained, “a thread block can bring many aspects of a multi-threaded application down to a grinding halt.”
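The race condition Donsky describes, one thread overwriting another's update to shared data, comes down to an unprotected read-modify-write. Here is a minimal sketch in Python (my own toy example, not Coverity's; the sleep artificially widens the race window so the lost update shows up reliably), along with the lock-based fix:

```python
import threading
import time

# A deliberately unprotected read-modify-write on shared data.
racy_balance = 0

def racy_deposit(amount):
    global racy_balance
    current = racy_balance           # 1. read the shared value...
    time.sleep(0.01)                 # 2. ...other threads read the same value...
    racy_balance = current + amount  # 3. ...and this write clobbers their updates

threads = [threading.Thread(target=racy_deposit, args=(100,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(racy_balance)  # typically 100: most of the five deposits were lost

# The fix: hold a lock across the entire read-modify-write so it is atomic.
lock = threading.Lock()
safe_balance = 0

def safe_deposit(amount):
    global safe_balance
    with lock:
        current = safe_balance
        time.sleep(0.01)
        safe_balance = current + amount

threads = [threading.Thread(target=safe_deposit, args=(100,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(safe_balance)  # 500: every deposit survives
```

The artificial sleep is also why real race conditions are so hard to reproduce: without it, the window between read and write is a few instructions wide, and the bug only fires when the scheduler happens to preempt a thread inside it.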
Concurrency defects showed up very often in last year’s Coverity Open Source Report, an analysis of over 50 million lines of open source code, and continue to pose a big problem, said Donsky.
Other common, crash-causing defects include null pointer dereferencing, resource leaks, buffer overruns and unsafe use of returned null values, according to Donsky, director of project management for San Francisco-based Coverity, maker of software integrity products.
Coverity’s open source security Scan site has been catching some headline-making defects recently, including the 0day Local Linux root exploit. “As part of the Scan program, we reported this issue back to key Linux developers so that they could respond to this vulnerability,” said Donsky.
Catching software defects before they go into production, said Donsky, is the best way for your software not to make the wrong kind of front-page news.
Steve Souders, author and respected authority on page performance, issued a call to arms for software users, developers and testers to improve the performance and power of web sites. At The Ajax Experience 2009 in Boston yesterday, he gave a state-of-page-performance overview and tips on boosting performance in a session called “Even Faster Web Sites.”
“We are really in the infancy of page performance,” said Souders, who encouraged new perspectives on the development and maintenance of sites.
Focusing on a web site’s back-end performance is an immature practice that’s not productive, Souders said. “The back end only makes up about 20% of the total load time, and still the largest and arguably most important component, the front end, goes largely ignored,” he said.
Souders recommended the use of tools available online, such as YSlow, a program he created, as well as Google’s Page Speed. Both can help accelerate load times and overall performance.
Addressing load times is particularly important, he said, noting the adverse effects of slow load speeds on views, traffic, and revenue.
The key to online success, said Souders, lies in attention to detail, user experience, efficiency, speed and, of course, the front end.
Stay tuned in for more from The Ajax Experience 2009.