Some challenges are not worth taking on, said consultant Lloyd Roden during his StarWest 2009 keynote here at the Disneyland Hotel today. A software testing expert for U.K.-based Grove Consultants, Roden started off his talk about testers’ top challenges today with a warning about setting up the wrong challenges for a test organization.
“We can’t fight every challenge. You can’t do everything,” Roden said.
Examine the preparation needed and the talents of your team before setting goals, Roden advised. “Climbing Everest would be a challenge, but it would be a stupid one to undertake without preparation and skill,” he said. “Cooking a meal for 20 would be a challenge for some and not for me, because I love cooking and have cooked a lot.”
A good challenge improves the people who take it on and opens their minds to different approaches to achieving the goal at hand. “Bad challenges are harmful and have undesirable consequences,” he said.
After advising testers there to “choose your battles carefully,” Roden gave a personal example, recalling his daughter asking for pierced ears when she was 14 years old. At first he was against the ear piercing, his daughter’s first request, until his wife explained that piercings didn’t have to be permanent. So, he chose not to say no to piercings. He did say no to tattoos, which are permanent, until she was 18. He didn’t feel she was prepared to make a decision with permanent consequences.
One way test managers can know what challenges to set for test teams is to do testing themselves, Roden said. He often runs into test managers who do no testing and believes they’re missing the opportunity to really know what goes on with their teams every day. Remaining outside of testing can lead test managers to set unrealistic goals and challenges.
Software consultant, trainer and author James Bach is a bit of a lightning rod in our industry. Most of his notoriety comes from his stand on software testing certifications. He doesn’t like them and isn’t shy about letting people know it. Bach also challenges other widely held beliefs in our industry.
Bach takes his off-the-certification-track testing tutorial, How to teach yourself testing, to StarWest 2009 in Anaheim this week. He’ll also take on some of the myths about rigor in software testing at the Pacific Northwest Software Quality Conference (PNSQC), which takes place Oct. 26-28, 2009, in Portland, Oregon.
The software myths talk gathers into one place and in succinct form arguments that Bach has been making indirectly for years.
“I keep hearing that more rigor is good and less rigor is bad. Some managers who’ve never studied testing, never done testing, probably have never even seen testing up close, nevertheless insist that it be rigorously planned in advance and fully documented. This is a cancer of ignorance that hobbles our craft.”
Bach’s talk is about clearing up what he calls the “silliness and sloppiness” surrounding the notion of rigorous processes.
“Managers want to say ‘let’s get a lot more rigorous in our processes!’ They may say ‘formal’ or ‘repeatable,’ but it’s all the same sort of thing,” he told me. “But getting rigorous is no panacea. It’s actually a bad idea, in many cases, because rigor can interfere with learning by locking us into bad practice. We need to apply rigor without obsession or compulsion, and at the right level, so that our testing is flexible and inexpensive.”
Bach has been a test manager or consulting tester since Apple lured him from a programming career in 1987. He spent about 10 years in Silicon Valley before going independent and traveling the world teaching rapid software testing skills. He is a co-author of Lessons Learned in Software Testing and the author of a new book, Secrets of a Buccaneer-Scholar: How Self-Education and the Pursuit of Passion Can Lead to a Lifetime of Success.
I asked him to offer an example of one of the myths facing our industry. He shared some of the impact of detailed documentation around testing processes and procedures:
“One myth is that a process becomes more rigorous just because it is written down. Well, as anyone ought to know who works with procedures, what is written is not necessarily what is practiced. In fact, it rarely is, in my experience. Moreover, it is not even possible to write down everything that matters about the processes that skilled people use.
“The impact of this myth is that a huge amount of money and time is wasted trying to chase an unhelpful obsession with documentation. It makes good theater – you look busy – but it doesn’t necessarily help you do better work.”
Bach has recently been training to fly a float plane. During PNSQC he plans to draw some examples from that experience, such as the fact that pilots have to learn many procedures and protocols in order to fly safely. “Rigor is an interesting challenge for a pilot,” he said. “In the talk, I talk about how we use checklists and some of the problems with those checklists.”
A lot of the philosophy behind James’ stance can be found in his latest book:
“Compulsory public education is an example of rigor myths getting drunk and going wild. Many millions of my fellow humans have bought into the idea that education is possible only through schooling. Rigor applied to education is usually presumed to mean that you subordinate your will to that of a schoolmaster, priest, guru or some other authority. I practice a different sort of rigor — education strictly inseparable from life. My life is my education. My life is my own. Education is not some side activity that I do to prepare for life.
“That applies to testing. I am a tester. That is a big part of my life. So, I can’t accept these silly certification programs and bad standards and process guides. Those are examples of rigor, however they represent bad rigor, and my standards are too high for that. I’m trying to raise the standards of the industry so that it will laugh at bad work instead of enshrining it.”
Project management (PM) consultant Michelle LaBrosse shared some quick tips for PM strategies in recessionary times with me recently. These ideas gelled during an interview with Carey Earle, president of Green Apple Marketing, on her syndicated radio program, Your World, Your Way.
“We talked about great ways to use these trends as a great launch pad for new ideas, solutions and direction in the workplace,” said LaBrosse, founder of Cheetah Learning, a PM consultancy, and a PM issues blogger.
Many projects are on an “economic slim-fast” diet, LaBrosse said. Not surprisingly, she’s seeing many project managers focus on practices that save their teams’ time, cut spending and improve quality and time to market. Also, businesses are trending toward novel perks in these days when raises are rare and budgets tight. She’s seen good results when project managers reward teams in inexpensive and creative ways, such as making a premium parking spot available to a top achiever each month.
The secrecy and lack of honest documentation of the early 2000s have caused an about-face, in a trend that LaBrosse and Earle call The Full Monty. “Technology brings us a whole new level of honesty whether we like it or not, but underneath the technology is a new human desire for trust and transparency like never before,” LaBrosse said. “Think: Is your documentation in good shape? Are you leaving a trail that you’re proud of? How can you be more transparent in your business?”
Finally, LaBrosse advises project managers to start noticing trends on their own. “The art and science of noticing the dramatic or subtle changes taking place will help you and your team continue to seek out future opportunities and successes,” LaBrosse said.
Whenever I walk into a movie theater, I remember when I tested a self-service ticket machine. No one was paying me to test the kiosk. I was just killing time, waiting at a theater for someone to join me to watch a movie. The machine looked and functioned much like an ATM. You select your movie, slide your credit card and print your tickets. What was great about the opportunity is that it allowed me to practice exploratory testing, usability testing, performance testing and security testing all at once.
I discovered that “playing” with the kiosk nicely illustrated what software testers do every day.
The system would allow you to select up to 10 tickets for each type of ticket you could purchase: adult, child and senior. While testing the limits of ticket selection and the proper calculation of the total amount, I noticed that if you max out the number of tickets for senior- and child-priced tickets, the system would beep at you each time you tried to select more than ten tickets. However, when you attempted to select more than ten tickets priced for adults, there was no beep. It made me wonder about the beep. Was it a usability feature?
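The beep check described above can be sketched as a small boundary test. Everything here is hypothetical: `TicketSelector` is a stand-in model of the kiosk screen, not the real software, and the `beep_types` parameter exists only so the sketch can mimic the behavior described in the story.

```python
MAX_PER_TYPE = 10  # the per-type limit described in the story


class TicketSelector:
    """Hypothetical model of the kiosk's ticket-selection screen."""

    def __init__(self, beep_types=("adult", "child", "senior")):
        self.counts = {"adult": 0, "child": 0, "senior": 0}
        self.beep_types = set(beep_types)  # types that beep when over the limit
        self.beeps = []                    # record of every beep emitted

    def add(self, ticket_type):
        """Try to add one ticket; return False if the limit blocks it."""
        if self.counts[ticket_type] >= MAX_PER_TYPE:
            if ticket_type in self.beep_types:
                self.beeps.append(ticket_type)
            return False
        self.counts[ticket_type] += 1
        return True


def types_missing_beep(selector):
    """Push each ticket type one past the limit and report types that stay silent."""
    missing = []
    for ticket_type in selector.counts:
        for _ in range(MAX_PER_TYPE):
            selector.add(ticket_type)
        beeps_before = len(selector.beeps)
        selector.add(ticket_type)  # the eleventh attempt
        if len(selector.beeps) == beeps_before:
            missing.append(ticket_type)
    return missing


# Model the observed kiosk: child and senior beep, adult does not.
observed_kiosk = TicketSelector(beep_types=("child", "senior"))
print(types_missing_beep(observed_kiosk))
```

The point of the sketch is the shape of the check: the same boundary is exercised for every ticket type, so an inconsistency between types stands out.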
After I was done doing my functional analysis of the system I had a chance to do some usability testing by watching people interact with the system. I noticed one case in particular that showed what I consider to be a serious defect. A lady using the system selected her movie, entered her credit card information and started waiting as the screen displayed the message: “Please wait while processing your transaction.” I assume that at this point the system was attempting to connect to whatever service it uses to process credit cards.
As luck would have it, at that moment credit card processing for the theater went down. I know this due to the very vocal population of customers at the ticket counter. Unfortunately for the lady making her self-service purchase, the ticket machine seemed to have hung as well. It just sat there saying “Please wait while processing your transaction.” No message saying: “Timed out while connecting to service. Please try again.” No message saying: “Trying your transaction again, please wait.” Nothing. It just sat there.
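As a rough illustration of the behavior the kiosk lacked, here is a minimal sketch of a transaction step that retries and then tells the user what happened instead of waiting forever. The function name `run_transaction`, the `charge` callable and the approval message are all assumptions made up for this example; only the timeout message is taken from the story above.

```python
def run_transaction(charge, retries=1):
    """Return the message the screen should show after trying to charge the card.

    `charge` is a hypothetical callable that raises TimeoutError when the
    payment service does not answer in time.
    """
    for _ in range(retries + 1):
        try:
            charge()
            return "Payment approved. Printing your tickets."
        except TimeoutError:
            continue  # try again until the attempts run out
    return "Timed out while connecting to service. Please try again."


def service_is_down():
    """Fake payment call that simulates the outage."""
    raise TimeoutError("no response from payment service")


print(run_transaction(service_is_down))
```

A fake such as `service_is_down` also makes the outage repeatable, so the failure does not have to be caught by luck.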
After about five minutes, the lady finally lost her patience and started pushing the cancel button. She pushed it once. She pushed it a second time – harder. She then pushed it five times in rapid succession. She then put all of her weight into the pushing of the button and kept the button down for several seconds. This process continued for some time. I counted as she pushed the button over 40 times. Still the screen read: “Please wait while processing your transaction.” So much for the cancel option! She then left the machine and went to the ticket counter for help.
I found other issues while testing, but what stands out for me when reviewing this experience is not the issues I found, but that the process of finding issues “in the wild” is the same that we use “in the lab.” There was setup and configuration for my testing: show times; my credit card; connectivity to the bank; real users I could observe; and my watch to time transaction response times.
There was interaction with the system: myself and others pushing buttons; the system with the bank; the system with the system at the counter that the clerks used; customers swiping cards; and the system printing tickets and receipts.
There was observation of results: noticing beeps and information on the screen; looking at my receipt and tickets; looking at the time on my watch; listening to customer reactions and the conversations at the counter; and seeing the actions the user took under stress.
I was able to draw conclusions based on those observations: the need for better error messaging in the system; the probability of a bug around the beeping for adults; and the possibility that the cancel key sticks because multiple people have applied fifty pounds of pressure to it for extended periods of time.
Does that testing process sound familiar?
I like this memory because it illustrates all the basic mechanics of software testing, regardless of the type. It doesn’t matter if it’s functional testing, usability testing, performance testing, security testing, or even automated testing:
- Testing almost always requires basic setup and system configuration.
- Testing requires that someone operate the test system or interact with it in some way.
- Testing requires that someone observe the results of those interactions.
- Testing requires that someone evaluate those results and draw conclusions.
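The four mechanics listed above map fairly directly onto the shape of an automated check. The sketch below is only an illustration under made-up assumptions: `run_test` and the toy kiosk functions are hypothetical names, and a real system would replace the dictionary with the actual application.

```python
def run_test(setup, operate, observe, evaluate):
    """Run one check as four separate steps."""
    system = setup()                 # 1. setup and configuration
    operate(system)                  # 2. interact with the system
    observations = observe(system)   # 3. observe the results
    return evaluate(observations)    # 4. evaluate and draw a conclusion


def make_kiosk():
    """Hypothetical system under test: a dictionary standing in for the kiosk."""
    return {"adult": 0, "screen": "Select your tickets"}


def buy_two_adult_tickets(kiosk):
    kiosk["adult"] += 2
    kiosk["screen"] = "Total: 2 tickets"


def read_screen(kiosk):
    return kiosk["screen"]


def screen_shows_two_tickets(screen_text):
    return "2 tickets" in screen_text


passed = run_test(make_kiosk, buy_two_adult_tickets, read_screen, screen_shows_two_tickets)
print("pass" if passed else "fail")
```

Keeping the steps separate makes it easier to see which one is missing when a test gives no useful information.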
What’s even better is that I learned something while waiting!
When I lead testing teams, the teams are typically doing session-based exploratory testing. A big part of session-based exploratory testing is the debrief. When testers complete a testing session (a time-boxed testing effort focused on a specific test mission) they debrief with me as the testing manager. That means I might sit down with each tester two or three times a day to do debriefs.
In each debrief, the tester walks me through what they tested and what issues they found. We discuss the impact of their testing on project risks and test coverage, and sometimes we review the notes from their testing. There’s a lot that can get covered in a debrief, so I’ve developed a list of questions that I can use to help me make sure I’ve covered everything when I’m debriefing someone.
- What was your mission for this session?
- What did you test and what did you find?
- What did you not test (and why)?
- How does your testing affect the remaining testing for the project? Do we need to add new charters or re-prioritize the remaining work?
- Is there anything you could have had that would have made your testing go faster or might have made your job easier?
- How do you feel about your testing?
I don’t use these questions as a template. Instead I use them to fill in the gaps. I’ll typically open with something generic like, “Tell me about your testing.” Then after the tester is done telling me about their session, I walk through this list in my head and make sure I have answers to each of these questions. If not, then I’ll go ahead and ask at that time.
Recently, during a class on exploratory testing where I review this list, I was asked why I include the last question, “How do you feel about your testing?” For me, that’s a coaching question. I’m looking for the tester to express something that they might need help with. Often they do. They might say something like, “I wasn’t happy with my testing of X or Y.” Or they might say they didn’t feel prepared for the session. I’ll use this information to help them with their testing.
When you first start doing debriefs, they might be slow. Some might take five or ten minutes. But fear not: like anything, the more you and your team do it, the easier it gets. With practice, most debriefs take under five minutes, and some can be as quick as 60 seconds. The trick is to just make sure you’re not forgetting anything as you quickly move through the information.
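For anyone who prefers to keep the list somewhere other than their head, the gap-filling idea can be sketched in a few lines. The topic keys and the `follow_up_questions` function are hypothetical names chosen for this example; the question text comes from the list above.

```python
DEBRIEF_QUESTIONS = {
    "mission": "What was your mission for this session?",
    "tested_and_found": "What did you test and what did you find?",
    "not_tested": "What did you not test (and why)?",
    "impact": "How does your testing affect the remaining testing for the project?",
    "obstacles": "Is there anything you could have had that would have made "
                 "your testing go faster or might have made your job easier?",
    "feelings": "How do you feel about your testing?",
}


def follow_up_questions(topics_covered):
    """Return the questions whose topics the tester has not addressed yet."""
    return [
        question
        for topic, question in DEBRIEF_QUESTIONS.items()
        if topic not in topics_covered
    ]


# After the open-ended "Tell me about your testing," note what came up...
covered = {"mission", "tested_and_found", "impact"}
# ...and ask only about what is still missing.
for question in follow_up_questions(covered):
    print(question)
```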
Determining testing coverage is about figuring out what you’re going to test in the application. When I start this process, I start with a coverage outline. And while I like to develop coverage outlines in Excel, you can use just about any application you’d like. A lot of people use mind mapping tools, Word, or a graphing tool like Visio or OmniGraffle.
I’ll often start by developing a generic list of items to cover while I’m testing. I typically do this by working through the elements of the SFDPO mnemonic to get things started. The SFDPO mnemonic comes from James Bach, and it’s a heuristic to help you figure out what you need to test. If you are not familiar with the SFDPO heuristic, it addresses the following:
- Structure: what the product is
- Function: what the product does
- Data: what it processes
- Platform: what it depends upon
- Operations: how it will be used
Within each of those areas, there are specific factors you can look at. For example, the following list details what’s included in Structure, an often-ignored area of test coverage:
- Code: the code structures that comprise the product, from executables to individual routines.
- Interfaces: points of connection and communication between sub-systems.
- Hardware: any hardware component that is integral to the product.
- Non-executable files: any files other than multimedia or programs, like text files, sample data, or help files.
- Collateral: anything beyond software and hardware that is also part of the product, such as paper documents, web links and content, packaging, license agreements, etc.
Using the SFDPO mnemonic, I’ll cover each area in detail to identify what I believe I should be testing. Once I have my initial list, I put it down and walk away from it. I do this for a couple of reasons. Normally, it’s because I’m tired, but also to give myself time away from the list to see if anything new occurs to me while I keep it in the back of my thoughts.
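A coverage outline does not need a special tool. As a minimal sketch, the same structure fits in a plain dictionary keyed by the SFDPO areas. The helper names `add_item` and `areas_without_items` and the sample entry are hypothetical; the Structure factors are the ones listed above.

```python
coverage_outline = {
    "Structure": ["Code", "Interfaces", "Hardware", "Non-executable files", "Collateral"],
    "Function": [],
    "Data": [],
    "Platform": [],
    "Operations": [],
}


def add_item(outline, area, item):
    """Add a coverage item to an SFDPO area, ignoring duplicates."""
    if item not in outline[area]:
        outline[area].append(item)


def areas_without_items(outline):
    """List the areas that have not been thought through yet."""
    return [area for area, items in outline.items() if not items]


add_item(coverage_outline, "Data", "Sample data left over from past projects")
print(areas_without_items(coverage_outline))
```

Listing the empty areas is a quick way to see where the outline still needs attention before it goes out for review.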
A second approach I use to identify coverage is to look at what test data I already have. I’ll see if there is any data I have access to that’s ready to use, or could be ready to use with very little work. Is there test data lying around from past projects or production that I can use? What coverage does that give me? Is there test data I can create easily with tools or automation? What coverage does that give me? If I find anything interesting, or if the data I find sparks any ideas, I’ll go back and add that to the coverage outline.
Finally, a third approach is to think about specific risks related to the product I’ll be testing. Sometimes I’ll use bug taxonomies to spark my thinking if I have a hard time getting started. These normally help me with generic risks. The one I reference most is the appendix to Kaner, Falk, and Nguyen’s Testing Computer Software. Once the taxonomy gets me going, I can normally think of some additional risks that are more specific to my application.
Regardless of where the ideas come from and how I develop it, once I have a coverage outline I work to get it reviewed with various project stakeholders. That typically involves dialog and trade-offs. I cut out a bunch of the stuff I wanted to test and add a bunch of stuff I didn’t think of. Over time, this outline evolves as my understanding of the application and the risks to the project evolve.
For more information on SFDPO, check out Bach’s original article or his methodology handout which details the specific product elements covered with the mnemonic. Also, if you don’t have a copy of Testing Computer Software, you can pick one up here.
Having tested software during many projects, I’ve seen that the most effective testers are the ones who start early — and I don’t mean the ones who start testing early. I’ll explain what I mean with this step-by-step tour of a pattern I’ve noticed multiple times in software testing projects.
1. A project starts.
2. All the testers work very hard to understand the problem they are trying to solve.
3. Eventually, some small amount of “working” — a very subjective term — software gets delivered to a development environment for the developers to do their unit testing and debugging.
4. One group of testers — often those doing exploratory testing or those working in an agile project context — ask the development team if they can get involved and help test that early code. They don’t care if they can’t log “defects,” because they just want to see the product and provide feedback when and if the developers think it would be helpful.
5. Another group of testers (often those doing scripted testing or those working in more “traditional” corporate testing environments) say they want to wait until unit testing is complete before they start their testing.
6. The group that starts early develops an early collaborative relationship with the developers who let them test their code.
7. Eventually, some small amount of formally “working” software gets delivered to a test environment for all the testers to begin their first test cycle.
8. At this point, both groups of testers start their first test cycle, and both find and log issues.
9. The issues found by the first group — those who worked more closely with the developers — tend to get resolved first, not based on priority or severity, but based on the personal relationship of the tester to the developer.
10. If a defect from that first group — those who worked more closely with the developers — can’t be reproduced, the developer comes over to work with the exploratory tester to isolate the problem.
11. If a defect from the second group — those who waited to start testing — can’t be reproduced, the developer sends it back as “can not reproduce” or the ever famous response, “It works on my machine.”
Before someone points out that “it’s not always practical to start testing early,” and that “there are a lot of good reasons to wait,” I get it. I’m not advocating that in all circumstances you should start testing early. There are plenty of situations where that might not be practical. However, I’ve never worked anywhere where I would say that’s been the case. And I’ve seen this pattern a lot. It’s not universal, but it’s very common.
The testers who get a jump-start on collaborating with the development team often have a closer relationship with it, and are thus viewed as an asset. It is these testers who are often pressed to help isolate defects, even ones they didn’t log. They are also the testers who get invited to design review meetings, because their opinions are highly valued.
Does that mean you have to get involved early to be an asset or to have those relationships? Absolutely not. But, I suspect that if you can, you’ll have a better chance of collaborating with people on your project team before pressures are high, stress is up and your “feedback” is viewed as a call for another late night debugging session.
There’s nothing more intimidating than a blank sheet of paper. Writers know this to be true, but so do test managers. The easy way out is to pull out a template and to start filling in the various “recommended” sections and details. An even easier approach is to pull out a past test plan and to just start changing project names, diagrams, and technologies. However, these approaches miss the point.
Recently while writing a test plan for a new project, I’ve noticed an odd habit I’ve developed. Ten years ago, when I wrote a test plan I started with a template. Four years ago, if I wrote a test plan I started with a blank sheet of paper. I noticed that when I write a test plan today, I look at templates, decide not to use them, and then end up pulling in pieces of them anyway.
The planning process isn’t about producing a document. Okay, well it shouldn’t be about producing a document. I recognize that in some companies it is. Instead, it’s about thinking about the problem. Software development problems are difficult and solving these problems requires time spent in research, comparing options, and prototyping. Our planning process, in the early stages, is about exploring those options and elaborating on what we think we’ll need to do (and when we’ll need to do it).
I find templates keep me from thinking about the actual problem. Instead they get me thinking about formatting and populating sections that aren’t yet filled in. When I’m using a template, I’m thinking about polish – not content.
However, there’s value to templates. They’re useful checklists for what types of things you should think about. I forget stuff just like anyone else. I’ve gotten a lot of good ideas from templates. So I’ve developed a habit of using a template to “prime the pump.”
I take an initial look at my templates and past test plans and use that to help get me started on problem solving. I’ll then switch over to a blank sheet of paper and start typing out my ideas and thoughts about what we need to test and how we should test it. Later, when I feel I’ve got most of my content, I’ll go back to a template and start pasting the content into the appropriate sections.
This technique keeps me from focusing on polish at the wrong time. There’s nothing wrong with polish, I just don’t want to be thinking about what font to use when I should be thinking about how I’m going to find or generate test data. This technique keeps me free from distractions when what I really need to be doing is focusing on the problem. This helps me deal with some of the intimidation of the blank page, but also allows me to be focused on the difficult topics when that’s what needs to be done.
Before Windows Vista came along and ruined it all, I previously used a bug in Windows Notepad to illustrate a problem testers often face. Vista ruined it by fixing the bug. If you have a version of Windows pre-Vista, you can still try this bug out. To reproduce the issue, open Notepad and type “this app can break”. Then save the file. If you were to close the file and re-open it, you’d find that your data has been corrupted.
Spoiler alert: If you want to research the Notepad problem and see if you can figure out what the issue is, then stop reading. I’ll tell you now it’s not an Easter egg, even though it looks like one.
On Windows XP, Notepad calls a function named IsTextUnicode when it opens a file. You can read about it here. The noteworthy text on this page is the following:
“This function uses various statistical and deterministic methods to make its determination […] These tests are not foolproof. The statistical tests assume certain amounts of variation between low and high bytes in a string, and some ASCII strings can slip through.”
What that text states is that Notepad uses a heuristic algorithm when it opens a file. Like any heuristic, it’s a solution to a problem that works most of the time. That’s why you’ve likely never seen that bug before. Only a small set of conditions will cause it to fail.
This bug represents several problems that many testers face every day:
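You can see the effect of a wrong guess without a pre-Vista machine. The sketch below does not call IsTextUnicode or reproduce Notepad’s code. It only assumes that the saved ASCII bytes get read back as 16-bit little-endian Unicode (UTF-16LE) and shows what that does to the text.

```python
text = "this app can break"

# The bytes written to disk when the file is saved as plain ASCII/ANSI.
raw = text.encode("ascii")

# The same bytes read back two at a time as UTF-16LE code units
# (the assumed wrong guess).
misread = raw.decode("utf-16-le")

print(len(raw), "bytes on disk")
print(len(misread), "characters after the wrong guess")
print([hex(ord(ch)) for ch in misread])

# Nothing was lost on disk: encoding the misread text gives the original bytes.
assert misread.encode("utf-16-le") == raw
```

Each pair of ASCII bytes becomes one unrelated character, which is why the file looks corrupted even though the bytes on disk are untouched.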
- When the development solution is heuristic, or the number of variables involved makes a deterministic solution to the problem impossible to determine manually, testers have to expect that there are cases they will miss that could expose problems. For Notepad, that’s fine. For a heart monitor, it might not be.
- A method that a developer uses might work perfectly for two uses (Word and Wordpad), but might fail when used for a third (perhaps inappropriate) application. We use so many third-party languages and frameworks when we develop today, it’s impossible for a developer to keep all of the code they didn’t write straight.
- As testers, we often need to dig into a problem well past the point of saying: “I noticed what might be a problem here.” If we understand why this is a problem, it helps us refine our models of the applications we’re testing. Now your model of Notepad should have changed to imply that Notepad uses a lot of the same code-base that Word uses. That’s interesting to know when testing because it gives you another oracle for what the correct behavior might be. It can also inform your conjectures for application behavior.
I was first given this testing/debug problem by James Bach a number of years ago (pre-Windows XP I think). I think I spent over an hour testing and researching until I came upon the root cause of the problem. It was a valuable lesson for me. Because of this experience, I now look forward to opportunities to help with issue research and isolation.
Recently, fellow SearchSoftwareQuality.com expert David Christiansen shared his post about experiences with testing ruts that he gets into and what he does to stay out of those ruts.
What resonated with me was his description of how he sometimes doesn’t feel like working to isolate bugs:
“You did x, y, and z and the app crashed, so you filed a bug report and moved on. Does it crash with just x? Are there variants of y and z that don’t make it crash? How do they work together? If you don’t know and don’t care, you need to power up.”
Dave points out what I believe is an important step for software testers. I’ve seen many testers encounter what could be critical issues, log a defect ticket in passing with a shallow description of the problem, and move on. Just to be fair, I’ve done it too. When this happens, I find that often there are two outcomes:
- The issue isn’t looked at immediately, or even fixed, because the description is vague, looks like an edge case, and doesn’t have clear implications past the immediate problem identified in the ticket.
- The tester misses out on a deep and rich opportunity to learn more about the application, how it was developed, and what dependencies it has. I find that some of my most insightful observations about the system, how it works, and how that relates to the testing I’m doing come from isolating defects.
While you don’t need to track down a possible issue to the offending line of code, I think a tester should be able to draw a clear chalk outline around the issue. That means they should be able to say, with some confidence, under what conditions it does and doesn’t occur, and what minimal set of conditions appears to trigger it. If they can, they should talk about potential impact – but only if it’s data-driven analysis and relevant to getting the issue fixed.
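When the conditions can be switched on and off cheaply, finding the minimal trigger can be sketched as a brute-force search over subsets, smallest first. This is only an illustration: `minimal_triggers` is a hypothetical helper, and `crashes` is a stand-in for actually running the application with a given set of conditions.

```python
from itertools import combinations


def minimal_triggers(conditions, fails):
    """Return the smallest subsets of `conditions` for which `fails(subset)` is true."""
    for size in range(len(conditions) + 1):
        hits = [set(combo) for combo in combinations(conditions, size) if fails(set(combo))]
        if hits:
            return hits  # stop at the first size that reproduces the problem
    return []


def crashes(active):
    """Pretend application: it crashes only when x and z are done together."""
    return {"x", "z"} <= active


print(minimal_triggers(["x", "y", "z"], crashes))
```

The number of subsets doubles with every added condition, so this only works for a handful of suspects. For longer lists, the notes and variants described below do the narrowing.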
To that end, the following tips might be helpful for when you’re working to isolate an issue:
- Take excellent notes and keep all the evidence. This includes test execution notes in a notepad, screenshots or screen capture utilities, copies of log files, snapshots of disks or virtual images, etc.
- Work to recall what you were doing before you found the problem. Often times, if the cause of the problem isn’t obvious, it was something you did five steps earlier that triggered what you saw. If you can find the deviant step, try variants of that activity to see how else the problem manifests itself.
- If the investigation goes for more than a day, find a place to share information about the problem with the rest of the team (a wiki, SharePoint, or a defect tracking tool). I often find it useful to keep lists of the following information:
  - a list of the symptoms of the problem
  - a list of what variables you’ve looked at already as you try to reproduce the issue
  - a list of the variables you haven’t looked at yet but suspect might be related
  - a list of who you’ve spoken with about the issue (and any details they provide)
  - a list of possible workarounds
  - a list of possible tools (or techniques you may not know or be good at) that might help
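If the shared place is a wiki page or a ticket, plain headings work fine. For teams that keep investigation notes in code or scripts, the same six lists could be sketched as a small record. The class and field names below are hypothetical, chosen only to mirror the lists above.

```python
from dataclasses import dataclass, field


@dataclass
class InvestigationLog:
    """Hypothetical record mirroring the six lists kept during an investigation."""
    symptoms: list = field(default_factory=list)
    variables_checked: list = field(default_factory=list)
    variables_suspected: list = field(default_factory=list)
    people_consulted: list = field(default_factory=list)
    workarounds: list = field(default_factory=list)
    tools_to_try: list = field(default_factory=list)

    def mark_checked(self, variable):
        """Move a variable from the suspected list to the checked list."""
        if variable in self.variables_suspected:
            self.variables_suspected.remove(variable)
        if variable not in self.variables_checked:
            self.variables_checked.append(variable)


log = InvestigationLog(
    symptoms=["Screen stays on 'Please wait' indefinitely"],
    variables_suspected=["payment service outage", "card type"],
)
log.mark_checked("payment service outage")
print(log.variables_checked, log.variables_suspected)
```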
At some point, it’s important to recognize that with any difficult problem, you’ll need some stopping heuristics. Of course the one we all want to use is, “I found the problem.” However, sometimes that doesn’t happen. Make sure you have a clear idea of how important the problem is and how much time you have to dedicate to it so at the appropriate time you can drop it or shelve it for later.
For more on this topic, and dealing with other testing ruts, be sure to check out Dave’s entire post on testing ruts and how he deals with them.