Archive for November, 2009
Best Practices – Philosophy vs. Practicality
Originally Published Saturday, September 12, 2009
I have spent the last week in Israel teaching our new SDET course in Herzliya and our Senior SDET course in Haifa. I also spent a lot of time listening to and discussing various issues relating to software testing and the maturation of our discipline, not just here in Israel but around the world, both inside and outside of Microsoft. Now I am sitting at LaLa Land after a relaxing day of sailing in the Med, and reflecting on the past week’s discussions.
One of the topics we discussed was best practices, and that seems appropriate to write about, since the concept of “best practices” was recently discussed (again) in an article in the Software Test and Performance magazine by Eddie Correia. Eddie argues, “…the notion of ‘best practices’ is not useful. Best for whom? And for what kind of testing?” Actually, this is just a repetition of the same old fustian, melodramatic hyperbole of the “context-driven” posse.
Perhaps the philosophical “questioning” of best practices may be interesting for folks who like to run around quoting Aristotle and Plato. However, from a pragmatic point of view this is a rather benign debate for anyone capable of thinking for themselves.
In reality, many different professions recognize the concept of best practices. For example, a best practice in preventative medicine is to rinse a minor abrasion and apply a topical antiseptic ointment. A best practice in plumbing is to wrap Teflon tape in the direction of the threads when fitting pipes. “Eliminating distractions in the operational area” is listed as one of the best practices by the FAA for airfield safety. Do these “best practices” apply in all situations? No, they don’t.
So, why do so many professionals recognize the concept of “best practices”? Because they understand that best practices provide guidelines that are generally more effective in the appropriate context than other approaches. They understand that “best practices” are not rules or rigid standards that must be followed in all circumstances; rather, “best practices” are general solutions to common problems that can be shared among professionals who might face similar situations. The professionals who understand “best practice” concepts are usually well-trained on other comparable practices for the type of problem they are facing, and know when to apply the best practice within the appropriate situation. They understand that “best practices” don’t simply apply to one or two limited situations, but have been proven to be generally effective for that particular type of problem.
But most importantly, these professionals (who recognize “best practices”) are extremely knowledgeable about their field and can “act with appropriate judgment” (that’s sapience for the CD crowd), and conversely know when to approach a problem using a different solution.
Fortunately, the argument against “best practices” only stems from a few people who are seemingly more interested in stirring up portentous philosophical debate rather than earnestly discussing the practical advancement of the profession of software testing beyond mysticism and emotionally charged rhetoric. And, that argument really seems to boil down to a rather condescending and incorrect viewpoint that best practices are merely steadfast rules and requirements that must be followed in all situations. I say condescending because this point of view seems to suggest that testers are incapable of analyzing a problem and logically rationalizing the benefits and limitations of various approaches to problems in different situations to reach appropriate decisions on their own.
Personally, I think professionals in the discipline of software testing are highly intelligent, and are quite capable of making smart decisions, and can “act with appropriate judgment” in a wide variety of contextual situations. I also think discussions of best practices are enriched with case studies outlining situations where they may not apply and the alternative approaches that were more effective in those situations. And, I think “best practices” provide a common reference for professionals in that field that can be shared and further developed, and perhaps even give rise to new “best practices” for varying situations.
So, for those of you who believe there is a “one-size-fits-all” solution that can be applied in every situation, I recommend that you don’t subscribe to the concept of best practices. (I would also recommend these people are well supervised and constantly monitored.) But, for the vast majority of professionals in the practice of software testing, I suspect you understand the notion of “best practices” is quite useful for pragmatic discussions for advancing the intellectual knowledge pool of our profession and maturing our discipline.
How We Test Software at Microsoft, Chinese Edition: 微软的软件测试之道 (Microsoft Core Technology Series)
Originally Published Thursday, September 10, 2009
I am really happy to announce that our book has been released in China and is available on the Chinese Amazon site! This was really a monumental effort driven by my friend and colleague Kelly Zhang.
We look forward to the feedback from the Chinese testing community, and we hope this provides our Chinese friends with some additional perspectives on software testing (or at least some interesting stories).
Test Automation ROI (Part II)
Originally Published Wednesday, September 02, 2009
Last week I talked about the silliness of wasting time calculating the return on investment (ROI) of an automation effort on any non-trivial software project, especially if it has an extended shelf-life. As my friend Joe Strazzere commented, “If you need an ROI analysis to convince business management that test automation is a good thing when used intelligently, then you have already lost.”
But management might need to be educated on the limitations of record/playback, rudimentary hard-coded scripts, and keyword-driven automation efforts, because it is often more appealing for bean counters to invest in low-cost tools and continue to rely on non-coding bug finders or domain experts to script out ‘tests’ which do nothing more than repeat some rote set of steps over and over again. But, as E. Dustin, T. Garrett, and B. Gauf wrote in Implementing Automated Software Testing, any serious software automation effort “is software development.” Well-designed automated tests require highly skilled, technically competent, extremely creative, analytical testers capable of designing and developing automated tests using procedural programming paradigms.
We should still apply ROI concepts in test automation, but at a much lower level. Essentially, each tester must evaluate the return on investment of any test before automating it. The most fundamental purpose of an automated test effort is to provide some perceived value to the tester, the testing effort, and the organization. As a tester, the primary reasons I automate a test are to:
- Free up my time,
- Re-verify baseline assessments (BVT/BAT, regression, acceptance test suites)
- Increase test coverage (via increased breadth of data or state variability),
- Accomplish tasks that can’t easily be achieved via manual testing approaches.
For example, the build verification and build acceptance test suites are baseline tests that must be run on each new build; these tests should be 99.9% automated because automating them frees up my time to design other tests. Tests that evaluate a non-trivial number of combinations or permutations are generally good candidates for automation because they increase test coverage. Performance, stress, load, and concurrency tests should be heavily automated because they are difficult to conduct manually.
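As a rough illustration of why combinatorial tests favor automation, here is a minimal Python sketch. The three input dimensions and the `format_label` function are hypothetical stand-ins for a real feature; `itertools.product` enumerates every combination, and a simple oracle checks each result. Three small dimensions already yield 36 cases, and every value added to any dimension multiplies the total.

```python
from itertools import product

# Hypothetical input dimensions for a text-formatting feature (illustrative only).
FONTS = ["Arial", "Times New Roman", "Courier New"]
SIZES = [8, 12, 72]
STYLES = ["regular", "bold", "italic", "bold italic"]


def format_label(font: str, size: int, style: str) -> str:
    """Hypothetical function under test: builds a display label."""
    return f"{font} {size}pt {style}"


def run_combinatorial_test() -> int:
    """Exercise every combination of inputs and return how many cases ran."""
    checked = 0
    for font, size, style in product(FONTS, SIZES, STYLES):
        label = format_label(font, size, style)
        # Oracle: every input value must be reflected in the output.
        assert font in label and f"{size}pt" in label and style in label, label
        checked += 1
    return checked


if __name__ == "__main__":
    print(run_combinatorial_test())  # 3 * 3 * 4 = 36 cases
```

Running the same 36 checks by hand on every build is exactly the kind of rote work that automation frees us from.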
It is important to note that I am not simply referring to UI-type automation. A significant number of “functional tests” designed to evaluate the computational logic of methods or functions can be automated below the UI layer in software architectures using OOP and procedural paradigms, where the business and computational logic is separate from the UI layer.
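A minimal sketch of what “below the UI layer” can look like, assuming a hypothetical `calculate_total` function that lives in the business-logic layer rather than behind a form. The tests call the logic directly with Python’s standard `unittest` module, so no windows, mouse movement, or screen synchronization are involved.

```python
import unittest


def calculate_total(subtotal: float, tax_rate: float) -> float:
    """Hypothetical business-logic function, kept separate from any UI code."""
    if subtotal < 0 or tax_rate < 0:
        raise ValueError("subtotal and tax_rate must be non-negative")
    return round(subtotal * (1 + tax_rate), 2)


class CalculateTotalTests(unittest.TestCase):
    """Functional tests that exercise the logic directly, not through a form."""

    def test_applies_tax(self):
        self.assertEqual(calculate_total(100.0, 0.08), 108.0)

    def test_zero_subtotal(self):
        self.assertEqual(calculate_total(0.0, 0.08), 0.0)

    def test_rejects_negative_input(self):
        with self.assertRaises(ValueError):
            calculate_total(-1.0, 0.08)
```

Saved as a module, these tests can be run with `python -m unittest`. The same checks driven through a GUI would need window handles, timing waits, and screen-resolution assumptions.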
There are many papers that discuss specific factors to consider when deciding which tests to automate. Unfortunately, there is no single cookie-cutter approach to deciding which tests to automate. Different projects have different requirements and expectations, and, of course, not all tests are equal. One of the best papers I’ve read on deciding what tests to automate is “When Should a Test Be Automated?” by Brian Marick. I like the simplicity of his three key questions:
- How much more will this test cost to automate versus running it manually?
Some people think that automating a test reduces costs because it eliminates the need for a tester to execute that test manually. Unfortunately, this is not always the case. As I talked about in a previous post, visual comparative oracles are notoriously error-prone, requiring testers to constantly massage the test code and manually verify results anyway. Sometimes paying a vendor to run a test periodically is cheaper than paying an SDET to tweak the test every build. But if the population of potential test data is large, or the test exercises combinations of a non-trivial feature, then automating that test case is probably a good investment.
- What is the potential lifetime of this automated test?
How many times will this test be re-run during the development cycle and in maintenance or sustained engineering efforts? Can this test be reused in the next iteration of the product?
- Does the automated test have some probability of exposing new issues?
I don’t necessarily agree with this question, because many automated tests may not expose new issues but still provide value to the overall testing effort. For example, I wouldn’t expect tests in my regression test suite to expose new defects, because if they do, there was a regression in the product. So, I would rephrase this question to ask, “Does this automated test have some probability of exposing new issues, providing additional information that increases confidence, or increasing test coverage?”
A few other questions I ask myself when I am deciding whether to automate a particular test include:
- What exactly is being evaluated?
This is perhaps the first question I ask myself. If the test evaluates functional or non-functional (stress, perf, security, etc.) capabilities, then automation may be worthwhile. But behavioral tests such as usability tests and content testing are generally not good candidates for automated testing.
- What is the reliability of automating this test?
I don’t want to have to constantly massage a test in order to get it to run. So, what is the probability this test will throw a lot of false positives or false negatives? How much tweaking will this test require due to code or UI instability?
- What are the oracles for this test and can they be automated?
I don’t want to sit in front of a computer and watch software run software. Also, there is a difference between an automated test and a macro (a single, user-defined command that is part of an application and executes a series of commands). There are different types of oracles, and the professional test designer also needs to design the most effective oracle for the test. By the way…if the most effective oracle is a human reviewing the results, then that test should probably not be automated using current technologies.
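To make the macro-versus-test distinction concrete, here is a minimal Python sketch using one kind of automated oracle: an independent reference computation. The code under test (`sum_of_squares`) and the oracle are hypothetical examples chosen only because the correct answer is easy to compute a second, obvious way. The “macro” performs the same steps as the test but never decides pass or fail.

```python
import random


def sum_of_squares(n: int) -> int:
    """Hypothetical code under test: closed form for 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6


def reference_oracle(n: int) -> int:
    """Independent, slow-but-obvious computation of the expected result."""
    return sum(i * i for i in range(1, n + 1))


def macro(n: int) -> None:
    """A macro: exercises the code, but nothing checks the result."""
    sum_of_squares(n)


def automated_test(trials: int = 200, seed: int = 42) -> int:
    """A test: the same steps plus an oracle that decides pass or fail."""
    rng = random.Random(seed)  # seeded so any failing input can be reproduced
    for _ in range(trials):
        n = rng.randint(0, 10_000)
        actual, expected = sum_of_squares(n), reference_oracle(n)
        assert actual == expected, f"n={n}: got {actual}, expected {expected}"
    return trials
```

Not every feature has a cheap reference implementation; when the only credible oracle is a person looking at the result, that is a signal the test may not be worth automating.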
For each test I consider these questions in deciding whether to automate that test. For some tests, I may ask additional questions depending on the context and the business needs of my organization. I don’t use a cookie-cutter template, or try to fill out some spreadsheet to do a cost comparison based on dollar amounts. It’s hard to put a price on value. Instead, I ask myself a few key questions to help me decide if automating a test is worth it to me, my team, and the organization. Is automating a particular test the right thing to do, or am I automating something because it’s challenging, or to increase some magical percentage of automated tests compared to all currently defined tests? The key message here is not to blindly automate everything; use your brain, make smart decisions about whether each test should be automated, and be able to explain how automating that test benefits the testing effort.
Measuring Test Automation ROI
Originally Published Tuesday, August 25, 2009
I just finished reading Implementing Automated Software Testing by E. Dustin, T. Garrett, and B. Gauf. Overall this is a good read that provides some well-thought-out arguments for beginning an automation project, along with strategic perspectives for managing a test automation project. The first chapter made several excellent points, such as:
- Automated software testing “is software development.”
- Automated software testing “and manual testing are intertwined and complement each other.”
- And, “The overall objective of AST (automated software testing) is to design, develop, and deliver an automated test and retest capability that increases testing efficiencies.”
Of course, I was also pleased to read the section on test data generation, since I design and develop test data generation tools as a hobby. The authors correctly note that random test data increases flexibility, improves functional testing, and reduces reliance on limited-in-scope, error-prone, manually produced test data.
There is also a chapter on presenting the business case for an automation project by calculating a return on investment (ROI) measure via various worksheets. I have two essential problems with ROI calculations within the context of test automation. First, if the business manager doesn’t understand the value of automation within a complex software project (especially one which will have multiple iterations), they should read a book on managing software development projects. I really think most managers understand that test automation would benefit their business (in most cases). I suspect many managers have experienced less than successful automation projects but don’t understand how to establish a more successful automation effort. I also suspect really bright business managers are not overly impressed with magic beans.
Magic beans pimped by a zealous huckster are the second essential problem with automation ROI calculations. Let’s be honest, the numbers produced by these worksheets or other automation ROI calculators are simply magic beans. Now, why do I make this statement? Because the numbers that are plugged into the calculators or worksheets are ROMA data. I mean really, how many of us can realistically predict the number of atomic tests for any complex project? Also, do all tests take the same amount of time, or will all tests be executed the same number of iterations? Does it take the same amount of time to develop all automated tests, and how does one go about predicting a realistic time for all automated tests to run? And of course, how many of those tests will be automated? (Actually, that answer is easy….the number of automated tests should be 100% of the tests that should be automated.)
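To see how fragile these numbers are, consider a deliberately simplistic ROI formula. This is an assumption for illustration only, not the worksheet from the book: ROI = (manual time saved - time invested) / time invested, measured in minutes. Every argument in the Python sketch below is a guess made before the tests exist.

```python
def automation_roi(runs: int, manual_minutes_per_run: float,
                   dev_minutes: float, maintenance_minutes_per_run: float) -> float:
    """Naive ROI = (time saved - time invested) / time invested, in minutes.

    Hypothetical formula; every input is an up-front estimate."""
    saved = runs * manual_minutes_per_run
    invested = dev_minutes + runs * maintenance_minutes_per_run
    return (saved - invested) / invested


# Same test and the same run count; only the maintenance guess changes.
optimistic = automation_roi(runs=100, manual_minutes_per_run=10,
                            dev_minutes=300, maintenance_minutes_per_run=2)
pessimistic = automation_roi(runs=100, manual_minutes_per_run=10,
                             dev_minutes=300, maintenance_minutes_per_run=8)
print(f"optimistic ROI: {optimistic:.0%}, pessimistic ROI: {pessimistic:.0%}")
```

With these made-up inputs the first estimate yields a 100% return and the second is negative (about -9%). Nudging a single guessed input flips the business case, which is exactly why the output deserves skepticism.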
Personally, I think test managers should not waste their time trying to convince their business manager of the value of a test automation project; especially with magic beans produced from ROMA data. Instead test managers should start helping their team members think about ROI at the test level itself. In other words, teach your team how to make smart decisions about what tests to automate and what tests should not be automated because they can be more effectively tested via other approaches.
In my next post I will outline some factors that testers and test managers can use to help decide which tests to consider automating. Basically, the bottom line is that an automated test should provide significant value to the tester and the organization, and should help free up the tester’s time in order to increase the breadth and/or scope of testing.
A Different Perspective on Random Name Generation
Originally Published Saturday, August 15, 2009
My daughter made me laugh today when she offered a bit of her philosophy. She told me that her favorite candy is gummy bears “because gummy bears get stuck between your teeth, and then you can dig out a second helping with your tongue.” I never really thought of it that way, but how many of us have not picked at a piece of licorice stuck between our teeth with our tongue (or a toothpick) and savored that last little bit? Ummm….
Perhaps it is my own twisted logic, but as I started writing this post I thought about my daughter’s predilection for gummy bears and somehow made a connection to static test data used in tests. Static test data that is simply reused over and over in a test is similar to that last little bit of licorice we dig out of our teeth. The last bit tastes just like the first bite, and all the other bites between. This may be good for those who like the flavor of licorice, but it is not so good for hard-coded test data in rudimentary test scripts, especially in automated tests.
If you have followed my posts or my personal website then you know that I am a big proponent of probabilistic stochastic test data (statistically unbiased, parameterized randomly generated test data that is representative of the population of possible inputs for a specific variable). The latest addition to my random test data generator toolbox is PseudoName, a random name (pseudonym) generator library for automated testing.
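A minimal Python sketch of parameterized random test data, as an alternative to hard-coded strings. It is not PseudoName or any of my published tools; the code point ranges and the `random_string` helper are assumptions for illustration. Length and characters are drawn uniformly, the ranges are parameters so the data can be shaped to the population of valid inputs for a given field, and the generator is seeded so a failing input can be reproduced.

```python
import random

# Illustrative code point ranges (inclusive). Choose ranges that represent the
# real population of valid inputs for the field being tested.
LATIN_LETTERS = [(0x41, 0x5A), (0x61, 0x7A)]   # A-Z, a-z
HIRAGANA = [(0x3041, 0x3096)]                   # Hiragana letters
CJK_UNIFIED = [(0x4E00, 0x9FA5)]                # original CJK Unified Ideographs range


def random_string(ranges, min_len: int, max_len: int, rng: random.Random) -> str:
    """Return a string whose length and characters are drawn uniformly at random."""
    # Flatten the ranges into one pool so each code point is equally likely
    # (picking a range first would over-weight characters from small ranges).
    pool = [cp for lo, hi in ranges for cp in range(lo, hi + 1)]
    length = rng.randint(min_len, max_len)
    return "".join(chr(rng.choice(pool)) for _ in range(length))
```

For example, `random_string(CJK_UNIFIED, 1, 8, random.Random(2009))` produces a different one-to-eight character Chinese string for each seed, while reusing a logged seed reproduces the exact same string.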
Before designing and developing PseudoName, I researched the plethora of random name generators currently available, because I am not a big fan of reinventing the wheel either. In fact, there are many very good online HTML-based random name generators. For example, Fake Name Generator not only generates a pseudonym, but also generates an address, phone number, etc., essentially creating a fictitious persona. However, while this tool is useful for manual testing, it is not so useful for automated tests. The Automated Testing Institute website provides code samples in VBScript and Ruby for generating random names from a built-in collection of names stored in an array. These examples are also useful, and the collections can certainly be expanded to include a greater variety of names, but they are still limited in scope.
A common problem that I noticed among all available random name generators is the Romanization (representing a written language with the Latin alphabet) of the pseudonym. Basically, this means the random names are always represented with ASCII characters. Romanization may be satisfactory for those who only know the letters “A” through “z” or for those whose eyes glaze over when the displayed character glyphs are in a foreign language. But those of us dealing with modern software or services that support Unicode, and that may be adapted (localized) or used in different locales where it is important to support the native language, soon realize that Romanization using simple ASCII characters is simply not enough for effective globalization testing.
Unlike most random name generators, PseudoName generates a random name (pseudonym) from columns of name data in an Excel spreadsheet. The name data in the Excel spreadsheet is stored as Unicode, so the characters can be the same as those used in the desired region or locale. For example, to generate a random female Chinese name, most name generators would produce a string such as “Dongyi Li.” However, PseudoName can randomly generate a name using Chinese characters such as “冬怡 李.” (Actually, Dongyi Li is not a pseudonym. Dongyi is my friend, and she was kind enough to produce the Chinese name list of female, male, and surnames, and also helped me with refactoring the code used in this tool.)
The PseudoName library is simple to use in an automated test. The PseudoName members page also includes simple examples, and the NameInfo properties allow customization of the pseudonym output. If additional properties are necessary to generate reasonably realistic names in different locales, please let me know. (Also, if there is enough demand, I might consider slapping on a GUI.)
The format for the Excel sheet is simple. The first column is female names, the second is male names, and the third is surnames. The names listed in the currently available US and Japanese names data files are the most common names in those countries according to census data. The names in the Chinese data file are the characters used for feminine and masculine names, as well as the most common surnames used in China. (I could really use some help collecting name lists using Unicode character scripts from other countries around the world. If you want to contribute please send me a name list in Excel and I will post it on the tool website for other testers to use.)
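The three-column layout above can be sketched in Python. This is not the PseudoName library or its NameInfo API; it is a hypothetical illustration that swaps the Excel sheet for a UTF-8 CSV export (so only the standard `csv` module is needed) and assumes ragged columns, since the female, male, and surname lists rarely have the same length. The sample rows are illustrative only.

```python
import csv
import io
import random


def load_name_columns(csv_text: str):
    """Read female, male, and surname columns; skip blank cells in short columns."""
    female, male, surnames = [], [], []
    for row in csv.reader(io.StringIO(csv_text)):
        row = (row + ["", "", ""])[:3]  # pad ragged rows to exactly three columns
        for column, value in zip((female, male, surnames), row):
            if value.strip():
                column.append(value.strip())
    return female, male, surnames


def pseudo_name(columns, gender: str, rng: random.Random) -> str:
    """Combine a random given name for `gender` ("female" or "male") with a surname."""
    female, male, surnames = columns
    if gender not in ("female", "male"):
        raise ValueError("gender must be 'female' or 'male'")
    given = rng.choice(female if gender == "female" else male)
    # Given name first, then surname, matching the “冬怡 李” example above;
    # other locales may need a different order or separator.
    return f"{given} {rng.choice(surnames)}"


SAMPLE_ROWS = "冬怡,伟,李\n秀英,,王\n"  # illustrative sample data only
```

Reading a real exported file would use `open(path, encoding="utf-8", newline="")` and pass its contents to `load_name_columns`; keeping the data in a file is what makes it easy to add name lists for new locales without touching code.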
Stupid Hammer!!!
Originally Published Tuesday, August 11, 2009
I remember as a young lad working construction for my uncle one summer. The hours were long, it was hot, and I would much rather have been somewhere else. But I was saving up for my first motorcycle, so I did whatever jobs I could find. Perhaps it was because my mind wasn’t really engaged in what I was doing, but as I was hammering nails into drywall, the hammer slipped off the head of the nail and mashed my thumbnail. Man, that hurt. Of course, after yelling out a few expletives I screamed, “stupid [expletive deleted] hammer!” Years later, I was working on an engine when the wrench slipped and the back of my hand slammed into the engine block. Covered with grease (and now a bit of blood), I examined my hand to assess the damage in order to decide whether to keep working or tend to my wounds, and the first thing I thought of as I looked at my somewhat mangled hand was “stupid wrench.” Do you see the pattern here? Isn’t it a bit funny that when we use a tool incorrectly, misuse a tool to do something it was not designed to do, or don’t really know how to use the tool to begin with, and a problem ensues, our initial reaction is often to blame the tool? Of course, it couldn’t be our fault; it has to be the tool!
I sometimes see this deflection of responsibility repeated by testers who attempt to apply various techniques or methodologies to a software testing problem and later discover they missed or overlooked a bug. Their immediate reaction is “such and such” technique sucks because it didn’t find “this” bug. Of course it could never be our own fault! It could never be that we don’t sufficiently understand the principles of a technique or approach in order to apply it correctly. It could never be that we don’t fully understand the ‘system’ in depth that we are testing. And it could never be that we are using a particular technique or approach in the wrong context. The problem could never be the fault of the person wielding the tool…it must be the tool.
A few years ago I was rebuilding a sailing dinghy. I was ready to mount the rub strake and started drilling holes through the fiberglass hull and the teak strip. For some bizarre reason, I decided to simply hold the rub strake against the gunwale as I moved forward drilling the holes rather than use C-clamps. My grip was sufficiently tight to hold the rub strake against the hull, and I had used an electric drill for years and knew well how to use the tool. But, after drilling several holes, I suddenly experienced an excruciating pain in my left index finger. Yep…the drill bit went right into the proximal interphalangeal joint of my left index finger. After screaming a few profanities (which, by the way, helps us deal with pain), I flung the drill across the garage, only to put a hole in the drywall and hear it crash to the floor in pieces. With that I hurled a few more obscenities, wrapped my finger with ice, headed off to the doctor, and thought to myself…“Man, that was stupid of me not to use clamps.” To add insult to injury, the doctor asked, “Why didn’t you use a clamp?” I replied, “Well, I was in a bit of a hurry.” He shook his head, smiled, and asked the redundant question, “In retrospect that wasn’t too smart, was it?”
When we misuse tools, apply them in the wrong context, or if we really don’t understand how to use tools appropriately bad things can happen. (And sometimes the scars are permanent!)
While I don’t think that misapplication of a testing technique or approach would require a trip to the hospital, it might cost us a missed bug or added redundant tests. This also doesn’t mean that all techniques or approaches to testing are 100% effective in finding all categories of anomalies. But we have to remember that the successful application of any tool, technique, or approach used in testing depends heavily on the person wielding it in the appropriate situation, and we must learn to use these tools smartly!
UI Automation Out of Control
Originally Published Saturday, August 01, 2009
When many people think of test automation, they envision rudimentary scripts with hard-coded events and data that manipulate user interface objects much the same way a customer might interact with the software to accomplish a pre-defined, robot-like task. Perhaps this is the reason there is a plethora of tools available to business analysts or super-users hired as ‘black-box’ testers to help them record and play back (or list keywords to sequentially step through) some contrived set of steps they think a customer might perform. Sure…it’s cool to watch windows open and close, and the mouse cursor move across the desktop as if by magic. But, for anyone with half a brain, the visual amazement lasts for oh….about 1.7 seconds….after that it is mind-numbingly boring! Unfortunately, this automation is usually short-lived, requires tremendous overhead in terms of maintenance costs, and contributes to the exceedingly high percentage of failed or less than successful automation projects.
I will say that in general I am not a big fan of GUI automation for a litany of reasons, but mostly because it is misused, or applied to tests that are simply more effectively and efficiently performed by a person. However, I also understand that GUI automation does provide value when used in the right context. I am not against GUI automation, but I certainly don’t advocate automating a test just because we think we can, or because we have some bizarre idealistic belief that everything should be automated.
For example, in one situation I spoke with a tester whose manager wanted him to maintain a legacy test designed to detect the correct color of an arrow symbol after an action was performed. If the action completed correctly, the arrow was green; if it was unsuccessful, the arrow appeared red. Now, besides the fact that we could have just as easily automated a test to check the HRESULT value, this test could have been executed by a user within a reasonable time, there was little probability of change in this area of the code, and there were no dependencies. However, the manager insisted that this GUI test continue to run even though it used image comparison as an oracle and was notoriously problematic. (This shouldn’t be surprising, since many image comparison oracles are notoriously problematic and throw an inordinate number of false positives.)
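Checking the HRESULT instead of the arrow color might look like the minimal Python sketch below. `perform_action` is a hypothetical stand-in for whatever product API drives the green or red arrow; `succeeded` mirrors the Win32 `SUCCEEDED()` macro, which treats any HRESULT with the high (severity) bit clear as success. `S_OK` and `E_FAIL` are the standard values.

```python
# HRESULTs are 32-bit values; the high (severity) bit set means failure.
S_OK = 0x00000000
E_FAIL = 0x80004005


def succeeded(hr: int) -> bool:
    """Python equivalent of the Win32 SUCCEEDED() macro for a 32-bit HRESULT.

    Works for both unsigned (0x80004005) and signed (-2147467259) forms."""
    return (hr & 0x80000000) == 0


def perform_action(should_fail: bool = False) -> int:
    """Hypothetical stand-in for the product API behind the green/red arrow."""
    return E_FAIL if should_fail else S_OK


def test_action_succeeds() -> None:
    """Oracle is the return value, not a screenshot of the arrow."""
    hr = perform_action()
    assert succeeded(hr), f"action failed with HRESULT 0x{hr & 0xFFFFFFFF:08X}"
```

A test like this does not care about screen resolution, themes, or anti-aliasing, which are common sources of false positives in image comparison.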
The tester said the manager claimed that automating this test would save a tester from having to execute the test manually, thus saving time. What??? This tester was spending hours per week chasing down false positives and tweaking the automation to “make it work” on the daily builds just to make his manager happy. So, although this feature was used repeatedly by hundreds of people dog-fooding the daily build, another few thousand people around the company self-hosting internal releases, and thousands of customers using beta releases, this particular manager determined that continued tweaking of this test would save some tester’s time!
In another example, a tester inquired how to automate a test to determine whether the order of the slides in a PowerPoint presentation had changed between different copies of a .ppt file. Of course, the question was followed by a flurry of responses suggesting creating a base set of images of each slide in the deck, and then using an image comparison tool to identify changes. I responded a bit differently. First, there are several ways to programmatically detect file changes, and if we detect changes in the binary properties, we can easily open the PowerPoint presentation in slide sorter view and take a few seconds (depending on the number of slides) to visually compare it against an original. Sure, it is a bit slower than an automated test, but I really suspect it would be more effective and probably even more efficient in the long run. I also wondered how many times this “test” would actually need to be run during whatever project this person was working on (it wasn’t PowerPoint) in comparison to the hours/days it would take to develop such a test, and the ensuing maintenance nightmare.
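One simple way to programmatically detect that a file changed is to compare cryptographic hashes, sketched below in Python with the standard `hashlib` module. The function names are hypothetical. A differing hash only says the bytes differ, not that the slide order changed, so a mismatch is the cue for a person to open both copies in slide sorter view.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_manual_review(original: Path, candidate: Path) -> bool:
    """True when the two files differ in any way.

    Identical files cannot differ in slide order, so no review is needed;
    any difference is handed to a human oracle for a quick visual check."""
    return file_digest(original) != file_digest(candidate)
```

This keeps the automated part trivial and robust, and leaves the judgment about slide order to the oracle that is best at it.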
These are just two examples of the misuse of automated UI testing that I think illustrate a few important points:
- Not all automated UI tests save time!
Tests that require constant massaging and tweaking because they constantly throw false positives take up a huge amount of a tester’s time in wasted maintenance.
- Sometimes a human is a more efficient oracle than a computer algorithm!
Sure, just about anything a computer does can be automated to some degree in some fashion, but there clearly are some tests where it is more prudent and simpler to rely on a tester.
- Don’t rely on automation to emulate your customers!
Test automation does not effectively emulate a human user. Sure, we have test methods in some of our internal automation frameworks to slow down simulated keystrokes (the actual keys are not being pressed on the keyboard), simulate multiple or repeated clicks on a control or the mouse, and other tricks that try to emulate various user behaviors; however, test automation is generally poor at detecting behavioral issues such as usability, ease of use, or other customer value assessments. Rely on the feedback from internal and external customers who are dog-fooding, self-hosting, and beta-testing your product (and act on it).
- Go under the covers!
I think many testers rely too heavily on UI automation because they think it emulates user behavior (although most actions, such as populating a text box, are simulated via Windows APIs), or perhaps because they don’t know how to dig into the product below the surface of the UI. Whatever the case, think about the specific purpose of the test. If it is easier to check a return value, or call an API to change a setting, then go deep…and stop messing around on the surface. (It only complicates the test, wastes valuable machine cycles, reduces reuse across multiple versions, and often leads to long-term maintenance costs. For an example of this, see my previous post.)
- Constantly massaging code contributes to false negatives!
I have seen many cases where a tester designs a UI automated test, and then tweaks a bit here and there to get it to run. Often this tweaking makes the test less effective at exposing problems, and may even hide other problems. Also, some tweaks are geared around synchronization issues (sync’ing the automated test with the system under test) and involve artificially slowing down the automation (usually by stopping or ‘sleeping’ the automated test process for a specific period of time). Other tweaks might hard-code parameters that then make the test fail at a different screen resolution or make it non-portable across different environments.
- STOP trying to automate every damn test!
As I stated before…just because we can automate something doesn’t mean that we should try to automate everything! We need to make rational decisions about what tests to automate, and what is the best approach to automating that test.
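On the synchronization point, one common alternative to a fixed ‘sleep’ is to poll for the condition the test is actually waiting on, with an explicit timeout. The Python sketch below is a generic illustration; `wait_until` is a hypothetical helper, not part of any particular automation framework.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Unlike a fixed sleep, this returns as soon as the system under test is
    ready, and reports failure explicitly instead of guessing a delay."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a test this reads as `assert wait_until(lambda: dialog_is_visible("Save As"))`, where `dialog_is_visible` is a hypothetical product-specific check. A timeout then fails the test with a clear reason rather than hiding a timing problem behind a longer sleep.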
It is easy to be lured in by the siren call of UI automation. I write automated tests to free up my time to design and develop more and different tests, and so I don’t have to sit in front of the computer executing redundant tests or constantly massaging code to make it run. Automation is a great tool in the arsenal of competent professionals who understand its capabilities and know how to exploit its potential. But it is only one of many tools in our toolbox, and the best tool is the one sitting on our shoulders. Use it!
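The synchronization tweaks mentioned earlier in this post usually amount to sleeping the test for a fixed period. One common alternative is to poll for the state the test is actually waiting on, with an upper time limit, so the test continues as soon as the system under test is ready and fails clearly when it never gets there. The Python sketch below is illustrative only; `wait_until` and the stand-in `dialog_is_visible` check are hypothetical names, not part of any Microsoft automation framework.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True as soon as the condition holds and False on timeout, so the
    caller can fail with a clear message instead of guessing at a sleep time.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for a real check against the system under test:
# it reports "visible" from the third poll onward.
_polls = 0


def dialog_is_visible() -> bool:
    global _polls
    _polls += 1
    return _polls >= 3


if __name__ == "__main__":
    ready = wait_until(dialog_is_visible, timeout=2.0, interval=0.01)
    print("dialog ready" if ready else "timed out waiting for dialog")
```

Unlike a fixed sleep, the timeout here is only an upper bound: on a fast machine the test proceeds as soon as the condition holds, and on a slow one it waits longer before failing.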
Random comments…
Originally Published Wednesday, July 22, 2009
This week, I will keep this post quite short and redirect you to my answers to an interview by the great folks at What Is Testing. The questions covered various topics from the book How We Test Software At Microsoft, to my current role at Microsoft, to my perspectives on things such as open source, certifications, and future directions for our profession.
Speaking of our book How We Test Software At Microsoft, I am generally not one to pat myself on the back, but I am really glad to announce that our book has been translated into Chinese and Korean. In my past life as a Test Manager I worked closely with our teams in Korea, and I was actively engaged in establishing some of Microsoft’s initial partnerships with Chinese vendors. So, I would personally like to thank the many translators and editors, and the publishers who are making our book available to testers in those countries. I hope our book provides testers in Korea and China with some new ideas or alternative perspectives on software testing. Both translated versions should be available soon!
Testing is Sampling
Originally Published Thursday, July 16, 2009
It seems that about this time of year I need to detach a bit from the world to reflect on the past year and reevaluate my personal and professional goals moving forward. Perhaps I am just getting older, or perhaps just a bit wiser (which is synonymous with ‘sapient’ for the C-D crowd), but I find it refreshing to break away this time of year to tend to my gardens, work on my boat, read some novels, and contemplate life’s joys. Now the major work projects on my boat are (almost) finished, the garden is planted and we are harvesting the early produce, and I have reset both my personal and professional development objectives for the next year and beyond. So, let me get back to sharing some of my ideas about testing.
Many of you who read this blog also know of my website Testing Mentor, where I post a few job aids and random test data generation tools I’ve created. I am a big proponent of random test data, generated using an approach I refer to as probabilistic stochastic test data. In May I was in Dusseldorf, Germany, at the Software & Systems Quality Conference to present a talk on my approach. I especially enjoy these SQS conferences (now igniteQ) because the attendees are a mix of industry experts and academics, and I was looking for feedback on my approach. I call my approach probabilistic stochastic test generation because the process is a bit more complex than simple random data generation. As with random data generation, we cannot absolutely predict a probabilistic system, but we can control the feasibility of specified behaviors. And the adjective stochastic simply means "pertaining to a process involving a randomly determined sequence of observations each of which is considered as a sample of one element from a probability distribution." In a nutshell, my approach segregates the population into equivalence partitions, then randomly selects elements from specified parameterized equivalence partitions (which is how we know the probability of specific behaviors), and finally mutates the data, if necessary, until the test data satisfies the defined fitness criteria. By combining equivalence partitioning with basic evolutionary algorithm (EA) concepts, it is possible to generate large amounts of random test data that is representative of a virtually infinite population of possible data.
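The three steps described above (partition the population, randomly select from weighted partitions, then mutate until a fitness criterion holds) can be sketched roughly as follows. This is a minimal Python illustration under assumed names: `Partition`, `generate`, and the `no_adjacent_repeats` fitness function are hypothetical, this is not the algorithm behind the Testing Mentor tools, and a real fitness function would encode the actual input rules of the feature under test.

```python
import random
import string
from dataclasses import dataclass
from typing import Callable


@dataclass
class Partition:
    """One equivalence class of input characters (hypothetical structure)."""
    name: str
    weight: float  # relative probability of drawing from this partition
    alphabet: str  # the elements that make up the equivalence class


def generate(partitions: list[Partition], length: int,
             fitness: Callable[[str], bool], rng: random.Random,
             max_mutations: int = 1000) -> str:
    # Step 1: pick a partition in proportion to its weight, so the probability
    # of exercising each class of behavior is known in advance.
    part = rng.choices(partitions, weights=[p.weight for p in partitions])[0]
    # Step 2: randomly select elements from that partition.
    chars = [rng.choice(part.alphabet) for _ in range(length)]
    # Step 3: mutate one random position at a time, staying inside the same
    # partition, until the candidate satisfies the fitness criteria.
    for _ in range(max_mutations):
        candidate = "".join(chars)
        if fitness(candidate):
            return candidate
        chars[rng.randrange(length)] = rng.choice(part.alphabet)
    raise ValueError("no candidate met the fitness criteria")


def no_adjacent_repeats(s: str) -> bool:
    """Placeholder fitness criterion: no character appears twice in a row."""
    return all(a != b for a, b in zip(s, s[1:]))


PARTITIONS = [
    Partition("upper", 0.4, string.ascii_uppercase),
    Partition("lower", 0.4, string.ascii_lowercase),
    Partition("digits", 0.2, string.digits),
]

if __name__ == "__main__":
    rng = random.Random(2009)  # fixed seed so any failing value can be regenerated
    for _ in range(5):
        print(generate(PARTITIONS, 8, no_adjacent_repeats, rng))
```

Because each partition's weight is fixed up front, roughly 20% of the generated strings in this example come from the digits class; changing the weights changes how often each class of behavior is exercised.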
One of the questions that came up during the presentation was how many random samples are required for confidence in any given test case; in other words, how do we determine the number of tests using randomly generated test data? This is not an easy question to answer, because the appropriate sample size for any given population depends on several factors, such as:
- variability of data
- precision of measurement
- population size
- risk factors
- allowable sampling error
- purpose of experiment or test
- probability of selecting "bad" or uninteresting data
Using sampling for equivalence class partition testing
But the question also brought to mind a parallel discussion about how we select elements from equivalence class partition subsets. I am adamantly opposed to hard-coding test data in a test case (automated or manual), but a colleague challenged me, saying that since any element in an equivalence partition is representative of all elements in that partition, why can’t we simply choose a few values from that equivalence subset? I realize many testers do this all the time, which is perhaps why we sometimes miss problems. But hard-coding a small subset of values from a relatively large population of possible values is rarely a good idea, and is generally not the most effective approach for robust test design. One problem with hard-coding a variable is that the hard-coded value becomes static, and we know that static test data loses its effectiveness over time in subsequent tests that reuse the exact same data. Also, hard-coding specific values from a range of values means there is absolutely 0% probability of including any other values in that range. Another problem with hard-coded values stems from the selection criteria used to choose the values from a set of possible values. Typically we select values from a set based on historical failure indicators, customer data, and our own biased judgment or intuition of ‘interesting’ values.
However, the problem is that any equivalence class partition is a hypothesis that all elements are equal. Of course, the only way to validate or affirm that hypothesis is to test the entire population of the given equivalence class partition. Customer-like values, values based on failure indicators, and especially values we select based on our intuition are biased samples of the population, and may represent only a small portion of the entire population. Also, the number of values selected from any given equivalence partition is usually fewer than the number required for a reasonable level of statistical confidence. So, while we definitely want to include values representative of our customers, values derived from historical failure indicators, and even values from our own intuition, we should also apply scientific sampling methods and include unbiased, randomly sampled values from the population to help reduce uncertainty and increase confidence.
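A minimal Python sketch of that mix follows, assuming a hypothetical `select_test_values` helper: the directed values (customer, historical, or intuitive picks) are always included, and the rest are drawn uniformly at random, without replacement, from the remainder of the partition. Recording the seed keeps a randomly chosen failing value reproducible. The specific directed values in the usage example are illustrative assumptions.

```python
import random
import time
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_test_values(directed: Sequence[T], population: Sequence[T],
                       k: int, seed: int) -> list[T]:
    """Combine directed values with an unbiased random sample.

    `directed` holds values chosen on purpose (customer data, historical
    failures, intuition). Up to `k` further values are drawn uniformly at
    random, without replacement, from the rest of the partition, so every
    value in the partition has a non-zero chance of being tested.
    """
    directed_set = set(directed)
    remaining = [v for v in population if v not in directed_set]
    rng = random.Random(seed)  # record the seed so a failing run can be replayed
    return list(directed) + rng.sample(remaining, min(k, len(remaining)))


if __name__ == "__main__":
    half_point_sizes = [x / 2 for x in range(2, 3277)]  # 1.0, 1.5, ..., 1638.0
    seed = int(time.time())
    print("seed:", seed)
    print(select_test_values([1.0, 12.0, 1638.0], half_point_sizes, 20, seed))
```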
For example, let’s say that we are testing font size in Microsoft Word. Font sizes range from 1pt through 1638pt and include half-point sizes within that range, which is a population of 3,275 possible values. If we suspect that any value in the population has an equal probability of causing an error, the standard deviation would be 50%. In this example, we would need a sample size of 343 statistically unbiased, randomly selected values from the population to assert a 95% confidence level with a sampling error (precision) of ±5%. Even so, that number of values may appear quite large if the tests are executed manually, which is perhaps one reason why extremely small subsets of hard-coded values fail to find problems that are exposed by other values within the same equivalence partition (all too often after the software is released). Fortunately, statistical sampling is much easier and less costly with automated test cases and probabilistic random test data generation.
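One well-established way to arrive at numbers like these is Cochran's sample size formula for a proportion, n0 = z²·p(1 − p)/e², followed by the finite population correction n = n0 / (1 + (n0 − 1)/N). Here p = 0.5 corresponds to the 50% standard deviation above, z is the two-sided z-score for the confidence level (about 1.96 for 95%), e is the margin of error, and N is the population size. The Python sketch below is an illustration under those assumptions, not necessarily the formula used by the Statistical Sample Size Calculator.

```python
import math
import random
from statistics import NormalDist


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Minimum number of samples for estimating a proportion.

    Cochran's formula with finite population correction, rounded up to a
    whole sample. p = 0.5 is the most conservative (largest) choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n0 = z * z * p * (1 - p) / (margin * margin)        # infinite population
    n = n0 / (1 + (n0 - 1) / population)                # finite population
    return math.ceil(n)


if __name__ == "__main__":
    # Word font sizes: 1pt to 1638pt in half-point steps.
    font_sizes = [x / 2 for x in range(2, 3277)]
    n = sample_size(len(font_sizes))
    print(n)  # 344
    # Draw an unbiased random sample of that size, without replacement.
    sample = random.sample(font_sizes, n)
    print(sorted(sample)[:10])
```

Rounded up, the formula gives 344 for this population, in line with the 343 quoted above; the raw result is about 343.9, so the one-value difference is only a matter of rounding.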
Testing is Sampling
Statistical sampling is commonly used for experimentation in the natural sciences as well as for studies in the social sciences (where I first learned it while studying sociology and anthropology). And if we really stop to think about it, any testing effort is simply a sample of tests drawn from the virtually infinite population of possible tests. Of course, there is always the probability that sampling misses or overlooks something interesting. But this is true of any approach to testing, and is explained by B. Beizer’s Pesticide Paradox. The question we must ask ourselves is: will statistical sampling of values in equivalence partitions or other test data help improve our confidence when used in conjunction with customer-representative data, historical data, and data we intuit based on experience and knowledge? Will scientifically quantified empirical evidence help increase the confidence of the decision makers?
In my opinion, anything that helps improve confidence and provides empirical evidence is valuable, and statistical sampling is a tool we should understand and put into our professional testing toolbox. There are several well-established formulas for calculating sample size that can help us establish a baseline for a desired confidence level. But rather than belabor you with formulas, I decided to whip together a Statistical Sample Size Calculator, which I posted to CodePlex and also on my Testing Mentor site, to help testers determine the minimum number of statistically unbiased, randomly generated test data samples from a given equivalence partition to use in a test case to establish a statistically reliable level of confidence.
Cockamamie chaos causes confusion; controlled chaos cultivates confidence!
Better Bug Reports
Originally Published Wednesday, May 20, 2009
When we report a bug, our hope is that the bug is fixed. But of course we know that isn’t always the case, which is why there are usually several alternative resolutions, such as postponed, won’t fix, and by design, that developers, project managers, or managers may choose when resolving a bug. It is unfortunately quite common to see a tester metaphorically explode into passionate fits of outrage when one of their bugs is resolved as postponed, won’t fix, or by design. It is unfortunate because these tantrums often involve the tester hurling personal insults (e.g. “How can the developer be so stupid as to not fix this bug?”), decrying product quality (e.g. “If we don’t fix this bug this product will totally suck!”), and playing the whiny customer card (e.g. “We will lose customers if we don’t fix this bug.”). Yes, in my early years I was also guilty of these sorts of irrational outbursts of hyperbole when a bug that I thought was important was resolved without being fixed. But, of course, I quickly learned that such sophistical speculations rarely resulted in the bug being fixed, and mostly lessened my credibility with developers and managers.
The other day I was speaking with a tester who was a bit miffed because the developer had resolved a few of her bugs as by design and won’t fix, and she asked how she could ‘fight’ these resolutions. “Well,” I began, “getting people to change their minds usually involves negotiation and the logical presentation of facts in a non-judgmental way. Sometimes you will succeed, and sometimes you will not. As testers we surely want all our bugs to be fixed; however, from a practical standpoint that may not always happen, especially if the bug is subjective.” I previously wrote about 10 common problems with bug reporting, but in this case I went on to discuss a few strategies I use to advocate for bugs.
Make it easy for the developer to fix the bug
At a minimum, a tester must provide a description of the problem, the environmental conditions in which the problem occurred (if it is localized to a specific environment), the shortest exact sequence of steps to reproduce the bug, and the actual results versus the expected results. Occasionally a screenshot may be beneficial, but mostly when there is a contrasting example. I will also point the developer to my test, especially if it is automated. Providing the developer an automated mechanism to reproduce a problem reduces a lot of overhead. Of course, in this case I am talking about an automated test case that runs in a few seconds, or an automated script that helps the developer reproduce the problem quickly.
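As a rough illustration of the minimum contents listed above, a team could capture them in a structured template and check for gaps before filing. The `BugReport` class, its field names, and the sample report are hypothetical, not the schema of any particular bug-tracking system.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical minimal bug report template."""
    description: str
    steps: list[str]        # shortest exact sequence of repro steps, in order
    expected: str
    actual: str
    environment: str = ""   # only needed if the problem is environment-specific
    screenshot: str = ""    # optional; most useful with a contrasting example
    repro_test: str = ""    # optional automated test that reproduces the bug

    def missing_fields(self) -> list[str]:
        """Names of required fields that are still empty."""
        required = {
            "description": self.description,
            "steps": self.steps,
            "expected": self.expected,
            "actual": self.actual,
        }
        return [name for name, value in required.items() if not value]


if __name__ == "__main__":
    # Made-up example report with the actual result not yet filled in.
    report = BugReport(
        description="Dialog closes without saving changes",
        steps=["Open the options dialog", "Change a setting", "Click OK"],
        expected="The new setting is saved",
        actual="",
    )
    print(report.missing_fields())  # ['actual']
```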
Provide specific contradictions to specified and/or implied requirements or standards
Of course, if the product design or functionality deviates from stated requirements, pointing this out in a non-confrontational way is a no-brainer. The key here is that our argument must be non-confrontational, because sometimes we may misinterpret the requirements, and sometimes the requirements may change without our being aware of those changes. There are also occasional deviations from implied requirements, such as UI design guidelines, as a result of the introduction of new technologies or changes in how customers use the product based on usability studies. Other implied standards include competing products and previous versions of the product. In any case, when arguing for a bug fix based on specified or implied requirements, I recommend a compare-and-contrast approach to better illustrate the problem as I perceive it.
Provide concrete examples of customer impact
This is really important! A real-world scenario that clearly illustrates how the bug will manifest itself to the customer, backed by corroborating evidence from customers, presents a strong case in favor of a bug fix. There are several useful repositories of customer feedback that testers can use to bolster their point of view, such as newsgroups, popular blogs, trade journal reviews of past or similar products, and product support reports; at Microsoft we also have Watson and SQM data. Using ‘real-world’ constructive feedback is often more meaningful than an internal mutiny by a portion of the test team.
Know your primary target customer profile
Testers often like to think we are representative of our customers. However, this may not always be the case. (It has always puzzled me why testers seem to think they have a greater affinity with the end-user customer than others on the product team.) Yes, it is important that testers understand who the primary target customer is for the current project or release, and that is why many teams have detailed personas of primary, secondary, and sometimes even tertiary customer audiences. Of course, if we are in the commercial software business we want our customer base to be as large as possible. But as the number of customers increases, so does the diversity of what they value, and as they say…you can never please everyone! So, when defending your position to fix a particular bug, it is always better to frame the discussion from the point of view of the primary customer persona rather than from your own personal bias.
Use your brain, not your emotions
Passion has long been an admired trait in software testers. However, unbridled passion fraught with antagonistic accusations can be detrimental to a successful bug resolution (and sometimes even to a career). Some bugs obviously need to be fixed, while others may depend on several mitigating (and competing) factors such as where you are in the software lifecycle, business impact, primary customer impact, risk, etc. I think it is widely agreed that the primary role of testers is perhaps to provide information, but that means we must also gather the pertinent information and present it logically, within the relevant context, to the management team (or decision makers). Remember…reckless rants rarely render reasonable results!