I.M. Testy

Treatises on the practice of software testing

Archive for the ‘Testing Practices’ Category

Testing is Sampling

with one comment

Originally Published Thursday, July 16, 2009

It seems it is about this time of year that I need to detach a bit from the world to reflect back on the past year and reevaluate my personal and professional goals moving forward. Perhaps I am just getting older or perhaps just a bit wiser (that is synonymous with ‘sapient’ for the C-D crowd), but I find it refreshing to break away this time of year to tend to my gardens, work on my boat, read some novels, and contemplate life’s joys. Now, the major work projects are (almost) finished on my boat, the garden is planted and we are harvesting the early produce, and I reset both personal and professional development objectives for the next year and beyond. So, let me get back to sharing some of my ideas about testing.

Many of you who read this blog also know of my website Testing Mentor where I post a few job aids and random test data generation tools I’ve created. I am a big proponent of random test data using an approach I refer to as probabilistic stochastic test data. In May I was in Dusseldorf, Germany at the Software & Systems Quality Conference to present a talk on my approach. I especially enjoy these SQS conferences (now igniteQ) because the attendees are a mix of industry experts and academia, and I was looking for feedback on my approach. I call my approach probabilistic stochastic test generation because the process is a bit more complex than simple random data generation. As with random data generation, we cannot absolutely predict a probabilistic system, but we can control the feasibility of specified behaviors. And the adjective stochastic simply means "pertaining to a process involving a randomly determined sequence of observations each of which is considered as a sample of one element from a probability distribution." In a nutshell, my approach involves segregating the population into equivalence partitions, then randomly selecting elements from specified parameterized equivalence partitions (which is how we know the probability of specific behaviors), and finally mutating the data, if necessary, until it satisfies the defined fitness criteria. By combining equivalence partitioning and basic evolutionary algorithm (EA) concepts it is possible to generate large amounts of random test data that is representative of a virtually infinite population of possible data.
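To make the three steps concrete (partition, weighted random selection, mutation until fit), here is a minimal Python sketch. It is not the code behind the Testing Mentor tools; the partitions, the weights, the fitness rule, and the names `PARTITIONS`, `pick_char`, `is_fit`, and `generate` are all assumptions chosen for illustration.

```python
import random
import string

# Hypothetical equivalence partitions for the characters of a text input.
PARTITIONS = {
    "lowercase": string.ascii_lowercase,
    "uppercase": string.ascii_uppercase,
    "digits": string.digits,
}


def pick_char(rng, weights):
    """Choose a partition according to its weight, then a random member of it."""
    names = list(weights)
    name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    return rng.choice(PARTITIONS[name])


def is_fit(candidate):
    """Assumed fitness criterion: at least one upper-case letter and one digit."""
    has_upper = any(c.isupper() for c in candidate)
    has_digit = any(c.isdigit() for c in candidate)
    return has_upper and has_digit


def generate(length, weights, seed=None, max_mutations=10_000):
    """Build a random string, then mutate single positions until it is fit."""
    rng = random.Random(seed)
    chars = [pick_char(rng, weights) for _ in range(length)]
    mutations = 0
    while not is_fit(chars):
        if mutations >= max_mutations:
            raise RuntimeError("fitness criteria not met within mutation budget")
        # Mutation step: replace one randomly chosen position.
        chars[rng.randrange(length)] = pick_char(rng, weights)
        mutations += 1
    return "".join(chars)


# The weights set how likely each partition is to contribute a character.
WEIGHTS = {"lowercase": 0.6, "uppercase": 0.2, "digits": 0.2}
print(generate(12, WEIGHTS, seed=42))
```

Passing a seed makes a failing value reproducible, while omitting it gives different data on every run.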

One of the questions that came up during the presentation was how many random samples are required for confidence in any given test case; in other words, how do we determine the number of tests using randomly generated test data? This is not an easy question to answer because the sample size of any given population depends on several factors such as:

  • variability of data
  • precision of measurement
  • population size
  • risk factors
  • allowable sampling error
  • purpose of experiment or test
  • probability of selecting "bad" or uninteresting data

Using sampling for equivalence class partition testing

But, the question also brought to mind a parallel discussion regarding how we go about selecting elements from equivalence class partition subsets. I am adamantly opposed to hard-coding test data in a test case (automated or manual), but a colleague challenged me and said that since any element in an equivalence partition is representative of all elements in that partition, why can’t we simply choose a few values from that equivalence subset? I realize many testers do this all the time, which is perhaps why we sometimes miss problems. But, hard-coding some small subset of values from a relatively large population of possible values is rarely a good idea, and is generally not the most effective approach for robust test design. One problem with hard-coding a variable is that the hard-coded value becomes static, and we know that static test data loses its effectiveness over time in subsequent tests using the same exact test data. Also, hard-coding specific values in a range of values means that we have absolutely 0% probability of including any other values in that range that are not specified. Another problem with hard-coded values stems from the selection criteria used to choose the values from a set of possible values. Typically we select values from a set based on historical failure indicators, customer data, and our own biased judgment or intuition of ‘interesting’ values.

However, the problem is that any equivalence class partition is a hypothesis that all elements are equal. Of course, the only way to validate or affirm that hypothesis is to test the entire population of the given equivalence class partition. Using customer-like values, or values based on failure indicators, and especially values we select based on our intuition are biased samples of the population, and may only represent a small portion of the entire population. Also, the number of values selected from any given equivalence partition set is usually fewer than the number required for some reasonable level of statistical confidence. So, while we definitely want to include values representative of our customers, values derived from historical failure indicators, and even our own intuition, we should also apply scientific sampling methods and include unbiased, randomly sampled values or elements from our set of values or population to help reduce uncertainty and increase confidence.

For example, let’s say that we are testing font size in Microsoft Word. Most font sizes range from 1pt through 1638pt and include half-sized fonts as well within that range. That is a population size of 3273 possible values. If we suspected that any value in the population had an equal probability of causing an error the standard deviation would be 50%. In this example, we would need a sample size of 343 statistically unbiased randomly selected values from the population to assert a 95% confidence level with a sampling error or precision of ±5%. Even in this situation, the number of values may appear to be quite large if the tests are manually executed, which is perhaps one reason why extremely small subsets of hard-coded values fail to find problems that are exposed by other values within that equivalence partition (all too often after the software is released). Fortunately, statistical sampling is much easier and less costly with automated test cases and probabilistic random test data generation.
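The figure above can be approximated with a widely used sample size formula (Cochran's formula with a finite population correction). The sketch below assumes that formula; the post does not say which one the author's calculator uses, and the function name `sample_size` and its parameters are illustrative.

```python
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Estimate the minimum sample size for a finite population.

    Assumes Cochran's formula with a finite population correction.
    proportion=0.5 is the worst case (maximum variability).
    """
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Correct for the finite population.
    return n0 / (1 + (n0 - 1) / population)


n = sample_size(3273)
print(f"{n:.1f}")  # about 343.9
```

Under these assumptions the result is roughly 343.9, which agrees with the 343 quoted above if the fraction is dropped; rounding up to 344 would be the more conservative choice.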

Testing is Sampling

Statistical sampling is commonly used for experimentation in natural sciences as well as studies in social sciences (where I first learned it while studying sociology and anthropology). And, if we really stop to think about it, any testing effort is simply a sample of tests from the virtually infinite population of possible tests. Of course, there is always the probability that sampling misses or overlooks something interesting. But, this is true of any approach to testing and is explained by B. Beizer’s Pesticide Paradox. The question we must ask ourselves is: will statistical sampling of values in equivalence partitions or other test data help improve my confidence when used in conjunction with customer representative data, historical data, and data we intuit based on experience and knowledge? Will scientifically quantified empirical evidence help increase the confidence of the decision makers?

In my opinion anything that helps improve confidence and provides empirical evidence is valuable, and statistical sampling is a tool we should understand and put into our professional testing toolbox. There are several well established formulas for calculating sample size that can help us establish a baseline for a desired confidence level. But, rather than belabor you with formulas, I decided to whip together a Statistical Sample Size Calculator that I posted to CodePlex and also on my Testing Mentor site to help testers determine the minimum number of samples of statistically unbiased randomly generated test data from a given equivalence partition to use in a test case to help establish a statistically reliable level of confidence.

Cockamamie chaos causes confusion; controlled chaos cultivates confidence!

Written by Bj Rollison

November 18th, 2009 at 10:27 pm

Posted in Testing Practices


Better Bug Reports

without comments

Originally Published Wednesday, May 20, 2009

When we report a bug our hope is that the bug is fixed. But, of course we know that isn’t always the case, which is why there are usually several alternative resolutions developers, project managers, or managers may choose for resolving a bug such as postponed, won’t fix, and by design. It is unfortunately quite common to see a tester metaphorically explode into passionate fits of outrage when one of their bugs is resolved as postponed, won’t fix, or by design. It is unfortunate because these tantrums often involve the tester hurling personal insults (e.g. “How can the developer be so stupid not to fix this bug?”), decrying product quality (e.g. “If we don’t fix this bug this product will totally suck!”), and playing the whiny customer card (e.g. “We will lose customers if we don’t fix this bug.”). Yes, in my early years I was also guilty of these sorts of irrational outbursts of hyperbole when a bug that I thought was important was resolved not fixed. But, of course, I quickly learned that such sophistical speculations rarely resulted in the bug being fixed, and mostly lessened my credibility with developers and managers.

The other day I was speaking with a tester who was a bit miffed because the developer had resolved a few of her bugs as by design and won’t fix and she asked how she could ‘fight’ these resolutions. “Well,” I began, “Getting people to change their minds usually involves negotiation and the logical presentation of facts in a non-judgmental approach. Sometimes you will succeed, and sometimes you will not succeed. As testers surely we want all our bugs to be fixed; however, from a practical standpoint that may not always be the case especially if the bug is subjective.” I previously wrote about 10 common problems with bug reporting, but, in this case I proceeded to discuss a few strategies I use to advocate bugs.

Make it easy for the developer to fix the bug

At a minimum a tester must provide a description of the problem, the environmental conditions in which the problem occurred (if localized to a specific environment), the shortest number of exact steps to reproduce the bug, and the actual results versus the expected results. Occasionally a screen shot may be beneficial, but mostly if there is a contrasting example. But, I will also point the developer to my test, especially if it is automated. Providing the developer an automated mechanism to reproduce a problem reduces a lot of overhead. Of course, in this case I am talking about an automated test case that runs in a few seconds, or an automated script that helps the developer reproduce the problem quickly.

Provide specific contradictions to specified and/or implied requirements or standards

Of course, if the product design or functionality deviates from stated requirements, pointing this out in a non-confrontational way is a no-brainer. The key here is our argument must be non-confrontational because sometimes we may misinterpret the requirements, and sometimes the requirements may change without us being aware of those changes. There are also occasionally deviations from implied requirements such as UI design guidelines as a result of the introduction of new technologies, or changes in how customers use the product based on usability studies. Other implied standards include competing products or previous versions of the product. In any case, when arguing for a bug fix based on specified or implied requirements I recommend using a compare and contrast type of approach to better illustrate the problem as I perceive it.

Provide concrete examples of customer impact

This is really important! Providing a real world scenario that clearly illustrates how this bug will manifest itself to the customer, together with corroborating evidence from customers, presents a strong case in favor of a bug fix. There are several useful repositories of customer feedback testers can use to bolster their point of view, such as newsgroups, popular blogs, trade journal reviews of past or similar products, and product support reports; at Microsoft we also have Watson and SQM data. Using ‘real-world’ constructive feedback is often more meaningful than an internal mutiny by a portion of the test team.

Know your primary target customer profile

Testers often like to think we are representative of our customers. However, this may not always be the case. (It has always puzzled me as to why testers seem to think they have some greater affinity to the end user customer as compared to others on the product team.) Yes, it is important that testers understand who the primary target customer is for the current project or release and that is why many teams have detailed personas of primary, secondary, and sometimes even tertiary customer audiences. Of course, if we are in the commercial software business we want our customer base to be as large as possible. But, as the number of customers increases so does the diversity of value, and as they say…you can never please everyone! So, when defending your position to fix a particular bug it is always better to frame the discussion from the point of view of the primary customer persona as compared to your own personal bias.

Use your brain, not your emotions

Passion has long been an admired trait in software testers. However, unbridled passion fraught with antagonistic accusations can be detrimental to a successful bug resolution (and sometimes even a career). Some bugs obviously need to be fixed, while others may be more dependent on several mitigating (and competing) factors such as where you are in the software lifecycle, business impact, primary customer impact, risk, etc. I think it is largely agreed that perhaps the primary role of testers is to provide information, but that means we must also gather the pertinent information and represent that information logically within the relevant context to the management team (or decision makers). Remember…reckless rants rarely render reasonable results!

Written by Bj Rollison

November 18th, 2009 at 10:25 pm

Posted in Testing Practices


Exploratory Testing Inside The Box

without comments

Originally Published Friday, March 20, 2009

Much of the information about exploratory testing focuses on testing from an end-user perspective. Pundits of exploratory testing claim the approach is also useful from a white box test design approach, but I have yet to see any practical discussion or examples. But, professional testers use exploratory testing approaches all the time from a white box perspective to explore the code for untested paths. Professional testers learn about areas of the code that are at risk, and reactively design effective tests to evaluate previously untested or under-tested areas of the code.

Let’s use a simple example to get started. Suppose we had to drive from Lynnwood, Washington to Puyallup, Washington without a map (or GPS auto navigation system). Just as we have ‘clues’ to point us in the various directions while performing exploratory testing at the user interface, we have the numerous highway signs to help us navigate various routes to complete our journey. And, it is up to us to decide which route to take. The shortest route is I-405 south to SR-167. But, I-405 is always at a stand-still, so another popular route is I-5 to SR-18 east then SR-167 south. Of course, after traversing those routes a couple of times the scenery (and crawling in traffic) gets a bit boring, so we might find additional less travelled routes. But, regardless of how many times we make the journey or how many different drivers we choose to complete this journey it is highly unlikely that we will traverse every possible route in any reasonable amount of time. Some routes may not be obvious such as I-5 south to Seattle, then taking the ferry from Seattle to Bremerton and continuing to SR-310 south to SR-16, then I-5 north to SR-167. And, of course some routes are impossible (or at least so convoluted they would be improbable).

[Image: map]

Fortunately, control flow through even complex algorithms is not as labyrinthine as the state roadways in western Washington. And, just as the department of transportation uses various tools to measure traffic volumes, testers can use path profiling tools to measure frequently traversed paths through the code. We can also use code coverage tools to see what paths have or have not been traversed, and which decisions are made at branching statements. Using code coverage and profiling tools to map control flow through the algorithm we are able to more thoroughly explore the code. Using our ‘map’ we can learn what paths have not been traversed and even whether or not certain paths through the code are even possible. After we explore the ‘map’ we can more effectively design additional tests to traverse untested paths through an algorithm. Common structural test design techniques evaluate code statements, code blocks, simple decisions or branches, or multiple Boolean conditional clauses in a single predicate statement. Then, using those test designs we can execute those tests either using stubs or mock objects at the unit or component level, or through the user interface to traverse those paths to reduce overall susceptibility to risk.

I discuss the various techniques commonly used in structural testing in Chapter 6 of our book How We Test Software At Microsoft, and also address the subject in other posts. Of course, the application of structural techniques is usually referred to as code coverage analysis. But, using this simple analogy hopefully other testers can begin to understand how exploratory testing approaches are used not only from the user interface, but also below the GUI at the code level. As Boris Beizer initially stated, "all testing is essentially exploratory in nature," and code coverage analysis (analyzing code coverage results to learn about the code, design additional tests, then execute those tests) also makes great use of exploratory approaches inside the box.

Written by Bj Rollison

November 18th, 2009 at 8:26 pm

Posted in Testing Practices


Basic Blocks Aren’t So Basic

with 8 comments

Originally Published Friday, March 06, 2009

In the book How We Test Software at Microsoft I discuss structural testing techniques. Structural testing techniques are systematic procedures designed to analyze and evaluate control flow through a program. These are classic white box test design techniques, although my friend and respected colleague Alan Richardson states in his review of the book that he also employs similar techniques on models and I have to agree with him on that point.

Also, Peter M. sent me mail pointing out a reasonably obvious bug in the code chunks on pages 118 and 119. Both functions are declared as static void, but each has a return statement that returns a value. Somehow this oversight made it through the review process, but of course returning a value from a function declared as static void would cause a compiler error. (Thanks for discovering that bug Peter and letting us know so we can fix it for the 2nd edition!)

Peter also asked for further clarification of how blocks are counted, and why a test that evaluated both conditional clauses in the compound expression as true in the below example (and on page 119) results in 85.71% coverage. Unfortunately, the answer for that is not simple.

Some surprising details…

   1: public static int BlockExample1(bool cond_1, bool cond_2)
   2: {
   3:   int x = 0, y = 0, z = 0;
   4:   if (cond_1 && cond_2)
   5:   {
   6:     x = 1;
   7:     y = 2;
   8:     z = 3;
   9:   }
  10:  
  11:   return x + y + z;
  12: }

The above code can be re-written as:

   1: public static int BlockExample2(bool cond_1, bool cond_2)
   2: {
   3:   int x = 0,
   4:   y = 0,
   5:   z = 0;
   6:   if (cond_1)
   7:   {
   8:     if (cond_2)
   9:     {
  10:       x = 1;
  11:       y = 2;
  12:       z = 3;
  13:     }
  14:   }
  15:  
  16:   return x + y + z;
  17: }

First, a ‘basic block’ is defined as a set of contiguous executable statements with no logical branches, which seems pretty straightforward. So, based on our definition of basic blocks it appears there are 4 blocks of contiguous statements. However, the conditional clauses on line 6 and line 8 in the BlockExample2 method introduce logical branches which theoretically introduce 2 implicit blocks (e.g. one block when control flow follows the true path, and another block when control flow follows the false path). So, that is essentially how the 6 blocks are determined. But, that’s not the end of the story.

If we pass a Boolean true to both cond_1 and cond_2 conditional clauses the block coverage measure in BlockExample1 results in 85.71% coverage; however, the block coverage measure for BlockExample2 actually results in 100% coverage as illustrated below.

[Image: coverage]

What? How can this be? BlockExample1 and BlockExample2 are functionally equivalent. Well, to understand this we would really need to dig deeper into compilers and coverage tools. That is well beyond the boundaries of this blog, but the IL does provide some insight.

[Image: msil]

The MSIL for BlockExample1 is on the left and BlockExample2 is on the right. Now, I don’t want to do a deep dive into MSIL, but  those who are really observant can see that for some reason the Visual Studio compiler evaluated a branch in BlockExample1 to false (instruction IL_0008), and then instruction IL_000c compares the 2 values for equality and instruction IL_0015 appears to evaluate the optimized compound conditional expression to true. Compare that to BlockExample2 MSIL which shows the first comparison of 2 values occurs at IL_0009 and the branch is evaluated as true (IL_000f) and the second comparison of 2 values occurs at IL_0014 and again evaluates to true at instruction IL_001a.

But wait…it gets even more confusing. We typically measure structural coverage using the debug build. So, imagine my surprise when I recompiled the code using the retail build settings and again passed true arguments to the cond_1 and cond_2 parameters for BlockExample1 and BlockExample2 and the coverage tool in Visual Studio indicated these methods now only had 4 blocks, and the block coverage measure for both methods was 100% as illustrated below.

[Image: coverage2]

Also, interestingly enough the compiler optimized the code so both methods had identical MSIL op code instructions as illustrated below.

[Image: be2]

Steve Carroll (a senior developer in Visual Studio) wrote that we "shouldn’t be too concerned if you can’t exactly identify where all the blocks are. When you turn the optimizer on your binary, block counts are fairly unpredictable. Don’t worry though, the source line coloring will almost always lead you to the parts of the code that you need to worry about targeting to get your coverage stats up."

I agree with Steve when he states block counts are unpredictable when the code is optimized (and different tools that measure block coverage may provide different results). However, I only partially agree with his statement that source line coloring will lead us to the parts of the code we need to test. Maybe it will, maybe it won’t. But, an in-depth analysis of code coverage results by professional testers will help us identify important parts of the code that require further investigation and testing.

So, what does it all mean?

Block testing is useful for unit testing and designing white box tests for switch statements and exception handlers (based on how we can track control flow through source code using a debugger as opposed to through the IL Disassembler). But, as I stated in How We Test Software at Microsoft block testing is the weakest form of structural testing. But, it does provide a different perspective as compared to other structural approaches or techniques and is useful when used by a professional tester in the right context.

But, the important point here is that just as we wouldn’t rely on only one tool to tune the carburetor on an automobile, we certainly wouldn’t rely on only one technique or approach for designing structural tests; and we certainly wouldn’t rely only on structural testing as a single approach to testing. This example further reinforces another important point that I make in the book; code coverage is not directly related to quality. Any professional tester can clearly see that although we are able to achieve high levels of coverage with one test, these methods are not at all well tested.

Only a fool would use code coverage metrics to derive some measure of quality, or suggest the implication that high coverage measures equal greater quality. In truth, the value of code coverage is in its ability to help professional testers identify areas of the code that have not been previously exercised and to design tests to evaluate those areas of the code more effectively to help reduce overall risk.

If we don’t execute an area of code then we have zero probability of exposing errors in that code if they exist. However, just because we do execute a code statement doesn’t mean we expose all potential errors. But, it at least increases the probability from 0% and helps reduce risk.
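To illustrate why a single test with both conditions true leaves these methods poorly tested, here is a small sketch that exercises every combination of the two conditions. It is a Python transliteration of BlockExample1 (the original is C#); the names `block_example` and `EXPECTED` are illustrative assumptions.

```python
from itertools import product


def block_example(cond_1, cond_2):
    """Python transliteration of the C# BlockExample1 method above."""
    x = y = z = 0
    if cond_1 and cond_2:
        x, y, z = 1, 2, 3
    return x + y + z


# Expected result for each combination of the two conditional clauses.
EXPECTED = {
    (True, True): 6,
    (True, False): 0,
    (False, True): 0,
    (False, False): 0,
}

for cond_1, cond_2 in product([True, False], repeat=2):
    actual = block_example(cond_1, cond_2)
    assert actual == EXPECTED[(cond_1, cond_2)], (cond_1, cond_2, actual)
```

The (True, True) case alone reaches every statement, yet only the other three cases would expose a fault such as the condition being written with "or" instead of "and".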

Written by Bj Rollison

November 18th, 2009 at 8:20 pm

Prescriptive vs. Descriptive ‘Scripted’ Tests

with 2 comments

Originally Published Tuesday, December 16, 2008

Something that raises red flags in my brain is hard-coded strings or test data in either a manual test or an automated test. Yes, I know that there are times when a test must be very prescriptive and use specific data and follow specific procedures, but I am absolutely amazed how often I see examples of test cases that are so prescriptive in the detail of execution that it completely takes any thought out of executing that test. While it can well be argued that the execution of that test might very well be a brain-dead activity, I would also argue that the person who wrote such a test also lacks creativity and generally has no clue of how to actually design a test.

We did a simple experiment on test design and execution. The purpose was to see if we could design a ‘scripted test’ that provided the tester with greater freedom, cognitive engagement, and deductive reasoning.

The simulation in this experiment was a simple web page in which the user entered a stock ticker symbol (test data), pressed the "Get Quote" button, and compared the displayed result against the expected price at that time (using a real-time 3rd party stock quote monitoring system). The test was a positive test, but was written 3 different ways as illustrated below.

The first test was very prescriptive and was written as follows:

Purpose: Verify the web page displays the most recent quote for a valid stock ticker symbol registered on a major stock exchange

Steps:

    1. Enter "MSFT" in the Stock symbol text box
    2. Press the "Get Quote" button on the web page

Verify: The displayed quote matches the real-time quote.

Given this test over 95% of the subjects simply entered MSFT and looked for a result. Some did not even appear to compare the result against the real-time quote.

We modified the steps in the test as follows and used a second study group.

Steps:

  1. Enter a valid stock ticker symbol in the Stock symbol text box (e.g. "MSFT")
  2. Press the "Get Quote" button on the web page

In this second session more than 75% of the test subjects still only entered MSFT and looked for a result. On a later day in the week, we asked the same group to run the test again. Once again over 75% entered MSFT as the test data.

So, we modified the steps in the test once more as follows and used a third group in the experiment.

Steps:

  1. Enter a valid stock ticker symbol in the Stock symbol text box from a list of available stock ticker symbols at
    <Link to NYSE>
    <Link to NASDAQ>
    <Link to S&P>
    <Link to London stock exchange>
    <additional links>
  2. Press the "Get Quote" button or press the Enter key

In the third part of the experiment over 95% of the third group clicked the links and selected a stock ticker symbol at random. Some testers copied and pasted the ticker symbols from the linked web pages into the Stock symbol text box, but the majority simply entered the symbol via the keyboard. Some (less than 5%) of the participants simply entered MSFT. (Which is not really surprising since they work there!) What was more interesting was that when the same groups were given this test at a later time over 95% of the testers selected a different link and 99% selected a different stock ticker symbol.

This third part of the experiment essentially is the same test (proves the same hypothesis) but uses a descriptive ‘scripted’ test approach. A more descriptive test can still achieve the stated purpose and provides 2 more important benefits.

  • The purpose of the positive test (verify the web page displays the most recent quote for a valid stock ticker symbol registered on a major stock exchange) is achieved without hard-coding specific test data or specific results to check against. This means the tester has to use basic deductive reasoning in order to validate the results of the test.
  • The breadth of test data used in the test significantly increased (even if the test was executed by the same person), thus increasing the variability of each successive test and providing the tester with great freedom in selecting the data to use in each test and how to interact with the system under test.

Whether or not the "Get Quote" button is pressed or the Enter key is pressed is tangential to the purpose of the test, so in this case it is not important what action the tester takes to send the request to the web service to get the stock quote; he or she has to trigger that event.
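The same idea carries over to an automated version of this test: choose the ticker symbol at run time instead of hard-coding "MSFT". The sketch below is only an illustration; the short symbol lists stand in for the linked exchange pages, and the names `SYMBOLS_BY_EXCHANGE` and `choose_symbol` are assumptions, not part of the experiment.

```python
import random

# Small illustrative subset standing in for the linked exchange listings.
SYMBOLS_BY_EXCHANGE = {
    "NYSE": ["IBM", "GE", "KO"],
    "NASDAQ": ["MSFT", "AAPL", "INTC"],
}


def choose_symbol(rng, already_used=frozenset()):
    """Pick a random (exchange, symbol) pair, preferring symbols not used yet."""
    pool = [
        (exchange, symbol)
        for exchange, symbols in SYMBOLS_BY_EXCHANGE.items()
        for symbol in symbols
    ]
    fresh = [pair for pair in pool if pair[1] not in already_used]
    # Fall back to the full pool once every symbol has been used.
    return rng.choice(fresh or pool)


rng = random.Random()  # pass a seed here to reproduce a failing run
used = set()
for _ in range(3):
    exchange, symbol = choose_symbol(rng, used)
    used.add(symbol)
    print(exchange, symbol)
```

Tracking the symbols already used mirrors what the third group did on the repeat run, where nearly everyone chose a different symbol.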

I suspect that one reason why many ‘scripted’ tests are very prescriptive in nature is because they are written from the "watch me" perspective. In other words, the test is crafted from "this is what I did, so that is how I will write my test…word for word." I also suspect that in many of these cases the tester really doesn’t have a clear purpose of what he or she is trying to prove or disprove; that tester is simply writing a script or developing an automated test to satisfy some thoughtless process or increase some magic number.

Watching a tester perform a set of steps and then recording that same set of steps in either a manual or an automated test does not constitute test design; it is a brain-dead activity. Test design is not simply watching a person perform a set of actions and ‘scripting’ that into a ‘test.’ And test design doesn’t mean reacting to the results of one test and thinking of another test ‘on-the-fly.’ Designing robust tests is a separate activity from test execution. Designing a robust, descriptive scripted test that enhances the effectiveness of the testing effort requires incredible creativity in order to achieve its desired objective.

So, the next time someone tells you that scripted tests are too restrictive, impede the freedom of the tester, or limit your creativity I would suggest to you that that perspective is rather narrow-minded and limited to a vision in which ‘scripted’ tests are all highly prescriptive in nature and result in a set of brain-dead steps. Conversely, any professional tester realizes that test design is a very creative process involving the application of your cognitive and analytical skills to help you design a test that aids you in proving or disproving your deduced hypothesis from pluralistic perspectives within the context of the situation, and to ultimately enhance the effectiveness of your testing effort.

Written by Bj Rollison

November 18th, 2009 at 7:55 pm

Posted in Testing Practices

How We Test Software At Microsoft

with 7 comments

Originally Published Saturday, December 06, 2008

This past year has been quite busy for me. Too busy. Besides trying to keep up with my busy teaching schedule, driving some key initiatives and collaborating on others, planning new course development for SDETs, I presented at 11 conferences around the world, wrote a few magazine articles, and developed a new software test automation program at the University of Washington. Somewhere in the midst of all that I co-authored a book with Alan Page and Ken Johnston that is now available to order, and should be on bookstore shelves within a week.

Collectively we have more than 3 decades of experience in various roles and business groups around the company. Coupled with insights and experiences from the many other testers (past and present) at the company the book is filled with great ideas and examples of some of the testing processes and procedures used around the company.

But it is not just another book of how to test software. This book provides a lot of insight into Microsoft, illustrates some of our best practices (and also reveals some of our faux pas), and answers the question (albeit indirectly) we get all the time: “How do you test software at Microsoft?”

Written by Bj Rollison

November 18th, 2009 at 7:48 pm

Posted in Testing Practices

Boundary Testing Isn’t Guessing at Numbers!

with one comment

Originally Published Tuesday, November 04, 2008

At a recent conference a speaker posed a problem in which a field accepted a string of characters with a maximum of 32,768 bytes, then asked the audience what values they would use for boundary testing. Immediately some of the attendees unleashed a flurry of silly wild ass guesses (SWAG) such as “32,000,” “64,000,” and, of course, what attempt at guessing would be complete without someone yelling out “how about a really large string!” One person asked whether it was bytes or characters. A reasonable question, but the speaker then began talking about double byte characters (DBCS). (Double byte is, in technological time, a relatively antiquated character encoding technology since most modern operating systems process data as Unicode.)

So, while some folks in the audience continued to shout out various SWAGs, I was still pondering why anyone in their right mind would artificially constrain a user input to such a seemingly ridiculous magic number within the context of computer processing and programming languages. Programming languages allow specific ranges of numeric input. Most strongly typed languages such as the C family of languages have explicit built-in or intrinsic data types that include signed and unsigned ranges. For example, a 16-bit unsigned short holds 2^16 values, 0 through 65,535, and a signed short also holds 2^16 values, but its range is -32,768 through +32,767. Since the speaker didn’t indicate what programming language was used in this magical field, the only logical conclusion a professional tester can rationally deduce is that 32,768 is a magic number, or in other words a “hard-coded” constant value embedded somewhere in the code.
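To make those ranges concrete, here is a minimal sketch in Python (my choice of language for illustration only; the speaker never named one). It computes the 16-bit signed and unsigned ranges and uses the fixed-width `c_int16` and `c_uint16` types from the standard `ctypes` module, which do no overflow checking, to show what happens one past each maximum. The helper name `int_range` is hypothetical.

```python
import ctypes


def int_range(bits, signed):
    """Return (minimum, maximum) for an integer type of the given bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


signed_min, signed_max = int_range(16, signed=True)
unsigned_min, unsigned_max = int_range(16, signed=False)

# ctypes integer types do no overflow checking, so maximum + 1 silently wraps.
wrapped_signed = ctypes.c_int16(signed_max + 1).value
wrapped_unsigned = ctypes.c_uint16(unsigned_max + 1).value

print(signed_min, signed_max)            # -32768 32767
print(unsigned_min, unsigned_max)        # 0 65535
print(wrapped_signed, wrapped_unsigned)  # -32768 0
```

Note that 32,768 is exactly one more than the largest value a signed 16-bit type can hold, which is the kind of relationship a tester can only notice by knowing the actual ranges.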

Asking questions is important! But, asking a bunch of contextually-free questions or throwing out random guesses is usually not the most efficient or productive use of one’s time. Asking specific rational questions or making logical assertions based on knowledge and understanding is important, and is generally more productive; especially when testing the boundary conditions of input or output values in software. Boundary testing is a technique that focuses on linear input or output values that are fixed, or fixed-in-time and used for various computations or Boolean decisions (branching) within the software. Similar to most testing techniques boundary testing focuses on exposing one category of issues based on a very specific fault model, and is an extremely efficient systematic approach to effectively expose that particular category of issues. In particular boundary testing is useful in identifying problems with:

  • improperly used relational operators
  • incorrectly assigned constant values
  • and computational errors that might cause an intrinsic data type to either overflow or wrap especially when casting or converting between data types (proper identification of the data type and knowledge of the minimum and maximum ranges is critical)
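As a rough illustration of the first item in that list, the sketch below (Python, with hypothetical function names) compares an intended length check against one with a relational-operator fault. The 32,768 maximum comes from the speaker's example; the minimum of 1 is my assumption, since no minimum was stated.

```python
MAX_LENGTH = 32768  # maximum from the speaker's example
MIN_LENGTH = 1      # assumed minimum; the example did not state one


def accepts_correct(length):
    """Intended rule: lengths from MIN_LENGTH through MAX_LENGTH are valid."""
    return MIN_LENGTH <= length <= MAX_LENGTH


def accepts_buggy(length):
    """Same rule with a relational-operator fault: < where <= was intended."""
    return MIN_LENGTH <= length < MAX_LENGTH


def disagreements(values):
    """Return the values for which the buggy check differs from the intended one."""
    return [v for v in values if accepts_correct(v) != accepts_buggy(v)]


guesses = [32000, 64000, 1000000]
boundary_values = [MAX_LENGTH - 1, MAX_LENGTH, MAX_LENGTH + 1]

print(disagreements(guesses))          # []
print(disagreements(boundary_values))  # [32768]
```

None of the guessed values distinguishes the faulty check from the intended one; only the value exactly at the boundary does.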

I previously wrote about approaches to help the tester identify potential boundary conditions, and how to design tests to adequately analyze those specific boundary values. As I previously stated, boundary testing involves the systematic analysis of a specific value. For example, a long file name on the Windows platform (the base file name and the extension combined) should not exceed 255 characters. For file types that use a default 3-character extension the most interesting boundary values are 1 character (minimum base file name length), 251 characters (maximum base file name length assuming a standard 3-character extension), and 255 characters (with or without an extension, to test what occurs with a base file name equal in length to the maximum base file name plus a standard 3-character extension). (Of course, if the default extension is 1 character, 2 characters, 4 characters, etc., then the maximum base file name without extension needs to be recalculated.) Now, let’s see why specific values are important and critical to accurately analyze boundaries.
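The arithmetic behind those values can be sketched in a few lines of Python. This is only an illustration: the function name `base_name_boundaries` is hypothetical, and it assumes the limit applies to the base name plus one "." separator plus the extension.

```python
def base_name_boundaries(max_total=255, extension_length=3):
    """Return interesting base file name lengths for boundary tests.

    Assumes the complete name is base name + '.' + extension, and that
    max_total is the limit on that complete name.
    """
    max_base = max_total - (1 + extension_length)
    return {
        "min - 1": 0,
        "min": 1,
        "min + 1": 2,
        "max base - 1": max_base - 1,
        "max base": max_base,
        "max base + 1": max_base + 1,
        "max total - 1": max_total - 1,
        "max total": max_total,
        "max total + 1": max_total + 1,
    }


for label, length in base_name_boundaries().items():
    print(f"{label:>14}: {length}")
```

With the defaults this yields 0, 1, 2, 250, 251, 252, 254, 255, and 256 characters. Passing `extension_length=4` moves the maximum base name length to 250, which is the recalculation mentioned above.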

On Windows XP I used Notepad to test file name boundaries with a default 3-character extension. Of course the minimum -1 value is an empty string, and minimum and minimum +1 are saving a file with a 1-character and 2-character file name respectively. Next I entered a base file name of 250 characters (maximum -1) and 251 characters (maximum allowed assuming a default 3-character extension) and these file names were saved to the system with the default extension. Then I entered a 252-character file name and I got the expected error message indicating the file name is too long. But, what about my boundary of 255 characters maximum? (IMPORTANT: boundary values are not just at the edges of the extreme ranges of values; there could be sub or supra boundary values within a range of values that may occur at the edges of equivalence class ranges, or specific values in special or unique equivalence class subsets.) So, I wondered what would happen if I entered just a base file name of 255 characters (which is the maximum length of a file name assuming an extension is also part of that file name)? Interestingly enough, on Windows XP the operating system saved a file with 255 characters, but it did not have any extension, which means that there was no application associated with the file. The same occurred with a 254-character base file name, and when I tried the maximum +1 of the overall complete file name range I was again presented with the same message I got with a 252-character base file name indicating the file name was too long.

Fortunately, the above issue was fixed in Windows Vista. But, as sometimes occurs in complex systems, one fix occasionally leads to a different (but related) issue in the same functional area, which is why regression testing is typically an effective testing strategy. So, when I ran my ‘regression tests’ on Windows Vista I quickly discovered the system would not save a file with only a base file name of 252 characters or more via Notepad. But, as I ran the specific boundary tests I realized something very important! When I entered a base file name of 252 characters I received the following error message.

[Image: screenshot of the error message]

And when I attempted the test with a base file name composed of any number of characters greater than 252 I received the following error message.

[Image: screenshot of the error message]

Now, those of you who are paying attention realize these 2 messages are different. Of course, in either case a file is not saved to the system, which is what I expect; however, there is a strange anomaly here. One might notice that the first message prepends the drive letter to the string of 252 characters, and the second message does not. But the important question doesn’t really have anything to do with the message text per se. In this case the professional tester should ask, “why is there an apparent conditional branch in the code that shunts control flow one way for a base file name of 252 characters and a different path for a base file name greater than 252 characters?”

Of course, if we just guessed, or tested ‘a really large string of characters,’ we might have never exposed this anomaly, which occurs only at the maximum + 1 length of a base file name (assuming a default 3-character extension). Interestingly enough, if a highly skilled, technically savvy tester had designed white box tests for decision testing or path analysis then I suspect he or she could have very easily found this anomaly with even greater efficiency and exposed it earlier in the cycle.

The point here is that boundary testing is simply not random guessing, wild speculation, or simple parlor tricks. The technique of boundary value analysis requires in-depth knowledge of what the system is doing behind the user interface, and careful analysis of system and data to accurately determine the specific boundary conditions and a rigorous analysis of linear values immediately above and below each identified specific boundary value. Testers must be able to properly identify the specific and interesting boundary values based on in-depth knowledge of the system, an understanding of what is happening beneath the user interface, and experience.  Then we can perform a more systematic analysis of any identified boundary conditions and potentially increase our probability of identifying real anomalies caused by this specific fault model. Boundary value analysis is a prime example of where good enough is simply not good enough in our discipline…we must be technically spot on!

Written by Bj Rollison

November 18th, 2009 at 7:40 pm

Equivalence Class Partitioning: Is It Real Or Is It a Figment In Our Imagination?

with 8 comments

Originally Published Tuesday, September 30, 2008

Last week I attended the Software Testing and Performance conference in Boston. I presented a workshop on Systematic Testing Techniques, as well as a talk on random test data generation, and combinatorial analysis. One way I continue to learn about our profession and increase my own knowledge is by going to conferences to hear different points of view from practitioners from around the world. So, I also attended several talks during the conference, but one talk in particular was especially entertaining (and I don’t mean that in a good way).

When I listen to other testers sometimes I hear something that is new to me and I desire to learn more about it. Sometimes I hear something prophetic that makes me think, contemplate alternatives, or reflect more deeply on my own personal perspectives. Sometimes I hear something revolutionary that causes me to reevaluate my position. And, sometimes I hear something so irrational I almost barf up a lung!

In this case the speaker opened his talk with an attack on a quote from the ISTQB foundation syllabus used to describe boundary testing which states, "Behavior at the edge of each equivalence partition is more likely to be incorrect…" Now I know the speaker a bit, and I know he disdains the ISTQB and other certification organizations, but what surprised me was his initial rebuttal by emphatically stating, "equivalence class partitions are figments of our imaginations!"

These days I usually just try to shake off wild and baseless comments as bombastic bloviations used to generate controversy. But, in this case what caught my attention was when the speaker later said that he and another well-known person defined boundaries as "a dividing point between two otherwise contiguous regions of behavior; or a principle or mechanism by which things are classified into different sets." What!? I couldn’t believe what I heard, so I had to stop reading email and look up at the presentation. As I visually processed the words I thought my head was going to explode from the so seemingly obvious contradiction.

Now, I am not a linguistic expert, but I am pretty sure that "otherwise contiguous regions of behavior" and "classifying things into different sets" are just overly simplistic ways of describing equivalence class partitions. But, I could be wrong. So, I began thinking that since most people start learning about sets in elementary school they probably understand the foundation of equivalence class partitioning is set theory, which basically states "a set is an aggregate, class, or collection of objects," and the collection of objects or ‘classification of things’ in different sets is based on an equivalence relation between the elements in each set. The application of equivalence class partitions in our profession is elegantly explained by Lee Copeland in his excellent book A Practitioner’s Guide to Software Test Design by stating "An equivalence class consists of a set of data that is treated the same by the module or that should produce the same result." Equivalence class partitioning is also discussed in-depth in books by noted experts in the industry such as Beizer, Binder, Myers, Jorgensen, Perry, and Marick just to name a few.

In fact, the concept of sets and equivalence almost seems instinctive in most humans and is generally expressed at a young age. I remember my young daughter at age 2 or so separating beads by color into "different sets" on the carpet. The red beads in one group, blue in another, and so on. She was diligent to make sure the different sets of beads did not touch as she put them into the appropriate piles. If a pile of beads got too close to another pile she would run the edge of her hand between the "contiguous regions" to clearly delineate the "dividing point."  When I asked her to get me a red bead, she would randomly grab one from the pile, because all the red beads were…red, and there were no significant differences among the red beads (elements) in the set she created that were relevant in the context of that game.
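The bead game translates almost directly into code. The sketch below is only an illustration in Python with made-up bead data and hypothetical helper names: it partitions a collection by an equivalence relation (same color) and then grabs one random representative from each pile.

```python
import random
from collections import defaultdict


def partition(items, key):
    """Group items into equivalence classes: items with the same key are equivalent."""
    classes = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return dict(classes)


def one_from_each(classes, rng):
    """Pick one randomly chosen representative from every class."""
    return {name: rng.choice(members) for name, members in classes.items()}


# Hypothetical beads as (color, id) pairs; the id only tells the beads apart.
beads = [("red", 1), ("blue", 2), ("red", 3), ("green", 4), ("blue", 5), ("red", 6)]

piles = partition(beads, key=lambda bead: bead[0])
sample = one_from_each(piles, random.Random())

print(sorted(piles))  # ['blue', 'green', 'red']
print(sample)         # one bead per color; which bead varies from run to run
```

Every bead lands in exactly one pile, and any bead from a pile stands in for the rest of that pile as far as color is concerned.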

Perhaps the majority of the industry’s experts are wrong and I wasted my time reading books on software testing principles and practices because this person is right and equivalence class partitioning is really only a figment of our imagination.

However, on the other hand, although I certainly have never claimed to be an expert, I am still pretty darn sure the underlying foundation of computers and computer software is somewhat influenced by mathematical principles, and as a tester I might be able to use those same principles to help me design effective tests that might help me better evaluate discrete functional capabilities and attributes of software components and more efficiently expose certain categories or patterns of errors.

But, why should we get mired down and confused with facts (especially all that boring math stuff) when it is much easier to appeal to some people’s emotions? So, forget everything you just read…and if anyone asks why testing is so hard just tell them testing is an art with no practical foundation in logic because software is…well, it’s just magic!

Written by Bj Rollison

November 18th, 2009 at 7:33 pm

Functional Techniques are More Than Black Box Techniques

without comments

Published Thursday, August 07, 2008

Too often many testers mistakenly assume that functional techniques such as equivalence class partitioning, boundary value analysis, combinatorial analysis, etc. are simply "black-box" testing techniques. I suspect this rather narrow perspective of functional testing techniques is due to a lack of in-depth understanding of testing throughout the product lifecycle, and confusion between test design and test execution. I also suspect this incorrect assumption is perpetuated by ‘testers’ whose only approach to testing is interacting with the software via the user interface with the intent to find bugs. Of course when a person can only ‘test’ via the user interface then everything is a "black-box."

When I teach software testing courses I reinforce the concepts of black, white, and gray boxes as perspectives for test design, not for test execution. Used as test design approaches, black box test design makes no assumptions about the code and focuses on inputs and outputs from the end-user interface, white box test design explicitly uses the code to help the tester design more effective tests, and gray box test design uses an in-depth understanding of the entire system to design effective tests within the context of the software solution.

Functional testing techniques were derived years ago by industry experts and the founding fathers of software testing to provide systematic procedures that help identify specific categories of problems based on fault models that are commonly encountered during the development process. Certainly, functional techniques such as combinatorial analysis are very effective when applied from the end-user interface; however, we all know the fault model which combinatorial testing is effective in identifying involves semi-coupled or directly dependent unordered parameters. So, in order for this functional technique to be applied correctly the tester must have an in-depth understanding of the domain space and know which unordered parameters are semi-coupled or directly dependent. Boundary testing is very effective in evaluating how specific linear variable conditions of independent parameters are handled and is useful for identifying problems with overflowing intrinsic data type ranges, casting between data types, incorrect usage of relational operators, etc. Effective boundary testing means the tester must have some knowledge of data types and programming concepts and insight into the code is helpful in order to design the most effective tests.
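As a small illustration of the combinatorial idea, the Python sketch below checks whether a set of tests exercises every pair of values across a group of parameters. Pairwise coverage is my assumption here as one common form of combinatorial analysis; the three font-style parameters and the helper name `uncovered_pairs` are hypothetical.

```python
from itertools import combinations, product


def uncovered_pairs(parameters, tests):
    """Return the value pairs that no test in `tests` exercises together.

    `parameters` maps a parameter name to its possible values; each test is
    a dict mapping every parameter name to one chosen value.
    """
    required = set()
    for a, b in combinations(sorted(parameters), 2):
        for va, vb in product(parameters[a], parameters[b]):
            required.add(((a, va), (b, vb)))
    covered = set()
    for test in tests:
        for a, b in combinations(sorted(test), 2):
            covered.add(((a, test[a]), (b, test[b])))
    return required - covered


# Hypothetical interdependent font-style options, chosen only for illustration.
parameters = {
    "bold": [False, True],
    "italic": [False, True],
    "underline": [False, True],
}
tests = [
    {"bold": False, "italic": False, "underline": False},
    {"bold": False, "italic": True, "underline": True},
    {"bold": True, "italic": False, "underline": True},
    {"bold": True, "italic": True, "underline": False},
]

exhaustive = len(list(product(*parameters.values())))
print(exhaustive, len(tests), len(uncovered_pairs(parameters, tests)))  # 8 4 0
```

Four tests cover all twelve value pairs that eight exhaustive combinations would cover. That saving is only meaningful if the parameters really do interact, which is why knowing the domain space matters.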

Many of the functional techniques we apply at the end-user interface actually require a solid understanding of the underlying system to be most effective. So, designing tests using many functional testing techniques actually approaches test design from a gray-box perspective. These functional testing techniques can also be effectively applied during unit testing to drive quality upstream and prevent problems from getting into the product. Of course, unit tests that include comprehensive boundary value analysis are tests designed from a white box perspective.

This doesn’t mean that functional testing techniques shouldn’t be executed from the end-user interface. But, the purpose of functional testing techniques is to help testers design tests from multiple perspectives in order to perform a more in-depth, systematic analysis of software, and potentially expose very specific categories of problems based on empirical fault models. The execution of those tests may be applied either at the code level or from the end-user interface. But, effective application of functional techniques requires in-depth system level knowledge and cognitive skills that extend below the end-user interface.

Testers must embrace and extend the role of testing beyond the simple ability to find bugs at the end-user interface. Testers must start testing earlier as a partner in the process rather than an adversary of the development team. Testers must become proficient in various approaches necessary to provide a wide variety of information to the key decision makers throughout the development lifecycle instead of at the end of it. Quite simply, professional testers need to break beyond the "black-box" barrier!

Written by Bj Rollison

November 18th, 2009 at 7:24 pm

Posted in Testing Practices

Do Testers Do Code Reviews?

with one comment

Originally Published Tuesday, March 11, 2008

This weekend on the flight from Seattle to Ireland I finally got to catch up on some reading. One of the books I grabbed off the shelf that I hadn’t gotten around to reading yet was Best Kept Secrets of Peer Code Review by Jason Cohen. The book is 160 pages packed with great information, and I highly recommend it for some fresh perspectives on the value of code reviews and ideas for improving them. For decades the industry collected mounds of empirical data that pretty clearly illustrate the value of code reviews in a software development lifecycle in the early detection and removal of issues and anomalies. With the industry-wide push to drive quality upstream via approaches such as agile programming and test driven development there is a resurgence of effort by developers to engage in more frequent code reviews.

Many people seem to assume the primary reason why Microsoft (and other companies) is recruiting more technically skilled testers is simply to automate tests. But, in fact, many testers at the company engaged in writing test automation long before our efforts towards engineering excellence. The skills and knowledge that people who have a much deeper understanding of the ‘system’ bring to an organization extend well beyond their ability to write automation.

But, should testers be performing code reviews? In my opinion, code reviews are simply another approach in a myriad of approaches to testing a software project, and professional testers should be able to engage the testing effort from a variety of testing perspectives.

Testers who participate in project code reviews are part of the team effort to drive quality upstream, reduce certain classes of issues before they are checked into the build, improve overall long-term maintainability of the code base, and effectively reduce long term costs. Just as testers participate in the early stages of the design and requirements phase of a project to provide valuable input, some testers on every team should engage in code reviews either with other developers, or perhaps as the primary reviewers of project code. Some people will argue that testers may be biased by participating in code reviews and lose the ‘customer’ perspective. This may be true in some cases, but I suspect that most professional testers clearly understand the different classes of issues that are more easily found in a code review versus other approaches of testing. But, testers who engage in code reviews tend to have a much deeper understanding of the overall project, and they are also able to identify potential areas of the code that may be more problematic and focus additional testing approaches in those areas.

Written by Bj Rollison

November 18th, 2009 at 6:17 pm

Posted in Testing Practices
