Archive for the ‘Testing Practices’ Category
Sometimes bugs find you
Well, it’s another new year. Like a lot of folks, I have spent the last few days reflecting on the past year and contemplating the coming one. I won’t bore you by rambling on about my thoughts, reflections, or ambitions; they are mostly personal. Professionally, I will continue to strive to improve myself in my chosen discipline, seek out new challenges and opportunities, and share my experiences with those who follow my posts. The beginning of the year is a busy time for me: arranging my conference schedule (which I am cutting back on), preparing to teach a software testing course and a software test automation course at the University of Washington, and getting engaged with 2 key internal projects intended to help improve our internal engineering processes at Microsoft.
Many of you know that I started at Microsoft on the Windows International Test Team, and internationalization (I18N), globalization (G11N), and even localization (L10N) are topics that have always interested me. In 2009 I posted a series on localization testing (Part 1, Part 2, Part 3, and Part 4). Last year, I designed and developed an internal class on globalization testing basics for non-globalization experts who want to incorporate globalization test strategies into their test designs in order to find bugs sooner and drive quality upstream. As the world becomes even more connected and our customer base grows rapidly around the globe, it just makes sense to design software that our customers will value in their local context. So, over the next few weeks I will do a series of posts on globalization testing.
But first, I’d like to share a globalization bug that I found…or should I say, that found me. This is a great example of the bugs you might come across using a strategy I used in my previous teams to help drive international sufficiency testing upstream. In this post on international sufficiency testing I explained that one strategy was to get folks on the team to set their default user locale to something other than US-English. This particular bug was first detected the day after I wrote a post discussing a bug found by an SDET taking my globalization testing basics course. To replicate that bug for screenshots, I customized my number format via the Region and Language control panel applet by changing the decimal symbol from the default period character (‘.’) to the small Latin letter d (‘d’). After writing the post, I did not restore the machine to its default state (using the period character as the decimal symbol).
The next day I came into the office, logged on to my computer, and noticed a cryptic scripting error message on the desktop. I hadn’t launched any apps yet, so I deduced that the error was likely caused by some process that starts automatically at logon. So, I gave the faithful Windows three-finger salute (Ctrl+Alt+Delete) to bring up Task Manager and scanned through the list of running processes.
It didn’t take long to associate this error dialog with the communicator.exe process. But, at first I wasn’t quite sure what was causing this error. Later that morning, I reset the user locale settings back to the default settings. After lunch I logged back onto my machine and no error message!
So, being a tester, I let curiosity get the better of me and felt compelled to try to replicate the initial anomaly. To make a long story short, it didn’t take long to put the pieces of the puzzle together and figure out that the customized decimal symbol was causing the error. But after troubleshooting a bit, I found it was worse than I thought: the error didn’t occur only with a letter character; it also occurred when I changed my decimal symbol to a comma (‘,’), which some international locales use as the decimal symbol. So, I contacted my friend and colleague Alan Page, who just so happens to work on the Communications team. He quickly looked into it for me and reported that this problem does not occur in the latest release. (Note to self…update your machine to the latest self host bits.)
What is important here is that this bug did not require a different language version. I didn’t need to know a different written language. This bug was exposed simply by customizing the settings in the Region and Language control panel applet and using the system as a ‘normal user.’ What I did need was a bit of technical knowledge of the system (specifically around National Language Support, or NLS) and of how to configure the system to help me expose potential globalization issues. It’s not magic and it’s not rocket science. So, over the next few weeks I plan to share some testing approaches to help other testers find problems similar to this one and incorporate globalization testing into their test strategies.
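I never saw the Communicator source, so I can’t say exactly what went wrong inside it. But a common way this class of bug happens is that one piece of code formats a number using the user’s decimal symbol and another piece parses it back assuming a period. The minimal Python sketch below illustrates that hypothetical mechanism along with a simple round-trip check across several decimal symbols; the function names are mine and purely illustrative.

```python
def format_with_decimal_symbol(value: float, decimal_symbol: str) -> str:
    """Format a number the way a user-customized NLS number format might.
    Hypothetical simplification: only the decimal symbol is changed."""
    return f"{value:.2f}".replace(".", decimal_symbol)


def naive_parse(text: str) -> float:
    """Culture-insensitive parse that assumes '.' is the decimal symbol.
    This is the class of bug we are hunting, not a recommended pattern."""
    return float(text)


def find_round_trip_failures(value: float, symbols: list[str]) -> list[str]:
    """Return the decimal symbols for which format-then-parse fails to round-trip."""
    failures = []
    for symbol in symbols:
        text = format_with_decimal_symbol(value, symbol)
        try:
            if naive_parse(text) != round(value, 2):
                failures.append(symbol)
        except ValueError:
            failures.append(symbol)  # the parser rejected the localized string
    return failures


if __name__ == "__main__":
    # '.' is the US-English default, ',' is used by many locales,
    # and 'd' is the deliberately odd customization described above.
    print(find_round_trip_failures(3.14, [".", ",", "d"]))
```

The period passes and both the comma and the letter d fail, which is the same pattern I saw with the customized decimal symbol.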
Combinatorial Testing: Testing with Negative Values
Much of western Washington got its first look at winter starting Monday. This year our winter is supposed to be especially harsh, and it started with a pre-Thanksgiving snow and below freezing temperatures. I woke up yesterday morning with the sun shining down on a beautiful white blanket of snow about 10 inches deep. Since it was a beautiful day, I got my cross country skis out and went for a trek around the neighborhood. I got back home after about 2 hours and went sledding in the backyard with my daughter. Of course, we had to make several snow angels before heading in for some hot chocolate beside the fire.
Up to this point in our discussion of combinatorial testing, our models and the resulting tests have focused primarily on valid input values. This is commonly known as positive testing, because all valid input combinations should result in a valid output condition or state. Testing different combinations of valid values for multiple input parameters that affect a common output condition or state sometimes exposes unexpected issues, and it also helps improve overall confidence in the feature being tested. Also, since positive testing should result in the behavior or output condition expected by the customer (most customers don’t throw junk inputs to see how many error messages they can find), designing an automated oracle for positive tests is generally easier than designing automated oracles for negative tests. Basically, an automated oracle for a positive combinatorial test should evaluate whether the output condition or state matches the expected result, and whether any unexpected issue occurred, such as an error message, an unhandled exception, or other failure.
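As a rough illustration of that two-part check, here is a minimal Python sketch of a positive-test oracle. The `Observation` fields are hypothetical stand-ins for whatever your automation harness can actually capture from the application under test.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """What an automated test captured after applying one input combination.
    The fields are hypothetical; a real harness would fill them from the UI."""
    actual_state: dict
    error_messages: list = field(default_factory=list)
    unhandled_exception: bool = False
    not_responding: bool = False


def positive_oracle(expected_state: dict, observed: Observation) -> bool:
    """Pass only if the output state matches the expected result AND no
    unexpected issue (error message, unhandled exception, hang) occurred."""
    no_unexpected_issues = (
        not observed.error_messages
        and not observed.unhandled_exception
        and not observed.not_responding
    )
    return observed.actual_state == expected_state and no_unexpected_issues
```

Keeping the "no unexpected issues" part separate from the state comparison makes it easy to reuse when we later need different oracles for negative tests.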
But shouldn’t we also test with negative (invalid) input values? That’s a really great question, and I must say at this point I am not overly convinced it is necessary, for 2 reasons.
- Most exception handling is triggered by single mode errors. In other words, let’s say we have a Windows form that takes 3 integer inputs, and the inputs are not validated until the OK or Apply button is clicked. And let’s say that we input the character ‘A’ in each input control. Typically, when the user presses OK or Apply, an exception will be thrown (and the application will display an error message) on the first invalid condition detected. Once an exception is thrown, control flow through the program usually does not look for additional errors; otherwise the user would likely get cascading error messages.
- Most errors are not the result of multiple invalid input values in different combinations. For example, even web forms that check for multiple input errors usually do not get posted until the user corrects all input errors on the form. Even in this case it is unlikely that multiple invalid input values would result in an unexpected error condition.
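Here is a minimal Python sketch of the three-textbox form from the first bullet (the function name and message text are hypothetical). Because validation stops at the first bad value, the inputs ('A', 'A', 'A') and ('A', '1', '1') exercise exactly the same error path, which is why extra invalid combinations rarely buy us much.

```python
def validate_inputs(raw_values: list[str]) -> list[int]:
    """Validate integer text boxes when OK/Apply is clicked.

    Mirrors the single mode behavior described above: validation stops at
    the first invalid value, so later bad inputs are never examined.
    """
    parsed = []
    for index, text in enumerate(raw_values):
        try:
            parsed.append(int(text))
        except ValueError:
            # The first failure ends validation; the user sees one error message.
            raise ValueError(f"Input {index + 1} must be an integer: {text!r}") from None
    return parsed
```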
So we must ask ourselves 3 questions.
- Do we think that an invalid input value in some combination with different valid input values would produce an unexpected result that would not be detected with a test designed to expose single mode errors?
- Do we think that we need to extensively test for different combinations of invalid input values?
- Do we need to rerun these types of tests throughout the product’s software development lifecycle (SDLC)?
These are important questions to ask because, when done properly and in the right context, combinatorial testing of inputs is more ‘expensive’ than other approaches to testing. For example, we know that combinatorial testing can find single mode errors, but there are more efficient (less ‘expensive’) ways to find single mode bugs. Testing combinations of input values that affect a common output condition involves identifying the output condition or state, identifying the input parameters and variables per parameter, modeling the inputs, reviewing the output from a tool, tweaking the model, and ideally automating a data-driven test. This is a lot of work (‘expense’) just to test whether an invalid input (e.g. an alphabetic character entered into a textbox that only takes integer values) will produce an appropriate response (e.g. an error message). Another consideration when testing with negative values in our various combinations is that if an unexpected error does occur, we will likely have to spend some time investigating whether the anomaly is caused by an interaction between different valid input values, an interaction between valid and invalid input values, or a single mode fault.
But, if the answer is yes to any of these questions, then we might want to use a combinatorial testing approach. Of course, including invalid input values in our model will produce negative tests in our baseline set of tests that are output from the tool. For example, let’s consider our different expected output conditions for the font dialog simulation.
- If all input values are valid, we would expect the characters (actually glyphs) in our edit control to display the appropriate properties as defined by the input values, with no error messages or unhandled exceptions, and the application should not be in a ‘not responding’ state.
- If the font color value is any invalid input, then when the OK or Apply button is clicked no error message is displayed, but the font color reverts to the last known valid font color (or the default font color).
- If the font size is less than 1 or greater than 1638, an error message is displayed, and the size reverts to the last valid value.
- If the font size is a decimal value other than n.5, an error message is displayed (or the number might be rounded to the nearest whole number or n.5 value), and the size reverts to the last valid value.
- If both the font color and the font size are invalid, the color reverts to the last known valid value and an error message indicates an invalid font size.
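These rules are concrete enough to encode as an expected-result function, which is what an automated oracle for the negative tests would need. The Python sketch below is hypothetical: the names, the default last-valid values (Black, 11), and the choice to treat empty or non-numeric size text as an error (the rules above don’t cover it) are my assumptions, and it models the error-message variant of the n.5 rule rather than rounding.

```python
from dataclasses import dataclass

# Valid colors taken from the model file; anything else is an invalid color.
VALID_COLORS = {"Black", "White", "Red", "Green", "Blue", "Yellow", "Purple", "Orange"}


@dataclass(frozen=True)
class Expected:
    color: str
    size: float
    error_message: bool


def size_is_valid(text: str) -> bool:
    """Valid sizes are 1 <= size <= 1638 and either whole numbers or n.5."""
    try:
        size = float(text)
    except ValueError:
        return False  # assumption: empty or non-numeric text is an invalid size
    return 1 <= size <= 1638 and (size * 2).is_integer()


def expected_outcome(color: str, size: str,
                     last_color: str = "Black", last_size: float = 11.0) -> Expected:
    """Expected dialog state after OK/Apply for one color/size combination."""
    color_ok = color in VALID_COLORS
    size_ok = size_is_valid(size)
    return Expected(
        color=color if color_ok else last_color,     # invalid color: silent revert
        size=float(size) if size_ok else last_size,  # invalid size: revert...
        error_message=not size_ok,                   # ...and show an error message
    )
```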
So, let’s look at 2 different scenarios: multiple invalid input values, and single invalid values in combination with valid input values.
Testing multiple invalid input values
The premise of combinatorial testing is that the interaction of 2 (or more) input values causes an unexpected condition or output. So, if we think that 2 or more invalid input values might result in an unexpected anomaly, then we can simply add invalid input values to our model. For example, in our simple font dialog example we would include invalid inputs for color (randomString, emptyString) and for size (emptyString, integerLessThan1, integerGreaterThan1638, floatValueOtherThan.5) in our model as illustrated below:
# Model File for MyFontDialog
Font: Arial(50), Tahoma, BrushScript, MonotypeCorsive
Style: Bold, Italic, BoldItalic, None(10)
Effects: Strike, Underline, StrikeUnderline, None(10)
Colors: Black(10), White, Red, Green, Blue, Yellow, Purple, Orange, randomString, emptyString
Size: small, smallHalf, nominal(10), nominalHalf, large, largeHalf, xLarge, xLargeHalf, xxLarge, xxLargeHalf, emptyString, integerLessThan1, integerGreaterThan1638, floatValueOtherThan.5

# Conditional constraints necessary to prevent mutually exclusive variable settings
# See previous post for dealing with mutually exclusive variables
if [Font] = "BrushScript" then [Style] in { "Italic", "BoldItalic" };
if [Font] = "MonotypeCorsive" then [Style] in { "None", "BoldItalic" };
The set of combinatorial tests produced by a tool will include both positive and negative tests. For example, this model would produce a set of tests that includes combinations such as Color == emptyString and Size == emptyString, and Color == Purple and Size == integerGreaterThan1638, in combination with values for the other input parameters. It would also include tests such as Color == randomString with valid inputs for the other parameters. In this situation we would have to go through the set of tests produced by our tool one by one and identify the expected output condition(s) for each combination of inputs. This approach might be more practical if we were executing these test combinations manually and evaluating the outcome of each test. But it would be a very time-consuming process, and it would require several complex oracles if we wanted to automate a data-driven test.
Testing single invalid input values
In some cases we may want to test invalid values in each input parameter individually, in combination with valid values in the other input parameters, to test for specific expected error conditions (e.g. error messages). In this situation we need a way to identify the invalid input values so that they are not used in combination with invalid values for other input parameters. Fortunately, the PICT tool supports this type of analysis: in our input model we can mark invalid values by prefixing them with the tilde (~) character. The modified model file below now includes the negative values for the Size and Colors parameters.
# Model File for MyFontDialog
Font: Arial(50), Tahoma, BrushScript, MonotypeCorsive
Style: Bold, Italic, BoldItalic, None(10)
Effects: Strike, Underline, StrikeUnderline, None(10)
Colors: Black(10), White, Red, Green, Blue, Yellow, ~randomString, ~emptyString
Size: small, smallHalf, nominal(10), nominalHalf, large, largeHalf, xLarge, xLargeHalf, xxLarge, xxLargeHalf, ~emptyString, ~integerLessThan1, ~integerGreaterThan1638, ~floatValueOtherThan.5

# Conditional constraints necessary to prevent mutually exclusive variable settings
# See previous post for dealing with mutually exclusive variables
if [Font] = "BrushScript" then [Style] in { "Italic", "BoldItalic" };
if [Font] = "MonotypeCorsive" then [Style] in { "None", "BoldItalic" };
The above model file produces a baseline set of combinatorial tests that includes all valid combinations as well as combinations that include 1 invalid value in n-way combinations with other valid values as illustrated in the tab-delimited output file. Notice that only one invalid value occurs in each individual test. This approach makes our oracle problem a bit easier to solve because each test should produce an expected output condition as described above.
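If you want your automated test to sanity-check that property before using a generated file, a few lines of Python will do. This sketch assumes the output is tab-delimited with a header row of parameter names, and that invalid values keep their ~ prefix in the output; verify that assumption against the PICT version you use. The function name is hypothetical.

```python
import csv
import io


def invalid_value_counts(pict_output: str, marker: str = "~") -> list[int]:
    """Count values beginning with the negative-value marker in each test row
    of a tab-delimited output (the first row holds the parameter names)."""
    reader = csv.reader(io.StringIO(pict_output), delimiter="\t")
    next(reader)  # skip the header row
    return [sum(cell.startswith(marker) for cell in row) for row in reader if row]
```

A test harness could then refuse to run if `max(invalid_value_counts(text)) > 1`.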
But, we still have to identify the expected output for each test in our set of combinatorial tests produced by the PICT tool. Fortunately, an undocumented feature in the PICT tool allows us to specify the expected output or result. To model the expected result we simply include a parameter starting with the dollar sign symbol ($). For example, we would modify our model file to include the “$Result:” parameter and assign expected output conditions to that parameter as illustrated below.
# Model File for MyFontDialog
Font: Arial(50), Tahoma, BrushScript, MonotypeCorsive
Style: Bold, Italic, BoldItalic, None(10)
Effects: Strike, Underline, StrikeUnderline, None(10)
Colors: Black(10), White, Red, Green, Blue, Yellow, ~randomString, ~emptyString
Size: small, smallHalf, nominal(10), nominalHalf, large, largeHalf, xLarge, xLargeHalf, xxLarge, xxLargeHalf, ~emptyString, ~integerLessThan1, ~integerGreaterThan1638, ~floatValueOtherThan.5
# Expected Results Parameter
$Result: ErrorMessage, DefaultColor
# Conditional constraints necessary to prevent mutually exclusive variable settings
# See previous post for dealing with mutually exclusive variables
if [Font] = "BrushScript" then [Style] in { "Italic", "BoldItalic" };
if [Font] = "MonotypeCorsive" then [Style] in { "None", "BoldItalic" };
# Expected Outputs
if [Colors] in {"randomString", "emptyString"} then [$Result] = "DefaultColor";
if [Size] in {"emptyString", "integerLessThan1", "integerGreaterThan1638", "floatValueOtherThan.5" } then [$Result] = "ErrorMessage";
Now the output from the PICT tool also includes a column for the expected output conditions. Notice that in this case, if the expected result is a valid outcome (the font properties match the input values), the Result column contains a question mark (?). If there are multiple expected output conditions, the ‘valid’ output conditions will be listed with the question mark character, as in this example.
If the expected result is binary (e.g. Error or NoError), then we can use one statement in our model file such as:
if [param] = "InvalidCondition" then [$Result] = "Error" else [$Result] = "NoError";
The value of the $Result parameter is that we can use the $Result values as flags in our automated tests to switch between different automated oracles to help validate the specified expected result.
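Here is a minimal Python sketch of that flag-based dispatch for the font dialog. The oracle functions, the shape of the observed outcome, and the `apply_inputs` callback are hypothetical placeholders for your own automation; the only ideas taken from the model above are the $Result values (?, ErrorMessage, DefaultColor). The result column name is a parameter because I’m assuming the header keeps the $ prefix.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]
Observed = Dict[str, object]


def check_font_properties(row: Row, observed: Observed) -> bool:
    # Valid outcome: the dialog applied exactly what the row asked for, silently.
    return (observed["color"] == row["Colors"]
            and observed["size"] == row["Size"]
            and not observed["error"])


def check_error_message(row: Row, observed: Observed) -> bool:
    # Invalid size: an error message must be shown.
    return bool(observed["error"])


def check_default_color(row: Row, observed: Observed) -> bool:
    # Invalid color: no error message, color reverts (assumed default: Black).
    return observed["color"] == "Black" and not observed["error"]


# The $Result values act as flags that select the oracle for each test.
ORACLES: Dict[str, Callable[[Row, Observed], bool]] = {
    "?": check_font_properties,
    "ErrorMessage": check_error_message,
    "DefaultColor": check_default_color,
}


def run_data_driven_test(pict_output: str,
                         apply_inputs: Callable[[Row], Observed],
                         result_column: str = "$Result") -> List[bool]:
    """Run every generated test, routing each one to the oracle its flag names."""
    verdicts = []
    for row in csv.DictReader(io.StringIO(pict_output), delimiter="\t"):
        observed = apply_inputs(row)
        verdicts.append(ORACLES[row[result_column]](row, observed))
    return verdicts
```

Adding a new expected outcome then means adding one value to $Result in the model and one entry to the dispatch table, rather than writing a new test.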
This approach models both valid and invalid input values, and the PICT tool produces both positive and negative tests in the output set of combinatorial tests. Using the $Result parameter as a flag is an effective way to switch between automated oracles, and it allows us to design a single automated data-driven test. Even with this approach I ask myself whether an invalid input in combination with valid inputs would likely cause an unexpected error, or whether this would likely be a single mode failure. But if I really don’t know how invalid inputs are validated before being passed to the appropriate function, then including invalid values in our model could also increase our overall test coverage and improve our confidence, or potentially expose some really random errors!
Combinatorial Testing: Complex Interactions
It is a rainy day in Seattle. It has been a busy week at work. I spent 2 days in training then spent 2 days training. Monday was mostly a blur with the exception of 1 memorable meeting. It is also pretty cold and they are predicting snow this weekend in the higher elevations (which includes my home). Fortunately I have 3 cords of wood cut and stacked (hopefully more than enough), the backup generator is tuned and ready (for those pesky power outages), and most of the outside work is done. Also, it could mean an early ski season…YEAH!
Over the past few weeks we have been discussing combinatorial testing. I have tried to focus on how to overcome some obstacles and common mistakes made by inexperienced or untrained testers, and also how we can use this technique and an effective tool to help us significantly improve test coverage. So far I have covered:
- Effectively modeling the input parameters that affect a common output condition
- Modifying the model to deal with mutually exclusive input value combinations
- Using random data generation to increase test coverage
- Automatically generating different combinatorial sets of tests
- Adding weights to important values in our model file
- Including sub-models within a model to increase test coverage
- Automatically seeding the output with a set of combinations that must be tested
Up to this point we have been looking at a rather simple feature with a limited number of input parameters. But, as I mentioned previously, combinatorial testing really demonstrates its full potential when we are faced with a highly complex feature that has numerous input parameters affecting a common output condition. Also, our outputs so far have been based on all variable combinations for every pair of parameters, or pairwise analysis.
Pairwise or 2-way analysis is actually pretty effective, because the combinatorial fault model suggests that most errors resulting from the interaction of input variables occur in the simple interaction of 2 inputs. Also, many studies demonstrate that a pairwise or 2-way analysis of input values is very effective in exposing a high percentage of multi-modal (and single mode) defects. But, although pairwise testing is effective in revealing more than 50% of the multi-modal errors, there could still be bugs lurking that are caused by more complex interactions of input variables. In a multiyear study of various types of software products, Richard Kuhn and Raghu Kacker found that some additional multi-modal bugs were exposed when 3 or more input values interacted, as illustrated in the graph below.
(Graph source: R. Kuhn and R. Kacker, “Combinatorial Methods for Cybersecurity Testing,” 2009.)
Fortunately, in order to increase the order of n-way combinations being tested, we don’t have to change our model. But we do need a tool capable of generating combinations of an order higher than 2 (3-way, 4-way, and so on), and PICT (and other tools) can do just that. Passing the /o:n switch on the PICT command line (where n is the order, up to the total number of parameters) will cause PICT to generate an n-way analysis of the input values.
Now, you may be asking why we don’t just start with 3-way or 4-way analysis. The simple answer is cost. Each increment in the n-way order causes a steep, roughly multiplicative increase in the number of test combinations, as illustrated in the table below. This particular feature had 421,200 possible combinations. We can also see an incremental increase in the number of blocks of code exercised, but this may not always be the case. (Also, remember that code coverage does not correlate to quality, but it may help improve confidence and reduce overall risk.)
| Order | 2-way (pairwise) | 3-way | 4-way |
|---|---|---|---|
| Number of tests | 136 | 800 | 3533 |
| Blocks covered | 979 | 994 | 1006 |
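To get a feel for why the cost climbs so quickly, we can count how many distinct t-way value combinations a test set has to cover. The Python sketch below ignores constraints and uses the value counts of the basic font dialog model from the earlier posts (4 fonts, 4 styles, 4 effects, 6 colors, 10 sizes), not the 421,200-combination feature in the table. It also computes a simple lower bound: every combination of values of the t parameters with the most values has to appear in some test.

```python
from itertools import combinations
from math import prod


def t_way_value_combinations(value_counts: list[int], t: int) -> int:
    """Distinct t-way value combinations a covering test set must include
    (ignoring constraints): sum over every t-subset of parameters of the
    product of their value counts."""
    return sum(prod(subset) for subset in combinations(value_counts, t))


def minimum_tests(value_counts: list[int], t: int) -> int:
    """Lower bound on test count: all value combinations of the t largest
    parameters must each appear in a separate test."""
    return prod(sorted(value_counts, reverse=True)[:t])


if __name__ == "__main__":
    # Value counts for Font, Style, Effects, Colors, Size in the basic model
    font_dialog = [4, 4, 4, 6, 10]
    for t in (2, 3, 4):
        print(t, t_way_value_combinations(font_dialog, t), minimum_tests(font_dialog, t))
```

For the font dialog this gives 300, 1,552, and 3,904 value combinations for 2-, 3-, and 4-way coverage, and lower bounds of 60, 240, and 960 tests; each step multiplies the floor by the value count of the next-largest parameter.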
Even if our test is automated we might not want to immediately do a 3-way or higher analysis of value interactions. (Automation != free testing.) The strategy we use is to start with 2-way interactions, then generate different 2-way sets until no more bugs are found. Then generate an output for 3-way interactions to look for more complex issues. If no bugs are found with 3-way interactions we might jump to 4-way or not depending on the complexity and criticality of the feature, and the confidence of the tester.
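The escalation strategy above can be scripted. This is a minimal Python sketch under stated assumptions: `run_suite` is a hypothetical callback that runs PICT with the given command, executes the generated tests, and returns the number of new bugs found; the only PICT switches used are /o:n for the order and /r:n for randomizing with a seed. `max_order` captures the judgment call about whether to go beyond 3-way.

```python
from typing import Callable, List


def pict_command(model_file: str, order: int, seed: int) -> List[str]:
    """Build a PICT command line: /o:N sets the combination order and /r:N
    randomizes generation using seed N."""
    return ["pict.exe", model_file, f"/o:{order}", f"/r:{seed}"]


def escalation_plan(model_file: str,
                    run_suite: Callable[[List[str]], int],
                    max_pairwise_sets: int = 5,
                    max_order: int = 3) -> List[List[str]]:
    """Run different randomized 2-way sets until one finds no new bugs, then
    run one set at each higher order up to max_order. Returns the commands run."""
    executed: List[List[str]] = []
    for seed in range(1, max_pairwise_sets + 1):
        command = pict_command(model_file, order=2, seed=seed)
        executed.append(command)
        if run_suite(command) == 0:
            break  # a fresh 2-way set found nothing new, so stop reshuffling
    for order in range(3, max_order + 1):
        command = pict_command(model_file, order=order, seed=1)
        executed.append(command)
        run_suite(command)  # 4-way and beyond only if max_order allows it
    return executed
```

In a real harness, `run_suite` might call `subprocess.run(command, capture_output=True, text=True)` and feed the resulting stdout to the data-driven test.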
The bottom line is that sometimes pairwise or 2-way analysis of values for interdependent input parameters is not sufficient to find bugs caused by more complex interactions. To help resolve this problem, testers need a tool that can generate n-way combinations beyond a basic pairwise analysis.
In the final installment in this series I will discuss negative testing.
Combinatorial Testing: Testing Highly Probable Combinations
Autumn is in full swing here in Seattle. I love autumn. The vibrant colors, the smell in the air, the brisk mornings, and the anticipation of snow in the hills. It is also a busy time for me, chopping and stacking wood (it is supposed to be a rough winter here in Seattle), turning over the vegetable gardens, trimming various plants around the house, and getting the generator ready for our almost yearly power outages in my somewhat remote neighborhood. There is something rewarding about this time of year…perhaps it is simply that I made it through another year still mostly sane, and will soon have a new year to look forward to with some exciting changes.
In the previous posts (here and here) on selecting the right values, we discussed strategies for potentially improving the likelihood of testing with an increased distribution of input values when the number of possible inputs for a given input parameter is very large, and how the PICT tool can randomize the base set of combinations it outputs to increase test coverage of the total number of possible combinations. In both situations, these are simply strategies that you can use to improve the effectiveness of your tests by using different values and different combinations that might improve the probability of selecting the “right” values or combinations to test with.
Weighting important input combinations
The output of most pairwise tools treats all input values equally. Of course, we know that not all values or combinations are used equally by our customers. Some values or combinations are more likely to be used by some customers, and other customers might simply use default or specific settings. So, one thing we should do as testers is identify our customer configurations and usage patterns to make sure we are covering highly likely user scenarios.
One way we can increase the likelihood of including frequently used input values in combination with other values is to weight the important or likely input values. For example, suppose we discover that our customers are most likely to use the Arial font, with a font color of black, a nominal size, no style, and no effects. We can give these values more weight to emphasize their importance. To add weight to an input value, we simply enter an integer value within parentheses after the input value, as illustrated in our updated PICT model file below.
# Basic Model File for MyFontDialog
Font: Arial(50), Tahoma, BrushScript, MonotypeCorsive
Style: Bold, Italic, BoldItalic, None(10)
Effects: Strike, Underline, StrikeUnderline, None(10)
Colors: Black(10), White, Red, Green, Blue, Yellow
# This does not include abstract size ranges for half-size font sizes (e.g. 1.5 – 1637.5)
Size: small, smallHalf, nominal(10), nominalHalf, large, largeHalf, xLarge, xLargeHalf, xxLarge, xxLargeHalf

# Conditional constraints necessary to prevent mutually exclusive variable settings
# See previous post for dealing with mutually exclusive variables
if [Font] = "BrushScript" then [Style] in { "Italic", "BoldItalic" };
if [Font] = "MonotypeCorsive" then [Style] in { "None", "BoldItalic" };
The weight values have no absolute meaning. Weighting the Style value None with a value of 10 does not mean that it will be used 10 times more often than other values. Also, weighted values may not change the frequency of use, because PICT first tries to generate all n-way combinations in the smallest number of tests. Only when a value is already used in all possible combinations and there is a ‘tie’ among values for a particular input parameter will weighted input values have a greater probability of being used in the output.
For example, in the baseline set of tests we can see a difference in the outputs from the PICT tool by comparing an output from a model without weighted input values (the spreadsheet on the left) with an output based on the above model file with weighted input values (the spreadsheet on the right). In this case the number of tests using the Black font color increased by 2.
Further analysis of the output revealed that the number of tests using the Arial font, the None style, and the None effect all increased. But there was no increase in the selection of the nominal font size.
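Rather than eyeballing two spreadsheets, this comparison is easy to automate. Here is a minimal Python sketch, assuming both outputs are tab-delimited text with a header row of parameter names; the function names are mine.

```python
import csv
import io
from collections import Counter


def value_frequencies(pict_output: str, parameter: str) -> Counter:
    """Count how often each value of one parameter appears in a tab-delimited
    output (the first row holds the parameter names)."""
    rows = csv.DictReader(io.StringIO(pict_output), delimiter="\t")
    return Counter(row[parameter] for row in rows)


def frequency_change(unweighted: str, weighted: str, parameter: str) -> dict:
    """Per-value difference in occurrences (weighted minus unweighted)."""
    before = value_frequencies(unweighted, parameter)
    after = value_frequencies(weighted, parameter)
    # Missing keys in a Counter count as 0, so values absent from one output still work.
    return {value: after[value] - before[value] for value in sorted(before | after)}
```

Running `frequency_change` once per parameter shows at a glance which weighted values gained ground and which, like the font size here, did not move.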
Weighting doesn’t always mean that a particular value will be used more frequently than other values for a given input parameter. But weighting input values is one strategy that may increase the probability of using important values in different combinations.
Covering highly probable combinations
Another strategy for increasing the test coverage of different combinations is to create a sub-model of important parameters that are more likely to be used, or more likely to cause an error. For example, let’s say we think that there is a greater likelihood of error between the Effects and the Style input parameters irrespective of the color, font, or size. One thing we can do with the PICT tool is modify the model file to include a sub-model of those 2 parameters, as illustrated below.
# Basic Model File for MyFontDialog
Font: Arial(50), Tahoma, BrushScript, MonotypeCorsive
Style: Bold, Italic, BoldItalic, None(10)
Effects: Strike, Underline, StrikeUnderline, None(10)
Colors: Black(10), White, Red, Green, Blue, Yellow
# This does not include abstract size ranges for half-size font sizes (e.g. 1.5 – 1637.5)
Size: small, smallHalf, nominal(10), nominalHalf, large, largeHalf, xLarge, xLargeHalf, xxLarge, xxLargeHalf

# Sub-model of Style and Effects input parameters
{ Style, Effects } @ 2

# Conditional constraints necessary to prevent mutually exclusive variable settings
# See previous post for dealing with mutually exclusive variables
if [Font] = "BrushScript" then [Style] in { "Italic", "BoldItalic" };
if [Font] = "MonotypeCorsive" then [Style] in { "None", "BoldItalic" };
Sub-models improve the thoroughness of testing for the specified parameters. By adding the above sub-model, the number of combinatorial tests for our font dialog example increased from approximately 60 to just over 100.
Testing Specific Combinations
In some cases, there are specific values or combinations of values that we want to test because they are the default values, and we know that most of our customers don’t change the default settings. Or, maybe we have gathered usage patterns from our customers and identified combinations of values that a significant portion of our target customers use regularly. Or perhaps we know from experience or intuition there are combinations that are more likely to be problematic or are somehow “interesting.”
From a manual test execution perspective, including specific combinations is straightforward. But what if we want to test specific combinations of values once per sprint or milestone (or after each build) in our data-driven automated test? Again, we could simply add the values to the output produced by the tool. But what if we want to generate different sets of input combinations, as illustrated in this post? In this situation it is not very efficient to add the specific combinations to each output, especially when the output can be generated quite efficiently at runtime by the automated test.
A solution to this situation is to be able to seed the output with specific combinations that must appear in the baseline set of tests. In this case, I can create a tab-delimited file such as the one illustrated to the right.
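If the seed combinations come from usage data or a list maintained alongside the test, the automated test can regenerate the seed file itself. Here is a minimal Python sketch, assuming the seed file format is a tab-delimited header row of parameter names followed by one row per required combination, and that cells left empty are filled in by the tool (partial seeds); the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def seed_file_text(parameters: List[str], seeds: List[Dict[str, str]]) -> str:
    """Build tab-delimited seed text: a header of parameter names, then one row
    per required combination. Parameters a seed omits become empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(parameters)
    for seed in seeds:
        writer.writerow([seed.get(name, "") for name in parameters])
    return buffer.getvalue()


def write_seed_file(path: str, parameters: List[str], seeds: List[Dict[str, str]]) -> None:
    """Write the seed text to disk so it can be passed to the tool."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(seed_file_text(parameters, seeds))
```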
Now whenever I need my automated test to execute these specific combinations of input values, all I need to do is pass an argument to the PICT tool to include the seed file in the output, as follows:
pict.exe basicmodel.txt /e:[path]/myseedfile.txt > output.xls
The PICT tool will then include the seeded combinations in the baseline set. Even if we generate a randomized set using the /r:n switch, as discussed in an earlier post, these seeded inputs will still be included as long as we pass the /e:[filename] switch to the PICT tool.
So, up to now we have discussed:
- Using combinatorial or pairwise testing well depends largely on our ability to effectively model the input parameters that affect a common output condition
- We can modify the model to deal with mutually exclusive input value combinations using a simple built-in syntax for conditional or invariant constraints
- We can increase the likelihood of testing different values using random data generation techniques
- We can increase the test coverage of different combinations by using PICT to randomize the output of the baseline set of tests
- We can increase the potential to use certain values with greater frequency by weighting important values
- We can increase the thoroughness of test coverage between specific input combinations with sub-models
- We can force the tool to include specific input value combinations in a complete n-way output of other combinations using a seed file that specifies those combinations
Next week, I will discuss increasing the order beyond pairwise, and later we still have to discuss negative testing. I hope you find these posts valuable, and if you have any questions please don’t hesitate to ask via the comments or email.
Combinatorial testing: Invalid Combinations & Output Condition
I am finally getting caught up at work after my trip to Korea and Israel. I managed to cut the lawn in front of my house on Sunday, but I still have an acre and a half of backyard to tend, leaves to rake, the garden to till, the pond to look after, and probably some other chores on the list for this weekend. In the meantime, it is time for another installment of the series on combinatorial testing.
Another reason the paper Pairwise Testing: A Best Practice That Isn't cites for the risks associated with pairwise testing was, “The problem, as we see it, is that the key concept of how program inputs variables interact to create outputs is missing from the pairwise testing discussion. The pairwise testing technique does not even consider outputs of the program (i.e., the definition and application of the pairwise technique is accomplished without any mention of the program’s outputs).” This statement completely surprises me because the foundational principle (heuristic) of this technique, or pattern of test, is that some errors result from the interaction of input variables adversely affecting a single common output condition or state. So, I wonder how testers can approach a combinatorial testing problem if they don’t know or consider the output condition or state being evaluated. Do testers really simply plug in input variables and wait to see what happens?
Then the authors state, “…it seems to us to become easier for novice testers, and experienced testers alike, to blindly apply it to every combinatorial testing problem, rather than doing the hard work of figuring out how the software’s inputs interact in creating outputs.” I totally agree. But I would say that people who don’t understand how to use tools properly (novice or otherwise) are likely to get ‘hurt.’ This isn’t necessarily a problem with the technique or tool; it is simply a lack of knowledge or skill on the part of a novice or untrained tester. To use combinatorial testing effectively and apply test tools in the appropriate context, testers must be trained, learn to select the right tools for the job, and know how the input values affect the output for the feature being tested.
Up to this point we have blindly assumed that all combinations of input values for a given output condition are possible. However, this is not always the case. For example, a common testing problem is the ever-growing matrix of operating systems, service packs, browser versions, Flash versions, etc. in configuration or setup testing. Suppose that one of the OS versions is Windows XP and another is Windows 7, and we want to include IE 6.0, IE 7.0 and IE 8.0 as part of our possible customer configurations. Even if we hacked our way into installing IE 6.0 on Windows 7, it would be a completely unsupported, out-of-context scenario. Windows 7 and IE 6.0 are mutually exclusive, so we need a way to prevent such combinations from being generated by our tool.
In the font dialog example we also have mutually exclusive combinations. The Brush Script MT font can only be Italic or Bold/Italic, and the Monotype Corsiva font can only have styles of Bold/Italic or regular (neither bold nor italic selected). But when we examine the baseline set of combinations output by our combinatorial test generation tool, we can see there are several invalid combinations.
This is a problem because if we tried to test these combinations, our test would indicate a failure, since the user cannot force these combinations to occur. But in this case the failure is not a bug in the feature being tested; it is a false positive. The bug is in our test data (the values used in the combinations). We must remember…the output from the tool is based on the tester’s input…it’s only a tool! This is why it is critical to review the output from the tool and validate the test combinations.
But we do not want to simply remove these combinations from our baseline test set, nor do we want to arbitrarily change the font or the style values, because that may change the n-way combinations with other input parameter values.
Applying Conditional Constraints
We don’t want to arbitrarily change our output from the tool, but we need a way to effectively handle the mutually exclusive values for the Monotype Corsiva and Brush Script MT fonts and the font styles. To solve this problem, the PICT tool employs a simple scripting language that enables the tester to modify how the baseline set of combinations is generated. In other words, we can ‘program’ the tool to constrain specified parameter values from being used in combinations.
In our font dialog example, we will need to add 2 statements to our model file as illustrated below.
# Basic Model File for MyFontDialog with Conditional Constraints
Font: Arial, Tahoma, BrushScript, MonotypeCorsive
Style: Bold, Italic, BoldItalic, None
Effects: Strike, Underline, StrikeUnderline, None
Colors: Black, White, Red, Green, Blue, Yellow
Size: small, smallH, nominal, nominalH, large, largeH, xLarge, xLargeH, xxLarge, xxLargeH

# Conditional constraints prevent mutually exclusive variable settings
IF [Font] = "BrushScript" THEN [Style] IN {"Italic", "BoldItalic"};
IF [Font] = "MonotypeCorsive" THEN [Style] IN {"None", "BoldItalic"};
Revalidate Output
Now we take our modified model file and pass it to the PICT tool. The PICT tool will again generate a set of baseline tests, but this time the test combinations are generated based on the conditional constraints we ‘programmed’ into the model. We need to revalidate the tool’s output to ensure our model is producing a set of combinations for the intended purpose of our test. Our new baseline set of positive test combinations should not generate a false positive for the mutually exclusive font and font style values.
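Revalidating the output can also be automated. The C# sketch below is one way to do it, under stated assumptions: it expects PICT's tab-separated output with a header row of parameter names, and it hard-codes the two font/style constraints from the model above. ConstraintChecker and FindInvalidRows are hypothetical names, not part of PICT.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical checker for the font/style constraints in the model file.
// Assumes PICT's default output: tab-separated text whose first line holds
// the parameter names, followed by one test combination per line.
public static class ConstraintChecker
{
    // Styles allowed for the fonts that do not support every style.
    private static readonly Dictionary<string, string[]> AllowedStyles =
        new Dictionary<string, string[]>
        {
            ["BrushScript"] = new[] { "Italic", "BoldItalic" },
            ["MonotypeCorsive"] = new[] { "None", "BoldItalic" },
        };

    // Returns the 1-based data row numbers that break a constraint.
    public static List<int> FindInvalidRows(string pictOutput)
    {
        var lines = pictOutput.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        var header = lines[0].Split('\t');
        int fontColumn = Array.IndexOf(header, "Font");
        int styleColumn = Array.IndexOf(header, "Style");
        if (fontColumn < 0 || styleColumn < 0)
        {
            throw new ArgumentException("Output must contain Font and Style columns.");
        }

        var invalid = new List<int>();
        for (int i = 1; i < lines.Length; i++)
        {
            var fields = lines[i].Split('\t');
            string font = fields[fontColumn];
            string style = fields[styleColumn];

            // Only fonts listed in AllowedStyles are restricted.
            if (AllowedStyles.TryGetValue(font, out var styles) && !styles.Contains(style))
            {
                invalid.Add(i);
            }
        }
        return invalid;
    }
}
```

Running a check like this over each newly generated output, and failing the run when the list is not empty, would also catch a model whose constraints were mistyped, for example a style value in a constraint that does not match any value of the Style parameter.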
Some may argue that we want to include erroneous behavior to see how the feature responds. That is a perfectly valid argument if we are performing manual testing and limiting our combination tests to a small baseline subset of tests. But if the font dialog allowed me to set the Brush Script MT font to bold only, that is most likely a single-mode fault in the event handler. In other words, that error is likely to occur regardless of the values of the other parameters, and I don’t need to invest in a combinatorial testing approach to identify that issue. Another consideration involves the automation approach. If I am programmatically accessing a control’s window handle to set its state, then I can actually force a control into an invalid state, which would again result in a false positive because my automated test case is forcing an invalid state.
I will discuss strategies for negative testing of input combinations in a later post. I personally think separating positive tests and negative tests is a good strategy. Positive tests are generally designed to provide confidence while negative tests are usually intended to expose issues. Also, the (automated) oracles tend to be different for negative and positive tests.
In highly complex features the model may have to be ‘tweaked’ several times. However, once the model is complete we won’t have to change it again unless the feature changes. And every time we get a new build we can increase our positive test coverage of combinations that should work by generating numerous test combinations from a single model file using the PICT tool output.
Up to now we have discussed the importance of decomposing the feature based on the output state or condition under test, identifying the variables for each parameter, modeling the variables when appropriate, randomly selecting values from large sets of test data, randomizing the set of combinations tested, and preventing false positives by adding conditional constraints to the model to deal with mutually exclusive variables. In the next post we will discuss how to test important variable combinations and increase the likelihood of testing with important values.
Combinatorial Testing: Selecting the ‘right’ values (Part 2)
In the previous post we discussed how hard-coding an extremely small subset of values out of a large population of all possible values for a given input parameter is rarely a good idea because it precludes any chance of testing with other values. By dividing a large population of possible input values into smaller subsets we can effectively increase our distribution across the range of possible values. And by randomly selecting values from each smaller subset each time a test specifies that subset in a combination we increase the probability of testing with a greater number of possible values.
But another ‘cause’ of failure of combinatorial testing mentioned in the aforementioned paper was, “Similarly, a number of "not found" faults were 2-way faults that were not detected because a particular combination of data values had not been selected.” This is indeed a difficult problem, and there is no definitive solution. But I have found over the years that professional testers are those who constantly search for alternative ways to solve a difficult problem rather than simply complain about it. While there is no way to guarantee that we select a “particular combination” of data values that might expose an issue, perhaps there is a way to test different combinations with different data values. Increasing variability is a great way to increase coverage and potentially trigger an error caused by unknown “particular combinations” or values for a given variable.
In the last post we modified our model file to use abstract ranges of font sizes and we are using a random number generator to select a value within each specified range to get a better distribution of font sizes across the entire population of possible values. This way we help prevent issues associated with using only a small number of hard-coded values in our tests. So now our model file for the font dialog is similar to:
# Basic Model File for MyFontDialog
Font: Arial, Tahoma, BrushScript, MonotypeCorsive
Style: Bold, Italic, BoldItalic, None
Effects: Strike, Underline, StrikeUnderline, None
Colors: Black, White, Red, Green, Blue, Yellow
Size: small, smallH, nominal, nominalH, large, largeH, xLarge, xLargeH, xxLarge, xxLargeH
Now, instead of over 1,250,000 combinations (assuming we would test each font size value), the total number of combinations for the font dialog based on our current model is 3840 (assuming all combinations are valid). Using this model, the PICT tool will generate approximately 60 test combinations as the baseline set. But the baseline set of tests generated by the tool is only about 1.6% of all combinations. Most studies indicate that the baseline set of combinations is quite effective in terms of defect detection effectiveness (DDE) and code coverage. But the baseline set of tests may not include hidden or unknown ‘particular combinations’ that might be problematic.
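The counts in this paragraph follow from simple arithmetic on the model file, and the C# sketch below (CombinationMath is a hypothetical helper) shows where they come from. The exhaustive count is the product of the number of values per parameter. A pairwise set must cover every pair of values across every pair of parameters, and it can never be smaller than the product of the two largest value counts, because each test covers only one pair of values from those two parameters.

```csharp
using System;
using System.Linq;

// Hypothetical helper for the arithmetic behind combinatorial test counts.
// Input: the number of values for each parameter in the model file.
public static class CombinationMath
{
    // Every value combined with every other value: the product of the counts.
    public static long ExhaustiveCount(int[] valueCounts) =>
        valueCounts.Aggregate(1L, (product, count) => product * count);

    // Distinct value pairs a pairwise (2-way) test set has to cover:
    // for every pair of parameters, multiply their value counts.
    public static long PairsToCover(int[] valueCounts)
    {
        long pairs = 0;
        for (int i = 0; i < valueCounts.Length; i++)
            for (int j = i + 1; j < valueCounts.Length; j++)
                pairs += (long)valueCounts[i] * valueCounts[j];
        return pairs;
    }

    // Lower bound on the size of any pairwise set: the two largest counts
    // multiplied, since each test covers only one of those value pairs.
    public static long PairwiseLowerBound(int[] valueCounts)
    {
        var sorted = valueCounts.OrderByDescending(c => c).ToArray();
        return (long)sorted[0] * sorted[1];
    }
}
```

For the model above (4 fonts, 4 styles, 4 effects, 6 colors, 10 size ranges) the sketch gives 3840 combinations, 300 value pairs to cover, and a floor of 60 tests (6 colors × 10 size ranges), which is consistent with the roughly 60 tests PICT produces. Replacing the 10 size ranges with every concrete size (1 through 1638 plus 1637 half sizes, or 3275 values) gives 1,257,600 combinations.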
If we were to visualize the total number of combinations as the large circle and the baseline set of tests as the smaller red circle we can see that the baseline only covers a relatively small portion of the total number of combinations. Obviously there are a lot of other combinations that are not being tested. If we know of particular combinations that should be tested we can easily include those in our set of n-way tests (and we will see how we can do that in a later post). But, if we don’t know what ‘particular combinations’ might cause an error we need to find a way to more effectively increase the number of combinations tested.
Unfortunately, most combinatorial tools will only generate a single baseline set of test combinations. Even if we randomize the values for any given input parameter we are most likely only to expose single mode or one-way faults (errors that are caused by a single input parameter value regardless of the other input parameter values).
The question becomes how we can effectively expand our test coverage to include different combinations beyond our original (or only) baseline set of combinations. If we select different ‘sets’ of combinations, we are effectively testing different combinations in the population of all possible combinations.
Fortunately, the PICT tool can generate random sets of combinations from a single model file. By passing the ‘/r:n’ switch (where n is a seed value) on the PICT command line, we get a different set of combinations for each seed. The outputs below illustrate 2 different sets of combinations generated by the tool from our basic model file.
The set of tests in the simpleout1.xls file is the baseline set of tests generated by the PICT tool. Most tools are only capable of generating a single baseline set of tests. However, with the PICT tool we passed the /r:42 switch as a command line argument. With this user-defined seed value, the PICT tool generated a different baseline set of tests, illustrated in the simpleout2.xls file on the right.
Occasionally there may be duplicate combinations in the various sets of tests. But, if we generate a ‘new’ set by passing different seed values with the /r switch for each new build we can effectively increase our test coverage and the probability of testing (unknown) ‘particular combinations’ by changing the set of combinations generated by the tool.
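One way to get a new set for every build is to derive the seed from something that changes per build. The C# sketch below assumes pict.exe is on the PATH and uses only the /r:n and /e:[filename] switches discussed in these posts; PictRunner, BuildArguments, and the idea of using a build number as the seed are illustrative assumptions, not part of PICT.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper that runs PICT with a caller-supplied /r seed so each
// build can test a different set of combinations from the same model file.
// Assumes pict.exe can be found on the PATH.
public static class PictRunner
{
    public static string BuildArguments(string modelPath, int seed, string seedFilePath = null)
    {
        string arguments = $"\"{modelPath}\" /r:{seed}";
        if (!string.IsNullOrEmpty(seedFilePath))
        {
            // Keep the required (seeded) combinations in every random set.
            arguments += $" \"/e:{seedFilePath}\"";
        }
        return arguments;
    }

    // Runs PICT and returns its tab-separated output as a string.
    public static string Run(string modelPath, int seed, string seedFilePath = null)
    {
        var startInfo = new ProcessStartInfo("pict.exe", BuildArguments(modelPath, seed, seedFilePath))
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        using (var process = Process.Start(startInfo))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

For example, PictRunner.Run("basicmodel.txt", buildNumber, "myseedfile.txt"), with a hypothetical buildNumber variable, would produce a different random set on each build while still including the seeded combinations. Logging the seed alongside the results should make a failing set reproducible, since the same seed and model yield the same set.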
How many tests do we need?
A common question that is often asked is, “given a large number of possible tests or values, how many tests or values do we need to test?” There is no simple answer to this question. But, we do know that testing with an extremely small subset of values (or combinations) may not provide us with high levels of confidence.
When dealing with a large population of possible combinatorial tests (or values in a variable), one way to improve confidence is to increase the sample, or the number of different combinations tested. In a previous post I discussed the concept of sampling from a testing point of view. Sampling is often used in scientific research and experimentation. For example, if we assert that all 1,250,000+ combinations should produce an equivalent output (in our case the appropriate changes to the glyphs in our edit control), then we can increase our confidence that assertion holds true by increasing the number of samples (n-way test combinations).
We would only be 100% confident if we tested all possible values for all possible combinations. But that may not be feasible in all cases, and the output of combinatorial test tools tends to be optimized toward a minimum or baseline set of tests. So, one approach to help us increase our confidence in test coverage is to use a statistical sample calculator to approximate the number of samples (or different combinations) that we should test to achieve a desired level of confidence. In our demo, we stated there are now 3840 possible combinations (assuming all combinations are valid). Given this population of possible tests, the number of samples (different combinations) we would need for a statistical confidence of 99% with a 3% sampling error and a standard deviation of .5 is 1248.
(NOTE: In statistical sampling, the smaller the total population, the larger the fraction of that population we need to sample to reach the same level of confidence. Automating your combinatorial test for a non-trivial feature is a best practice and can help us effectively expand our coverage of n-way combinations.)
Of course, even if we tested 1248 different combinations, there is still no guarantee that we would test the ‘particular combinations’ that might trigger an unexpected anomaly. But no other approach to testing can guarantee we test these unknown ‘particular combinations’ either. However, systematically increasing the sample size from any given population of possible combinations (or values) is likely to increase our chances of exposing an anomaly not detected by the static output of a basic combinatorial testing tool, or at least increase our overall confidence.
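The 1248 figure can be reproduced with the formula most online sample size calculators use: Cochran's formula with a finite population correction. The C# sketch below assumes a z-score of 2.58 for 99% confidence and p = 0.5 (which corresponds to the "standard deviation of .5" setting); SampleSize.Calculate is a hypothetical helper, and other calculators may use slightly different z-values or rounding.

```csharp
using System;

// Hypothetical sample size helper (Cochran's formula with a finite
// population correction).
public static class SampleSize
{
    // population:    number of possible combinations (N)
    // z:             z-score for the confidence level (2.58 assumed for 99%)
    // marginOfError: sampling error as a fraction (0.03 for 3%)
    // p:             expected proportion; 0.5 gives the most conservative size
    public static int Calculate(int population, double z, double marginOfError, double p = 0.5)
    {
        // Sample size for an unlimited population: n0 = z^2 * p(1 - p) / e^2
        double n0 = z * z * p * (1 - p) / (marginOfError * marginOfError);

        // Finite population correction: n = n0 / (1 + (n0 - 1) / N)
        double n = n0 / (1 + (n0 - 1) / population);

        return (int)Math.Round(n);
    }
}
```

With these assumptions, Calculate(3840, 2.58, 0.03) returns 1248. For the roughly 1.26 million combinations of concrete font sizes it returns 1846: the population is over 300 times larger, but the sample grows by less than 50%.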
Testing essentially helps provide confidence and reduce risk because we can’t test everything!
In the next post we will discuss how to effectively deal with invalid variable combinations.
Combinatorial Testing: Selecting the ‘right’ values (Part 1)
This week I am in Seoul, Korea attending the ASTA Seoul International Software Testing Conference (SSTA 2010). It has been several years since I have been to Korea, so being invited to give the opening keynote at this conference was a real honor and an opportunity that I couldn’t pass up. This is a relatively small conference of about 175 people; the attendees are Korean, and all the presentations were translated in real time. The speakers, however, came from around the world. I was particularly impressed with the representation from large companies such as LG and Samsung. Since these testers worked mostly on devices, many of their tests did not involve a GUI, so they really understood the idea of moving quality upstream, defect prevention, and the importance of low-level automation, or automation below the user interface. I was also impressed with their passion for testing, as well as their concern over the maturity of the discipline beyond bug-finding expeditions and over career growth as a test engineer without having to become a manager.
This week I will continue the saga of posts on combinatorial testing. Sometimes it still surprises me how some people discount this technique, or simply assume that they can come up with a ‘better’ set of tests by randomly selecting ‘interesting’ combinations. I suspect some of the skepticism is a result of misapplication of the technique and/or tools, or is based on white papers such as the widely distributed paper Pairwise Testing: A Best Practice That Isn’t. The title is certainly provocative, but unfortunately it is also very misleading. In fact, compared to other approaches to this problem, in the correct context there is a lot of empirical evidence to suggest that pairwise or combinatorial testing is the best approach when used by a competent tester with a powerful toolset.
However, this paper does a good job of pointing out several ways this technique is commonly misapplied, and also illustrates limitations of some tools. Unfortunately, the paper fails to offer any other demonstrable solution to the combinatorial testing problem or to the perceived limitations of this technique. So, let’s take a look at the limitations or misapplication of this technique highlighted in the paper and propose potential solutions to help overcome those limitations and more effectively use this technique in the proper context.
(BTW…the “random selection” mentioned in the paper is not a person randomly selecting input combinations; the study used a computer algorithm to randomly select a set of combinations from all possible combinations. The actual study concluded “In this study we found no significant difference in the FDE of n-way and random combinatorial test suites. …the result is not unexpected.” The study is interesting from an academic perspective, but adds little value in the ‘real world.’)
Selecting the ‘right values.’
The first misapplication of this technique identified in the paper is actually a non sequitur: “Pairwise testing fails when you don’t select the right values to test with.” Now perhaps this is self-evident, but any experienced tester can tell you that you can replace the first 2 words of this sentence with any other testing approach and get the same result (e.g., “exploratory testing fails when you don’t select the right values to test with”). Also, the inverse of this statement is not true. We can’t just arbitrarily say “pairwise testing succeeds when you select the right values to test with.” As I said previously, combinatorial testing is but one potential solution to a very complex problem; it is not a silver bullet in all situations.
But this conclusion does identify a common mistake in modeling the input variables, based on a common misuse of the technique of equivalence (or domain) partitioning. Equivalence partitioning is also a form of modeling: it groups similar elements into sets based on a set of heuristics explained in the renowned book The Art of Software Testing by Glenford Myers. However, this technique is often misused by amateurs who
- fail to adequately identify special or unique values in any given set, and
- simply assume that they can take 1 or 2 elements of any large set and conclude they are representative of the entire set they identified
The technique of equivalence partitioning depends largely on our ability to adequately model test data into the appropriate sets for the given context. We must also realize that our sets are a model, or an assertion of how that data might be handled by the application. When dealing with a large set of input values it is foolhardy to randomly select a limited set of values and hard-code those values into our tests, for 2 reasons:
- the number of values selected from a large population of possible values is usually a lot less than required to gain any degree of confidence, and
- we eliminate the possibility of testing with any other values in that population
Increasing the number of possible values
If we artificially constrain our test values to “an extremely small subset of the actual number of possible values” then of course we limit the potential effectiveness of these techniques. So, how can we increase the number of values from a large population of possible values for any given input parameter?
For example, in the case of the simple font dialog used in the example, we have a large set (3275) of possible font size values (1 – 1638 and half-sizes from 1.5 – 1637.5). Now certainly we don’t want to test every possible value! But testing only a small subset of values might not provide the confidence necessary to support our assertion that all values in this range are valid and would result in the expected output state. So, perhaps we can sub-divide our single large set into several smaller abstract subsets, and then randomly select values from each defined subset. Let’s see how this plays out…
For our Font Size input parameter instead of hard-coding values such as:
FontSize: 1, 8, 10, 12, 42, 72, 100, 256, 1024, 1638, 1.5, 1637.5, 11.5
What if we created abstract ranges such as:
FontSize: Small, SmallHalf, Nominal, NominalHalf, Large, LargeHalf, XLarge, XLargeHalf
And then we gave more concrete definition to our abstract ranges such as
- Small = 1 – 9
- SmallHalf = 1.5 – 9.5
- Nominal = 10 – 18
- NominalHalf = 10.5 – 18.5
- Large = 19 – 72
- LargeHalf = 19.5 – 72.5
- XLarge = 73 – 1638
- XLargeHalf = 73.5 – 1637.5
Now our output provides an abstract range, and the tester (or automated test) has more freedom in selecting the value for that particular input parameter. In our demo there are 6 combination tests (out of 43) that require a nominal size value. Now, in your head randomly pick 6 numbers from 10 through 18; those are the values you use the first time you run this test. Next, ask someone else to randomly select 6 numbers from 10 through 18; those are the values you use in your test on the next build, or as appropriate. As you can see, the smaller the number of values in any given subset, the greater the probability of testing with the ‘right’ values in that subset. But more subsets require more tests (a non-issue if your combinatorial tests are automated using a data-driven automation approach). The larger the set of values in any given subset, the smaller the fraction of its values that will be tested. If a particular subset seems too large, then you can always sub-divide it into smaller subsets as well to get a better distribution of values from all possible values.
Now in our automated test script we have a method that sets the appropriate ranges based on the abstract value for the Font Size, and another method to randomly select a number from the range we specify similar to the 2 methods below:
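using System;

// Helper methods from the data-driven font dialog test.
internal static class FontSizeSelection
{
    // One shared generator: creating a new Random on every call can return
    // the same value when calls happen in quick succession.
    private static readonly Random random = new Random();

    // Maps an abstract font size range (for example "Small", "NominalHalf",
    // or "XLarge") to the minimum and maximum whole-number sizes in that range.
    private static void SetFontSizeMinAndMaxRangeValues(
        string combinationTestValue,
        out int minValue,
        out int maxValue)
    {
        string value = combinationTestValue.ToLowerInvariant();
        minValue = 1;
        maxValue = 1;

        // A switch cannot match on Contains(), so use if/else. Test "xlarge"
        // before "large" because "xlarge" also contains "large".
        if (value.Contains("small"))
        {
            maxValue = 9;
        }
        else if (value.Contains("nominal"))
        {
            minValue = 10;
            maxValue = 18;
        }
        else if (value.Contains("xlarge"))
        {
            minValue = 73;
            maxValue = 1638;
        }
        else if (value.Contains("large"))
        {
            minValue = 19;
            maxValue = 72;
        }
    }

    // Picks a random size between minValue and maxValue (inclusive) and, for
    // the "Half" ranges, appends ".5" to produce a half-point size.
    private static string GenerateRandomFontSize(int minValue, int maxValue, string combinationTestValue)
    {
        // Random.Next excludes its upper bound, hence maxValue + 1.
        int size = random.Next(minValue, maxValue + 1);

        if (combinationTestValue.IndexOf("Half", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            // The largest half size is 1637.5, so 1638 cannot become a half size.
            if (size == 1638)
            {
                size = 1637;
            }
            return size + ".5";
        }

        return size.ToString();
    }
}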
Of course, if there were particular values in the font size range that we explicitly wanted to test then we could also easily specify them in our model of input values as well. But, hard-coding only a small subset of values from a large number (population) of possible values is rarely a good idea in any testing approach.
So essentially, to increase your data coverage and to increase your probability of testing with the ‘right’ values (especially if you don’t know what the ‘right’ values are) then one possible solution is to create smaller subsets and randomly select values in each subset each time the combinatorial test case is executed.
You are probably starting to understand that the limitation of any technique or approach is not necessarily the fault of that technique or approach, but is often due to its misuse or misapplication by novice or untrained testers. Also, the more we understand about the ‘system’ we are testing, the more effective the technique or approach we use will be in the appropriate context.
Later this week, I will discuss another potential solution to the other part of this problem…in a large set of possible values how do we increase the probability of testing with the ‘right’ values when we don’t know what the right values are.
Combinatorial Testing: Getting started
Last week I started discussing combinatorial testing. Justin Hunter added some great comments and a link to his blog, which has several great posts on this subject as well. He also asked me to elaborate on the training I designed inside Microsoft to teach our testers, as well as what I discuss in my workshops at conferences. Rather than try to write one post that covers 8 hours of lab-based content, I will attempt to capture the essence of the training in a series of posts, starting with getting started.
First, let me say that combinatorial testing isn’t a silver bullet. Like other techniques and approaches used in software testing, it is susceptible to Beizer’s Pesticide Paradox. In other words, it is effective in finding certain categories of bugs when applied smartly in the correct context, but it is not effective in finding all categories of issues.
Combinatorial testing is most useful when testing complex configuration scenarios and/or situations where multiple input parameters, each with numerous variables, affect a single output condition or state. For example, if you need to test your application on multiple versions of Windows, with different browser versions, protocols, and connection speeds, then this testing technique will help define a baseline set of test environment configurations. Or, if you are testing an API that has several parameters with multiple argument values that can be passed to those parameters, then this testing technique will also help testers establish a baseline set of tests. Of course, this can also be applied to input controls on a graphical user interface (GUI) that affect a common output state or condition, such as how changes to the settings on a font dialog change the properties of the glyphs in an edit control.
It’s all about modeling
The key to combinatorial testing lies in the tester’s ability to identify the interdependent input parameters that act on the single output condition being tested, and to create an abstract description of the input variables or parameter behavior in a model file that a tool uses to produce a baseline set of combinatorial tests. In other words, the effectiveness of the baseline set of combinatorial tests is primarily based on:
- A tester’s ability to identify the appropriate input parameters or configuration parameters
- A tester’s ability to adequately identify the appropriate variables for each input parameter or configuration settings
- A tester’s ability to describe the parameters and the variables or settings in a model of the feature being tested
- The limitations of the tool used to produce a baseline set of combinatorial tests
Models are abstract representations of ideas or real objects. Creating models is not easy and takes a lot of creativity and critical thinking. For example, let’s refer back to the font dialog that I use in my demos. Then let’s look at the two checkbox controls for Bold and Italic. The simplest way to model these two inputs is to list each input parameter and the 2 check states for each checkbox.
Bold: check, uncheck
Italic: check, uncheck
This is the example I used in the article because it is easy for a novice to understand quickly. However, another way to represent these 2 inputs is to conceptualize not what they do individually, but what happens to the output state or condition (the properties of the glyphs in an edit control). In this example, checking or unchecking either or both checkboxes changes the style of the displayed glyphs. So, another way to model the style inputs is:
Style: bold, italic, bold/italic, regular
Both models of these input parameters accomplish the same thing. I personally prefer the second abstract model of styles because developing the rules for mutually exclusive variables (I will discuss that in a few weeks) is a bit easier, and it is also easier to modify the model if the types of styles that can be applied increases in the future.
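To see that the two models describe the same inputs, the short C# sketch below maps each of the 2 × 2 = 4 check-state combinations of the first model to exactly one value of the Style parameter. StyleModel.FromCheckboxes is a hypothetical helper written for illustration; a data-driven test could use the reverse of this mapping to decide which checkboxes to set for each Style value in the tool's output.

```csharp
using System;

// Hypothetical mapping between the two models of the Bold and Italic inputs:
// the four checkbox state combinations correspond one-to-one with the four
// values of the abstract Style parameter.
public static class StyleModel
{
    public static string FromCheckboxes(bool boldChecked, bool italicChecked)
    {
        if (boldChecked && italicChecked) return "bold/italic";
        if (boldChecked) return "bold";
        if (italicChecked) return "italic";
        return "regular"; // neither checkbox checked
    }
}
```

Because the mapping is one-to-one, neither model loses a state; the Style model simply gives the constraint rules and any future styles a single parameter to work with.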
Honestly, I don’t think I can teach someone to model something via a blog, or a book. In a class I can demonstrate different ways to model something and then provide feedback as people work through a problem. But modeling is a skill that takes practice. And one thing I have learned in my experience is that the less we know about the feature being modeled, the greater the probability of a less than adequate outcome in our testing.
George Box said, “All models are wrong; some models are useful.” In this situation it is clearly our ability as testers to create a model of configuration or input parameters that is useful and provides value to us; otherwise it is simply wrong. It is not necessarily this technique that is broken; it is more likely that our limited understanding of what is being tested or misapplication of the technique that causes us to produce a less than adequate model. In other words, if your model is wrong, the tool output is certainly going to be wrong!
Combinatorial Testing
Fall is now upon us here in Seattle. I really like fall: the bright, vibrant colors of the changing leaves, the crisp morning and evening air, leaves starting to blanket the lawn, and harvesting the crops from my garden. Of course, along with the harvest comes the work of canning the veggies and mixing up new batches of jam and conserves. But it is fun work, and it fills the house with delicious aromas that remind me of my boyhood and helping my mom in the kitchen canning the bounty of crops from our garden. Now I try to pass the tradition on to my daughter, and we have fun trying different combinations of berries when we make our jam. My favorite is still plain huckleberry, but our strawberry/blackberry mix is also darn good…probably because we pick the berries right from our backyard. But not all combinations of berries work well in a recipe; the flavor of some berries overpowers or masks the flavor of others. Similarly, in software testing, not all combinations of input variables that directly affect a single common output work well together; some combinations result in a bug.
Pair-wise testing, or more specifically combinatorial testing, is a functional test technique intended to help testers more effectively expose issues caused by the interaction of 2 or more input variables that directly affect a common output state or condition. In simple situations, a tester can often pick out various combinations of inputs based on likely customer settings, historical failure indicators (combinations of inputs that have been problematic in the past), and intuition. However, in more complex features, as the number of input parameters and the number of variables that can be applied to those interdependent parameters increase, the potential number of combinations becomes overwhelming (10 parameters with 5 values each already yield 5^10, or 9,765,625, combinations). Of course, we still want to focus on common customer inputs, failure indicators, and intuition. But does guessing at other combinations of inputs really provide us with sufficient confidence in our test coverage? Or would a more systematic approach to testing combinations of input variables be more effective and more efficient?
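To make the difference between exhaustive and pair-wise coverage concrete, here is a minimal Python sketch of a greedy pair-wise generator. It is only an illustration of the idea, not how dedicated tools such as Microsoft’s PICT work; because it enumerates every combination first, it is only practical for small models. The parameter names and values in `model` are hypothetical.

```python
from itertools import combinations, product

def pairs_in(test, names):
    """Return the set of (param, value, param, value) pairs one test covers."""
    return {(a, test[a], b, test[b]) for a, b in combinations(names, 2)}

def greedy_pairwise(model):
    """Greedily choose tests until every pair of values is covered.

    model maps each input parameter to its list of values. This sketch
    enumerates the exhaustive set first, so it only suits small models.
    """
    names = list(model)
    exhaustive = [dict(zip(names, values)) for values in product(*model.values())]
    uncovered = set().union(*(pairs_in(t, names) for t in exhaustive))
    suite = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(exhaustive, key=lambda t: len(pairs_in(t, names) & uncovered))
        suite.append(best)
        uncovered -= pairs_in(best, names)
    return suite, exhaustive

# Hypothetical model: three parameters with 3 values and one with 2 values.
model = {
    "os": ["Win7", "Vista", "XP"],
    "browser": ["IE8", "Firefox", "Chrome"],
    "locale": ["en-US", "de-DE", "ja-JP"],
    "cpu": ["x86", "x64"],
}
suite, exhaustive = greedy_pairwise(model)
print(f"{len(exhaustive)} exhaustive combinations, {len(suite)} pair-wise tests")
for test in suite:
    print(test)
```

Every pair of values still appears in at least one test, but the suite is a small fraction of the 54 exhaustive combinations. The exact number of tests a greedy approach produces can vary with the order of the candidates.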
There is a lot of empirical research in both academia and industry to suggest that the answer to this last question is…yes! Over the past few years quite a lot has been written on the topic of pair-wise testing, and a few other people and I have presented at conferences on the topic. In fact, I recently published an article on the topic in Better Software magazine, and also gave a presentation at the recent VistaCon 2010 software testing conference in Vietnam. I have also posted my slides and the demo files (including the source code for a sample data-driven automated test) from the article and the conference presentation on my website.
In the coming weeks I intend to provide more information and tips to help testers think about how to model input parameters and variables for use in a tool that generates a subset of combinatorial tests, and to overcome some of the limitations and misuses of this best practice. Until then, if you have specific questions or comments, please let me know.
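The demo files mentioned above contain the author’s own data-driven test; the following is not that code, just a minimal Python sketch of the general idea. The test logic is written once, and each row of data (for example, rows produced by a combinatorial tool) drives one test case. `parse_number` is a hypothetical function under test, and `|` is used as the column delimiter so that commas can appear in the data.

```python
import csv
import io

def parse_number(text, decimal_symbol):
    """Hypothetical function under test: convert text that uses a custom
    decimal symbol (for example ',' or 'd') into a float."""
    return float(text.replace(decimal_symbol, "."))

# Each row is one test case: the inputs plus the expected result.
TEST_DATA = """text|decimal_symbol|expected
1.5|.|1.5
1,5|,|1.5
2d25|d|2.25
-0,75|,|-0.75
"""

def run_data_driven_tests(data, func):
    """Run func once per data row and return a list of (row, actual) failures."""
    failures = []
    for row in csv.DictReader(io.StringIO(data), delimiter="|"):
        actual = func(row["text"], row["decimal_symbol"])
        if actual != float(row["expected"]):
            failures.append((row, actual))
    return failures

failures = run_data_driven_tests(TEST_DATA, parse_number)
print(f"{len(failures)} failure(s)")
```

Adding another combination means adding a row of data rather than writing another test method.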
Code Coverage: Unreachable Code and Hard to Reach Code
Well, I am back from a sailing excursion to the San Juan Islands. I wanted to go to the Gulf Islands, but after an unexpected ordeal with a kidney stone just before taking off on the trip, I decided it might be better to stay a bit closer…just in case. The weather was great, and we spent a lot of time exploring Stuart and James Islands, and dropped into Roche Harbor the first night and Friday Harbor the last night in the islands. We limited out on Dungeness crabs on all but 2 days, and on those 2 days we managed only 3 legal-size crabs. Basically, this translates to a lot of crab cakes in the freezer…yum! This was the first time my daughter went crabbing with me. She would ride out in the dinghy with me to check the pots and point out the male crabs for me, but she wouldn’t reach in and help me throw back the females or undersized males. Come to think of it, she didn’t help me cook and clean the crabs either…she just ate the crab cakes we made on the boat. I think the rules might have to change next year! All in all, it was a great month of decompressing, recharging, and contemplating my personal and professional future.
But enough about me. My last 2 posts discussed code coverage analysis. The primary purpose of using code coverage tools, as either a developer or a tester, is not to obtain some magical ROMA number. The biggest value of measuring code coverage is that it helps us analyze untested areas of code and make informed decisions about whether we need to design additional tests to increase test coverage and help reduce exposure to risk.
The last post illustrated how we might use code coverage results to help us design additional tests we might have missed during the execution of pre-defined tests (automated or manual) and additional exploratory testing efforts. But remember, the goal is not simply to design tests that get the tool to report 100% code coverage. In fact, in just about any complex system, executing 100% of the statements in code may not be feasible or provide any practical value. Code that cannot practically be executed is generally referred to as unreachable code.
For example, let’s look at this (albeit antiquated) code snippet.
The code coverage tool indicates that this conditional statement has been exercised to its true outcome, but not its false outcome. This was a common approach used in 16-bit applications to prevent multiple instances of the same application from running on a single machine. However, in the 32-bit world hPrevInstance is always NULL, which means there is no practical way to make this conditional statement evaluate to false.
This is a bit of an obscure example, but it illustrates why a greater understanding of the programming language used by the development team would keep testers from banging their heads against the wall ‘trying’ different things until someone realizes we can never make this conditional statement return false. By analyzing this section of the code coverage results, we might suggest refactoring for a Win32/64 environment, or at least be able to explain why this conditional will not return false. (Remember…it’s all about information.)
Unreachable code is also sometimes caused by coding style or by unnecessary code. For example, the following lines are also in the WinMain() function that is called when the ‘user’ launches the application.
In this situation, when the application initially starts and calls the WinMain() function, these 2 conditional statements in WinMain() check whether the Frog and Car bitmap handles are true (non-null). Since we just launched the application and the bitmaps have not yet been loaded by any other calls to LoadBitmap(), the conditional statements in lines 104 and 109 will never evaluate to true, and lines 105 and 110 will never be executed. Again, after analyzing this section of the code, we can explain why we can’t design a test that causes these conditional statements to return true without fault injection or code mutation. Based on our analysis of the code coverage results, we might also suggest refactoring this code to improve testability.
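The hPrevInstance situation can be reproduced in miniature. The sketch below is a hypothetical Python analog, not the original C code: `launch()` plays the role of the operating system and always passes `None`. A simple line tracer built on `sys.settrace` shows that the guarded `return False` never runs through the normal entry point, while calling `win_main()` directly with a fake previous instance (a form of fault injection) does reach it.

```python
import sys

def win_main(prev_instance):
    if prev_instance:      # 16-bit style "already running?" guard
        return False       # unreachable via launch(), which always passes None
    return True

def launch():
    """Hypothetical entry point: like Win32, it never supplies a previous instance."""
    return win_main(None)

def executed_offsets(func, target):
    """Call func() and return line offsets (from the def line) executed in target."""
    code = target.__code__
    hits = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return hits

normal = executed_offsets(launch, win_main)
injected = executed_offsets(lambda: win_main(object()), win_main)
print("via launch():", sorted(normal))
print("with injected previous instance:", sorted(injected))
```

Real coverage tools are far more complete, but the principle is the same: the tool reports which lines ran, and the tester’s analysis explains why the rest did not.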
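One possible refactoring can be sketched in Python, under a loudly stated assumption: lines 105 and 110 are not shown here, so this sketch assumes they release a previously loaded bitmap before a new one is loaded. The `load` and `release` parameters are hypothetical stand-ins for the real calls. Passing the current handles in as a parameter lets a test supply already-loaded handles and drive both conditions true without fault injection or code mutation.

```python
def load_bitmaps(handles, load, release):
    """Load the frog and car bitmaps into handles (a dict), first releasing
    any handle that is already loaded.

    At a real application launch handles is empty, so the 'if' below is
    never true; a test can pass pre-loaded handles to make it true.
    """
    for name in ("frog", "car"):
        if handles.get(name):              # assumed: a previously loaded bitmap
            release(handles[name])
        handles[name] = load(name)
    return handles

released = []
# Normal launch: nothing is loaded yet, so nothing is released.
first = load_bitmaps({}, load=lambda n: n + "-v1", release=released.append)
# Test-only path: start from loaded handles so the release branch executes.
second = load_bitmaps(dict(first), load=lambda n: n + "-v2", release=released.append)
print(released)
print(second)
```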
A similar example of unreachable code comes from a common coding style with switch statements, in which developers include a case statement for each possible value and also include a default statement. For example, in the last post we saw this code chunk, which is essentially the menu structure.
When the menu item is selected, the submenu displays the submenu items Start and Exit. When the submenu is displayed, the only possible actions are to select the Start submenu item (line 270) or the Exit submenu item (line 274). Without fault injection there is no practical way to execute the default statement in line 277. Again, this may be another example where refactoring could improve testability, because if the default statement were removed, control flow would simply pass out of the switch block.
However, this is not always the case with switch statements. Here is an example in which a modal message box is displayed with 2 buttons: a Yes button to restart the game, and a No button to quit the game. But notice that the default case statement (line 295) has not been executed.
In this situation, if we launch the game and move the frog so that it gets hit by a car, this modal message box displays and shows the ‘end user’ 2 possible buttons to press (in this case, the ESC key and the Close button in the upper right corner of the modal dialog are not available options). However, if we put the game into a state where this modal dialog is displayed and then kill the application process using Windows Task Manager, control flow will pass to line 295 as the process terminates.
Of course, it may not be practical or reasonable to terminate the application process from every possible machine state, and doing so simply increases the cost of testing without adding any real practical value. Providing this information to the decision makers, along with suggestions to refactor and improve testability to reduce overall testing costs, is another way code coverage analysis can be a valuable tool in a tester’s toolbox.
Here is another example of hard-to-reach code. In this case, the conditional statement checks whether the RegisterClass() function fails; if it does, we want to return false.
The RegisterClass() function is also called within the WinMain() function when the application initially launches. So, while analyzing the code coverage results, the question we ask ourselves is, “Without fault injection, can we make the conditional statement in line 88 return true, and if so, how?”
Well, we can. All we have to do is launch about 450 instances of this application to cause the conditional statement in line 88 to return true. Now we have to ask ourselves, “What value does this test provide?” That is especially true since the code design should allow only 1 instance of the application (although it fails to do that, because it is a 16-bit app running in a 32-bit environment and hPrevInstance behaves as explained earlier).
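Rather than launching hundreds of instances, a test can simulate the failure directly. The following Python sketch assumes the registration call is passed in as a parameter, a form of dependency injection that the original C code does not use; `os_register_class` is a hypothetical stand-in that always succeeds, and `"MainWindow"` is an illustrative class name. Injecting a stand-in that returns `False` exercises the failure branch on demand.

```python
def os_register_class(class_name):
    """Hypothetical stand-in for the real registration call; always succeeds."""
    return True

def init_application(register_class=os_register_class):
    """Mirrors the guard described above: return False if registration fails."""
    if not register_class("MainWindow"):   # illustrative class name
        return False
    return True

# Normal launch: the failure branch is not executed.
print(init_application())
# Fault injection: a stand-in that fails drives the hard-to-reach branch.
print(init_application(register_class=lambda name: False))
```

This makes the branch cheap to reach, but whether such a test is worth writing and maintaining is still the question raised above.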
From a testing perspective, the primary goal of code coverage is not to achieve some magic number; the objective is to analyze code coverage results in order to:
- Improve test coverage
- Reduce overall risk
- Potentially increase testability of the project
The code coverage number by itself is not really useful information to anyone. It is the analysis of the code coverage results that can help us decide whether we need to design additional tests, identify areas of the code that can’t be executed without even more expensive testing such as fault injection and/or code mutation, or refactor the code to improve testability (which often increases the code coverage measure).
But this is not to suggest that we should employ code coverage and analyze the results for all software projects. Analyzing code coverage results, designing additional tests from a white-box perspective, and refactoring code are all additional expenses for any project. For each project, we (or our managers) must decide whether the cost is worth the improved coverage and the potential reduction in overall risk.
Another way to look at it…it is our responsibility as testers to provide valuable information to the decision makers. If the only information we provide is that we achieved 80% code coverage, then we really aren’t doing an effective job. Yes, many managers are number-focused; however, the valuable information is in the rest of the story about the 20% that has not been executed.
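To illustrate the difference between reporting a number and reporting the story behind it, here is a minimal Python sketch that pairs the coverage percentage with the tester’s analysis of each line that was not executed. The line numbers and notes are hypothetical sample data, not results from the code discussed above.

```python
def coverage_report(executable, executed, notes):
    """Return the statement coverage percentage followed by one entry per
    line that was not executed, annotated with the tester's analysis."""
    executable = set(executable)
    missed = sorted(executable - set(executed))
    percent = 100.0 * (len(executable) - len(missed)) / len(executable)
    lines = [f"Statement coverage: {percent:.0f}% ({len(missed)} lines not executed)"]
    for line in missed:
        lines.append(f"  line {line}: {notes.get(line, 'not yet analyzed')}")
    return lines

# Hypothetical sample data: 10 executable lines, 8 of them executed.
notes = {
    4: "unreachable on Win32; suggest refactoring the previous-instance check",
}
report = coverage_report(range(1, 11), {1, 2, 3, 5, 6, 7, 8, 10}, notes)
print("\n".join(report))
```

The percentage is one line of the report; the entries below it are where the information for decision makers lives, including the lines nobody has analyzed yet.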