Archive for August, 2010
Well, I am back from a sailing excursion to the San Juan Islands. I wanted to go to the Gulf Islands, but considering an unexpected ordeal with a kidney stone just before taking off on the trip I decided it might be better to stay a bit closer…just in case. The weather was great, and we spent a lot of time exploring Stuart and James Islands, and dropped into Roche Harbor the first night and Friday Harbor the last night in the islands. We limited out on Dungeness crabs on all but 2 days, and even on those 2 days we still managed to get 3 legal-size crabs. Basically this translates to a lot of crab cakes in the freezer…yum! This was the first time my daughter went crabbing with me. She would ride out in the dinghy with me to check the pots and point out the male crabs, but she wouldn’t reach in and help me throw back the females or undersized males. Come to think of it, she didn’t help me cook and clean the crabs either…she just ate the crab cakes we made on the boat. I think the rules might have to change next year! All in all it was a great month of decompressing, recharging, and contemplating my personal and professional future.
But enough about me. My last 2 posts have discussed code coverage analysis. The primary purpose of using code coverage tools, as either a developer or a tester, is not to try to obtain some magical ROMA number. The biggest value of measuring code coverage is to help us analyze untested areas of code and make informed decisions about whether or not we need to design additional tests to increase test coverage and help reduce exposure to risk.
The last post illustrated how we might use code coverage results to help us design additional tests we might have missed during the execution of any pre-defined tests (automated or manual) and additional exploratory testing efforts. But remember, the goal is not simply to design tests in order to get the tool to report 100% code coverage. In fact, in just about any complex system, executing 100% of the statements in code may not be feasible or provide any practical value; code that cannot be exercised in practice is generally referred to as unreachable code.
For example, let’s look at this (albeit antiquated) code snippet.
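The original snippet isn't reproduced here, so below is a hedged C sketch of the kind of check being described. The function name and signature are my own stand-ins (the real code tested `hPrevInstance` inside `WinMain()`), and the line numbering won't match the original listing.

```c
#include <stddef.h>

/* Sketch of the Win16-era single-instance check. Under Win32 the loader
 * always passes NULL for hPrevInstance, so the "prior instance exists"
 * branch below is unreachable in practice. */
int should_continue_startup(void *hPrevInstance)
{
    if (hPrevInstance != NULL)
        return 0;   /* a prior instance exists: refuse to start (dead branch on Win32) */
    return 1;       /* first instance: continue initialization */
}
```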
The code coverage tool indicates that this conditional statement has been exercised to its true outcome, but not its false outcome. This was a common approach used in 16-bit applications to prevent multiple instances of the same application on a single machine. However, in the 32-bit world hPrevInstance is always NULL, which means there is no practical way to make this conditional statement return false.
This is a bit of an obscure example, but it illustrates how a greater understanding of the programming language used by the development team can keep testers from banging their heads against the wall ‘trying’ different things until someone realizes we could never make this conditional statement return false. By analyzing this section of the code coverage results we might suggest refactoring for a Win32/64 environment, or at least be able to explain why this conditional will never return false. (Remember…it’s all about information.)
Another example of unreachable code is sometimes caused by coding style or possibly unnecessary code. For example, the following lines are also in the WinMain() function that is called when the ‘user’ launches the application.
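Since the original listing isn't shown, here is a hedged stand-in for the two checks described below. The variable names are assumptions, `DeleteObject()` is replaced by a comment, and the line numbers here won't match lines 104–110 cited in the text; the point is only that handles which are assigned later by `LoadBitmap()` are guaranteed NULL on the initial call.

```c
#include <stddef.h>

/* Bitmap handles, assigned later by LoadBitmap(); NULL at startup. */
static void *g_hBitmapFrog = NULL;
static void *g_hBitmapCar  = NULL;

/* Returns how many cleanup branches were taken. On the initial call
 * from WinMain() both handles are still NULL, so both conditional
 * bodies are unreachable and the function returns 0. */
int startup_cleanup_branches_taken(void)
{
    int taken = 0;
    if (g_hBitmapFrog != NULL)   /* never true when the app first starts */
        taken++;                 /* e.g. DeleteObject(g_hBitmapFrog) */
    if (g_hBitmapCar != NULL)    /* never true when the app first starts */
        taken++;                 /* e.g. DeleteObject(g_hBitmapCar) */
    return taken;
}
```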
In this situation, when the application initially starts and calls the WinMain() function, these 2 conditional statements in WinMain() check whether the Frog and Car bitmap handles are non-null. Since we just launched the application and the bitmaps have not yet been loaded by any other calls to LoadBitmap(), the conditional statements in lines 104 and 109 can never evaluate to true, and lines 105 and 110 will never be executed. Again, following an analysis of this section of the code we can explain why we can’t design a test to cause these conditional statements to return true without fault injection or code mutation. Additional information that we might provide based on our analysis of the code coverage results may be a suggestion to refactor this code to improve testability.
A similar example of unreachable code comes from a common coding style involving switch statements, where developers include a case statement for each possible value and also include a default statement. For example, in the last post we saw this code chunk, which is essentially the menu structure.
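The chunk from the last post isn't reproduced here, so here is a hedged sketch of the pattern being described (identifiers are my own, and the line numbers won't match lines 270–277 in the text): a switch with a case for each submenu item plus a default that nothing can reach through the UI.

```c
/* Hypothetical menu-command identifiers; the submenu offers only
 * Start and Exit, so those are the only values the UI can send. */
enum { IDM_START = 1, IDM_EXIT = 2 };

int handle_menu_command(int wmId)
{
    switch (wmId) {
    case IDM_START:
        return 1;    /* start the game */
    case IDM_EXIT:
        return 2;    /* exit the application */
    default:
        return -1;   /* unreachable without fault injection */
    }
}
```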
When the menu item is selected the submenu displays the submenu items Start and Exit. When the submenu is displayed the only possible actions are to select the Start submenu item (line 270) or the Exit submenu item (line 274). Without fault injection there is no practical way to execute the default statement in line 277. Again, this may be another example where refactoring could improve testability, because if the default statement were removed control flow would simply pass out of the switch block.
However, this is not always the case with switch statements. Here is an example in which a modal message box is displayed and draws 2 buttons: a Yes button to restart the game, and a No button to quit the game. But notice that the default case statement (line 295) has not been executed.
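The listing itself isn't shown, so here is a hedged sketch of the shape of that switch (function name is an assumption; the line numbering differs from line 295 in the text). MessageBox() with Yes/No buttons can only return IDYES (6) or IDNO (7) through the UI, so the default arm is reached only when the dialog is torn down abnormally.

```c
/* Values returned by the Windows MessageBox() API for Yes/No dialogs. */
enum { IDYES = 6, IDNO = 7 };

int handle_game_over(int msgbox_result)
{
    switch (msgbox_result) {
    case IDYES:
        return 1;    /* restart the game */
    case IDNO:
        return 0;    /* quit the game */
    default:
        return -1;   /* reached only on abnormal teardown, e.g. process kill */
    }
}
```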
In this situation, if we launch the game and move the frog into the path of a car, this modal message box is displayed showing the ‘end user’ 2 possible buttons to press (in this case the ESC key and the Close control button in the upper right corner of the modal dialog are not possible options). However, if we put the game into a state where this modal dialog is displayed and then kill the application process using Windows Task Manager, control flow will pass to line 295 as the process terminates.
Of course, it may not be practical or reasonable to terminate the application process from every possible machine state. Also, this simply increases the costs of testing without adding any real practical value. Providing this information to the decision makers along with suggestions to refactor and improve testability to reduce overall testing costs is another way code coverage analysis can be a valuable tool in a tester’s toolbox.
Here is another example of hard-to-reach code. In this case the conditional statement says that if the RegisterClass() function fails, then we want to return false.
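The original snippet isn't reproduced, so here is a hedged sketch of the pattern (the real code calls the Win32 `RegisterClass()` inside `WinMain()`; here a caller-supplied function stands in for it so the logic can be shown on its own, and the line numbering won't match line 88 in the text).

```c
/* register_class stands in for the real RegisterClass() call, which
 * can fail when system resources are exhausted -- for example, by
 * launching hundreds of instances of the application. */
int init_instance(int (*register_class)(void))
{
    if (!register_class())
        return 0;   /* registration failed: abort startup (hard to reach) */
    return 1;       /* continue creating the main window */
}

/* Stubs simulating success and resource-exhaustion failure. */
static int register_ok(void)   { return 1; }
static int register_fail(void) { return 0; }
```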
The RegisterClass() function is also called within the WinMain() function when the application initially launches. So, while analyzing the code coverage results the question we ask ourselves is, “Without fault injection can we make the conditional statement in line 88 return true, and if so how?”
Well, we can. All we have to do is launch about 450 instances of this application to cause line 88 to return true. Now we have to ask ourselves, “What value does this test provide?” Especially since the code design should only allow 1 instance of the application (although it fails to do that because it is a 16-bit app running in a 32-bit environment, and that is the nature of hPrevInstance as explained earlier).
From a testing perspective the primary goal of code coverage is not to achieve some magic number; the objective of code coverage is to analyze code coverage results in order to:
- Improve test coverage
- Reduce overall risk
- Potentially increase testability of the project
The code coverage number is not really useful information to anyone. It is the analysis of the code coverage results that can help us decide whether we need to design additional tests, identify areas of the code that can’t be executed without even more expensive testing such as fault injection and/or code mutation, or refactor the code to improve testability (which often increases the code coverage measure).
But, this is not to suggest that we should employ code coverage and analyze the results for all software projects. Analyzing code coverage results and designing additional tests from a white box perspective, or refactoring code are all additional expenses for any project. For each project we (or our managers) must decide whether the cost is worth the improved coverage and potentially a reduction in overall risk.
Another way to look at it…it is our responsibility as testers to provide valued information to the decision makers. If the only information we are providing is that we achieved 80% code coverage then we really aren’t doing an effective job. Yes, many managers are number focused; however, the valuable information is in the rest of the story about the 20% that has not been executed.
[NOTE: This was written last week but due to a glitch did not get automatically posted before I left on a boat trip where I disconnected from the world. How refreshing…but more about that later.]
Well, we got a bye in the quarter-final play-offs, and we won our semi-final game against the Sockeyes (2 – 0) last Tuesday. Unfortunately, while celebrating with my teammates I was literally knocked to the floor by a kidney stone in my bladder. After a trip to the hospital I am now taking a battery of pain-killers until I pass the stone. Unfortunately, this put me on the injured roster and knocked me out of the final game, and I am sidelined until I get an OK from the doctor. Fortunately, I am not prone to kidney stones and this shall pass (literally). I have only had one kidney stone in the past and that was about 20 years ago. For those of you who have not experienced a kidney stone, trust me on this…be very, very glad and I hope you never experience this malady.
Last week I attempted to illustrate how we might achieve high levels of code coverage (structural control flow) but potentially overlook critical tests, especially from a ‘black-box’ testing approach. The bottom-line message: high code coverage does not necessarily equal good test coverage. In reality it is really unlikely to get 100% measured code coverage of a reasonably complex application under test. Unfortunately this often begs the question, “What is the right amount of code coverage for [my] application?” To which I have heard several leads/managers reply, “Our goal is 80% code coverage.” Really? C’mon…that’s just plain ROMA data. Setting arbitrary goals for code coverage is about as pointless as tits on a boar hog. The real answer is that we simply don’t know what measure of code coverage is the ideal level for any product.
Also, those who have read this article will know that regardless of the testing approach, at some point the effectiveness of our tests at hitting untested areas of code diminishes. While more time and effort may increase data flow coverage and expose issues, it is unlikely to increase control flow (code) coverage. Remember, just because you exercise a line of code doesn’t mean you found all the bugs, but you have a 0% probability of finding any bugs in untested code.
Fortunately, the majority of customers will traverse the same code paths covered by many of our tests. Also, if your team/organization does robust unit testing then there is a good probability that unit tests at least provided some minimal level of code coverage. (NOTE: While I highly recommend that unit tests and coverage results should be transparent to the test team, I do not recommend using unit tests as part of the battery of tests designed by the test team and executed against the whole build to measure code coverage.) So, there are a couple of questions we have to ask ourselves.
“Does the untested code present significant risk to our customers?”
“Do we need to reduce exposure to risk in the untested areas of code?”
“What is the most efficient way to effectively evaluate the untested code?”
As I wrote last week, code coverage is not about the number. Code coverage is about analyzing the results and potentially designing additional functional tests, or at least being able to explain why areas of the code are untested. If we determine that it is important for our business to better understand the untested code, or improve overall confidence and reduce potential risk then we should use a tool to measure code coverage. But, again it is not about simply measuring code coverage and reporting some magical metric.
Code coverage analysis is the most efficient method to help testers evaluate untested code. Code coverage analysis basically involves the tester reviewing untested code reported by the code coverage tool, determining why some code was not exercised, and possibly designing additional tests to exercise the previously untested code. (Remember, I also wrote last week that the future of professional testing is about analyzing information and designing tests…so, here we go!)
For several years I’ve used the triangle simulation to help set a ‘test effectiveness’ baseline for new testers who had never been formally trained in different test techniques, patterns, or approaches. After a few years of analyzing the results we found that there was about a 70 to 75% probability of tests exercising the true branch of the first conditional expression in a compound predicate statement in a key method in the program. There was about a 20 to 25% chance of tests exercising the true branch of the second conditional expression, and less than a 10% probability of tests exercising the third conditional expression in the predicate statement. When I first saw these results I could hardly believe it, so I changed the third conditional expression to inject a bug and sure enough the results held true; in any class of 20 people, on average only 1 or 2 people found the bug in the software.
From a black box approach let’s say our tests used the following values for sides A, B, and C respectively:
- 1, 2, 3 – an error message indicating the values would not produce a triangle
- 2, 1, 3 – an error message indicating the values would not produce a triangle
- 4, 5, 6 – scalene triangle
- 2, 1, 2 – isosceles triangle
- 5, 5, 5 – equilateral triangle
In this case our code coverage tool would report that our coverage is less than 100%. As we drill down we see that the IsValidTriangle() method illustrated below is not completely covered. So (assuming the argument values passed to the parameters in this method are all validated to be greater than 0), we analyze the code below in our coverage tool and realize that we need a test that evaluates the third conditional expression to true (e.g. 1, 3, 2 for sides A, B, and C respectively).
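The IsValidTriangle() listing isn't preserved here, so below is a hedged C reconstruction that matches the structure the text describes: a single compound predicate whose three conditional expressions each test one way the triangle inequality can fail, with short-circuit evaluation meaning the third expression is only reached when the first two are false (which is exactly why 1, 3, 2 is needed). Names and exact form are assumptions.

```c
/* Reconstruction (assumed names): sides are presumed already
 * validated as greater than 0 before this method is called. */
int IsValidTriangle(int sideA, int sideB, int sideC)
{
    /* Compound predicate: || short-circuits, so the second and third
     * expressions are only evaluated when the earlier ones are false.
     * 1,2,3 makes the first expression true; 1,3,2 is needed to make
     * the third expression true. */
    if (sideA + sideB <= sideC ||
        sideB + sideC <= sideA ||
        sideA + sideC <= sideB)
        return 0;   /* the values do not produce a triangle */
    return 1;       /* valid triangle */
}
```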
It has been a while since my last post, and I apologize to the few folks who follow my rants. Over the last few months I have been busy working on my boat (she is almost 30 years old now and needs some major refitting). Since starting to play hockey again I have been spending a lot of time at the gym and on the ice, trying to keep my body from getting too battered in the games (for I too am older now…let’s just say more than 30). I have also done a bit of soul searching over the past year or so, reflecting on the high points and the low points as well.
So, the refitting on the boat is almost complete and she is ready for a nice long trip to the Gulf Islands starting next week. The Monarch’s Div 7A team finished the regular summer hockey season in first place and playoffs are this week. Winter season doesn’t start until October so I have a month and a half to recuperate. And, I have contemplated a few things to help nurture my professional and personal growth that I will proactively begin working on…beginning with a vacation to the Gulf Islands next week.
But, for now I want to talk more about code coverage. Not the number! Forget the measure! The code coverage percentage is simply a magic metric for pointy-haired managers and other thoughtless chowderheads who like to wave it around as if it actually means something. Look…as a manager I really don’t give a rat’s butt about what percentage of coverage was achieved by testing. If you tell me that your testing achieved 80% code coverage I will probably say, “That’s great! Now tell me about the 20% of the code that you didn’t test!” As a manager what I want to know is:
- Did we run the right tests? Did the test suites we ran give us the information that we need to make good business decisions?
- What is my potential exposure to risk in the areas of untested code and how do/can we reduce that risk?
I have said for years now that the future of professional testing is not simply about beating on a product and finding bugs. The future of testing lies in our ability to design effective tests and critically analyze results in order to provide better information to our primary customers (the decision makers). Code coverage is about analyzing the results and potentially designing additional functional tests, or at least being able to explain why areas of the code are untested.
Did we run the right tests?
The code coverage metric simply measures the number of statements in the code that have been exercised by monitoring control flow through the code. Control flow through code is sequential until it hits some type of branch, such as an if statement, a for loop, or an exception. For example, when we call the CalculateMonthlyMortgage method below, control flows sequentially from the statement in line 6 through the statements in lines 7 and 8 and finally to the return statement in line 9.
So, by passing in a value of 250000 for the principal, 4.5 for the annual interest rate, and 360 for the months parameter, the method returns an expected value of 1266.71327456471 and we would measure 100% coverage of this method. That’s a happy path test. Yes, it does what we think it is supposed to do. But what happens when we pass in a value of 0 to the annualInterest parameter? Control flows through the method as described above, giving us 100% code coverage, but the return value is NaN, or Not a Number. Similarly, if the value for the months parameter is 0 we get 100% code coverage and the return value is Infinity. Of course, negative values for any of the parameters would also produce undesirable output results.
This example illustrates sequential control flow through a method that contains simple statements. When branching conditions occur in software the control flow through the method becomes a bit more complicated as does the testing required. The following snippet counts the number of characters in a string. But, to prevent a null reference exception we use a predicate or conditional statement in line 4 that branches control flow from line 4 to line 12 if the input argument is null, and from line 4 to the loop structure starting in line 6 if the string is not null. If the input argument is an empty string control will flow from line 6 to line 10 bypassing the inner block that increments the count variable. But, if the input argument is a string of at least 1 character control flows from line 6 to line 8 and loops back to line 6 until all characters in the string are counted. (This previous post shows how you can create a model or control flow graph to map control flow through an algorithm.)
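The snippet isn't shown here, so below is a hedged C reconstruction matching the control flow described: a null guard that branches straight to the return, a loop header that the empty string bypasses, and a loop body that increments the count once per character. The function name is an assumption, and the line numbers cited in the text (4, 6, 8, 10, 12) refer to the original listing, not this sketch.

```c
#include <stddef.h>

/* Counts the characters in a string, guarding against a null input. */
int CountCharacters(const char *input)
{
    int count = 0;
    if (input != NULL)                  /* null guard: skip to return if null */
    {
        while (input[count] != '\0')    /* loop header: false immediately for "" */
        {
            count++;                    /* loop body: once per character */
        }
    }
    return count;
}
```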
These are just 2 simple examples designed to illustrate how we can get high levels of code coverage yet still have bugs lurking in our software. Code coverage tools tell us whether we exercised code; they don’t tell us whether we ran the right set of tests to expose potential issues.
Next week let’s continue this with some thoughts on analyzing code coverage results to help reduce risk.