Refactoring for Testability

Teaching my daughter about bullet seating depth.
One of my hobbies is shooting CMP matches and long range precision shooting. Besides lots of practice perfecting the techniques a big part of precision shooting depends on the ammunition and studying the ballistic patterns of various loads. All precision shooters custom load their ammunition and it is not as simple as simply reading a reloading manual. Slight variations of .001” of an inch in seating depth of a bullet or .1 grain of powder may determine whether the group of shots at a target 600 yards away is 1” MOA or 6” MOA. So, getting the ammunition to match the rifle requires continually analyzing your shots, making slight adjustments to the load, and repeating; in computer jargon we might call that refactoring. Reloading for precision is a continually optimizing process until we find the optimal load. Similarly, one of the things we do in the Engineering Excellence group at Microsoft is to continually analyze our internal processes and practices to see how we can help our business groups constantly improve and optimize towards their target. One of the big things on our plate these days is testability.
In Testing Object-Oriented Systems: Models, Patterns, and Tools, a book I consider one of the most important books on software testing practices, the author Robert Binder defines testability as “The relative ease or difficulty of producing and executing an economically feasible test suite to determine whether the [system under test ] SUT (i) conforms to stated requirements and specifications, and (ii) exhibits an acceptably low probability of failure.” This and several definitions of testability floating around on the web and all generally agree that testability generally involves
1.) The ease with which the SUT can be tested
2.) The cost of testing is reasonable
So, as the testability increases the ease with which our tests can determine whether the SUT satisfies implicit and explicit requirements and has a lower chance of failure at reduced testing costs. This all sounds nice, but unfortunately testability cannot be directly measured; testability is a qualitative measure. Although we can’t accurately measure testability we can sometimes do small things to improve the characteristics of testability and help reduce testing costs by reducing the number of tests required to determine whether the SUT satisfies the stated requirements and also has a low chance of failure, or finding ways to test more efficiently through better designs.
In last week’s post I referred a pseudo code example that was written to illustrate how bugs could linger in code despite a high measure of code coverage. Of course we should realize that pseudo code is generally a far cry from the real implementation of the code. Pseudo code is simply a model, and there are many ways to implement that model. The advantage of a model is that we can often test a model earlier to identify potential issues before a single line of code is written. In this particular pseudo code sample, there were a couple of things that stood out that could likely impact the testability of an implementation of the pseudo code model. So, the neurons in my brain starting firing with lots of testing related questions.
So, let’s use that example to discuss potential testability issues. The sample was based on a requirement that stated “Student ID’ are seven digit numbers between one million and 6 million inclusive.” The function is relatively simple in that it takes a string type passed to the sid parameter, and returns a Boolean true or false to the calling function depending on whether the string satisfies the internal Boolean conditions it is being compared against. But this function also calls 2 other functions; the length () function, and the number () function. From the function names I would think the length () function provides a numeric value that represents the number of characters in the string passed to the sid parameter. I am also betting the number () function returns a numeric value (it converts the string variable to a numeric type such as an integer. The pseudo code example was
function validate_studentid(string sid) return
TRUEFALSE
BEGIN
STATIC TRUEFALSE isOk;
isOk = true;if ((length(sid) is not 7) then
isOk = False;if (number(sid) <= 1000000 or number(sid) > 6000000 then
isOk = False;return isOk;
END
One of the reasons that we hire testers with a programming background at Microsoft is that they can help the developer identify potential issues, reduce the probability of failure, and improve testability by stepping through the code during peer reviews, or while designing additional tests to cover un-tested or under-tested areas of the code that are exposed by code coverage analysis. So, when I come across a code sample, I generally step through it to
- See if it will work as intended (basic unit test)
- See if there are any potential obvious errors in logic
- Identify tests necessary for branch or conditional coverage (because developers are usually only concerned with block coverage)
- Identify argument values for negative testing that might expose undesirable results (bugs)
So, in this pseudo code example, once I got to the second conditional clause (if (number (sid)) <= 1000000 or number (sid) > 6000000 then) the little cranks in my brain began to turn. I thought to myself, why are we checking the length of the string? I mean, if the number can only be between 1,000,000 and 6,000,000 then it seems to me that checking the length of the string is simply redundant.
If we remove the first conditional clause (if ((length(sid) is not 7) then) then we actually reduce the number of tests to 3 instead of 4 assuming short-circuiting since short-circuiting compound Boolean expressions is one of several code optimization techniques. (By the way, the first caveat example in Wikipedia on short-circuiting where a function used as a Boolean conditional also “performs some required operation regardless of whether the first conditional evaluates true or false” is simply poor architectural design and is very, very likely to be problematic.) The 3 tests for condition (and basis path) coverage to exercise the true and false outcome of every single Boolean conditional expression are listed in the table below.
| Conditional 1 | Conditional 2 | ||
| Test | number (sid) <= 1000000 | number (sid) > 6000000 | Expected Result |
| Any value between 1000000 and 6000000 | false | false | true |
| Any value > 6000000 | false | true | false |
| Any value < 1000000 | true | (short-circuited) | false |
Of course, even testing several samples from the equivalent partitions may not expose the bug in this code because the bug in this code is a typical boundary error. (In a previous post I explained the basic fault model that caused many boundary issues. In a nutshell, boundary bugs are generally caused by incorrect relational operators or magic numbers in code.) Without recognizing that we also need to test the boundaries (999999, 1000000, 1000001, and 5999999, 6000000, 6000001) also we could easily overlook the error in the pseudo code.
Another thing that caught my attention was the lack of exception handling. Some people may not consider including exception handling in pseudo code and take it as a given. But, as a tester when I don’t exception handling in pseudo code in a review then I need to start asking questions so I can better design tests to exercise the exception handling control flow paths that directly impact code coverage measures. Another reason this is an important consideration is because results of code coverage analysis indicates that exception handlers are generally under-tested. It seems we are really good at finding unhandled exceptions with our negative tests (which is really good), but we do not seem to be as thorough in testing the logical code paths of exception handlers. This is especially true for predicate statement with multiple Boolean sub-expressions might trigger an exception. We tend to test one of the conditionals, and the other conditionals expressions in that statement are often under-tested.
So, we can surmise the number () function must be converting the string parameter (the sid variable) to a numeric type and returning a type of number because the conditional clause is comparing it to magic numbers (1000000 and 6000000). But if we entered a string that contained non-numeric characters my initial thought was that the number () function would throw an exception that is unhandled by the validate_studentID () function.
Then I thought a bit more, and considered that the number () function might swallow the exception and return a 0 or even a -1. Now, there are some arguments in favor of swallowing exceptions, but in general it is not a good idea. In this case, it is probably a bad idea because one of the primary purposes of a separate function is reusability. If the number () is reused in some other code, or other part of the code where we need to convert a string to a numeric type regardless of the range (within the range of the data type being converted to), I would suspect we would want to throw an exception, and then rethrow the exception in the calling function. Of course, this is where the rubber hits the road, and a professional tester needs to dig in and start asking some hard questions as to how the developer is going to handle this situation. If the number () function is not going to be reused, then most modern programming languages include a function call that will easily convert the string to a numeric type and do it more efficiently as compared to calling a separate function. And may in that case we could swallow the exception in the validate_studentID () function and simply return false as illustrated in the C# code below.
1: try
2: {
3: if (int.Parse(sid) < minValue || int.Parse(sid) > maxValue)
4: {
5: isOk = false;
6: }
7: }
8: catch (FormatException)
9: {
10: isOk = false;
11: }
12: catch (OverflowException)
13: {
14: isOk = false;
15: }
With the push to drive quality upstream, reduce costs (especially testing costs), and improve testability I envision that many testers will be working alongside our development counterparts to help them prevent defects from getting into the product code base, and improve the maintainability of the code. This doesn’t mean that testers will become developers or visa versa; it simply means that testers are (generally) experts in designing tests, and developers are experts in designing solutions that adhere to requirements. Rather than an adversarial relationship, I suspect in the future developers and testers will have a more symbiotic relationship to improve the intrinsic quality of our code bases.
The bottom line of all this is that in teams where testers are designing white box tests for improved code coverage (control flow testing), or where testers are engaged in design reviews or peer reviews of code prior to check in, I hope this gives you some things to think about.