Archive for December, 2009
Did you ever notice that when you ask someone to test something, the first thing they do is start ‘testing’?
I often see this in my classes, and I ask the person, “What is the purpose of your test?” Typically the response is, “I’m testing this,” or “I’m trying to find a bug.”
Unfortunately, this seems to indicate that little or no forethought goes into the act of software testing. To some people, testing appears to be little more than pounding away at the keyboard, trying whatever flies into our subconscious mind as we interact with the software, and declaring a bug when we stumble upon unexpected behavior or something we disagree with.
This is why I found it especially interesting that in my own research, and in the case studies by Juha Itkonen, among testers who were trained in formal software testing techniques or patterns there was no significant difference in defect rates or coverage between pre-defined test cases and an exploratory testing approach. This is not to say that one approach to testing is preferred over the other. It is not an either/or proposition, as I explained in my post on the pesticide paradox, and there are certainly more than two approaches to software testing. Testing requires multiple approaches to most effectively aid us in collecting and presenting the appropriate information to the decision makers.
But I am often puzzled that we can easily think of negative or destructive tests once we have the product in hand, yet when we are designing a set of tests from the requirements, the tests simply test the requirements and little else. I wonder why it is that we can think of ‘tests’ while executing other tests, but we can’t think of those same tests beforehand. Is there some limitation in our psyche that prevents us from analyzing a problem until we are actually faced with it (software in hand)?
I don’t think so, but I suspect there is a mental hurdle: we sometimes feel more productive when we are interacting with software than when we are sitting back and analyzing the problem before executing well-designed test cases. (More tests doesn’t equal better testing!)
The bottom line is that if we are given a set of requirements and can only design tests that verify those requirements, then we are probably not thinking critically about how to design test cases.
Things are winding down for the year. The Christmas lights are up on the house, my gardens are tilled and mulched for next spring, people are disappearing from the office like there is a plague, it hasn’t snowed in a while which means the mountains are mostly ice (I dislike skiing on ice), and the next GSHL ice hockey league doesn’t start for a while and pick-up games are few and far between (I suck at hockey, but it is fun). So, what to do? Oh…I forgot Christmas shopping. I hate Christmas shopping! So, I have spent the past few idle nights refactoring the automation libraries for some of my test data generation tools after my daughter goes to bed.
One of the most popular random test data generators I have developed so far is a tool called CCMaker, which generates random valid and invalid credit card numbers. (Sometimes I wonder why that is, but I don’t dwell on it for too long, and I haven’t been interrogated by the FBI lately.) Testing forms that require a credit card has always been risky business because you certainly don’t want to use your own card. Oftentimes developers will include a check on web forms or client apps to do a high-level verification of a credit card number before sending all the data across the wire to be validated. This early verification prevents flooding the pipe with bad data. So, one test we can do prior to testing the end-to-end scenario is to see if and how the developer is validating credit card numbers prior to submission.
As far as data goes, generating credit card numbers is fairly simple. There is a bank identification number (BIN), a total number of digits between 12 and 19 depending on the card type, and a checksum. So, if we know the valid BINs for each issuing bank, the valid number of digits for each card type, and how to calculate the checksum, we can generate valid credit card numbers. (Of course, this is a bit oversimplified because many credit and debit card companies are issued multiple BINs and use varying number lengths.)
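For credit cards, that checksum is the Luhn (mod 10) algorithm. CCMaker's actual implementation isn't shown here, but as a minimal sketch of the underlying calculation, the following C# example generates a number for a given BIN prefix and length by filling in random digits and computing the Luhn check digit (the example in Main uses the American Express BIN 34 with its 15-digit length):

using System;
using System.Text;

class CardNumberSketch
{
    static readonly Random Rng = new Random();

    // Generate a card number with the given BIN prefix and total length
    // whose last digit is a valid Luhn (mod 10) check digit.
    static string GenerateValidNumber(string bin, int length)
    {
        var digits = new StringBuilder(bin);
        while (digits.Length < length - 1)   // fill everything but the check digit
            digits.Append(Rng.Next(10));

        int checkDigit = ComputeLuhnCheckDigit(digits.ToString());
        return digits.Append(checkDigit).ToString();
    }

    // Compute the check digit for a partial number (all digits except the last).
    static int ComputeLuhnCheckDigit(string partial)
    {
        int sum = 0;
        bool doubleIt = true; // the digit just left of the check digit is doubled
        for (int i = partial.Length - 1; i >= 0; i--)
        {
            int d = partial[i] - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9; // same as summing the two digits of d
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10; // makes the full number sum to 0 mod 10
    }

    static void Main()
    {
        Console.WriteLine(GenerateValidNumber("34", 15));
    }
}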
Testing for invalid credit card numbers should include using numbers that look close to correct in some way but are slightly altered. For example, for the 3 equivalence partitions (BIN, length, checksum) there are seven possible invalid combinations (2³ − 1) we could test:
- Valid BIN, invalid length, and valid checksum
- Valid BIN, valid length, and invalid checksum
- Valid BIN, invalid length, and invalid checksum
- Invalid BIN, valid length, and valid checksum
- Invalid BIN, invalid length, and valid checksum
- Invalid BIN, valid length, and invalid checksum
- Invalid BIN, invalid length, and invalid checksum
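To see where the seven comes from: each of the three partitions is a simple valid/invalid flag, so there are 2³ combinations, and only the all-valid one yields a valid number. A throwaway C# loop makes the enumeration concrete:

using System;

class InvalidCombinations
{
    static void Main()
    {
        // Each partition (BIN, length, checksum) is either valid or invalid.
        // The all-valid combination (mask == 7) is the only valid one, which
        // leaves 2^3 - 1 = 7 invalid combinations.
        for (int mask = 0; mask < 7; mask++)
        {
            Console.WriteLine("BIN valid: {0}, length valid: {1}, checksum valid: {2}",
                (mask & 4) != 0, (mask & 2) != 0, (mask & 1) != 0);
        }
    }
}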
This doesn’t mean I run 7 tests and call it good, because there are numerous invalid lengths and invalid BINs for the different card types. A common mistake when using an equivalence partitioning approach is to simply plug in one value for each combination listed above and call it good. The problem is that there are several hundred BINs and 8 different valid lengths. For example, for the Discover card alone there are 829 valid BINs, and for Maestro cards there are 56 combinations of BINs and card lengths ranging from 12 to 19 digits. This doesn’t include the permutations of the other digits that compose the entire card number.
The question every tester must ask him or herself when designing tests is: how many tests do I need to have any reasonable confidence that risk is minimal and the perception of quality is high? Of course, there is no single right answer here and no magic formula, but since we can’t possibly execute every possible positive or negative test, we should at least understand that ultimately testing is sampling.
For example, one strategy for positive testing might be to test every valid BIN with every valid card length for any given card type. For American Express I would want to test at least one number with a BIN of 34 and a card length of 15 that satisfies the checksum requirement, and at least one number with a BIN of 37 and a card length of 15 that also satisfies the checksum. For Visa I would need a minimum of 2 tests in which the BIN is 4 and the checksum is satisfied: one with a card length of 13 digits and the other with a card length of 16 digits.
That probably sounds like quite a bit of testing, and these tests most likely would not produce an error unless the BIN is misidentified (e.g., instead of checking for a BIN of 5020 the BIN is incorrectly assigned as 5002), or a valid BIN is not recognized because it is omitted from a list or enumeration of valid BINs for that card type. Certainly testing of this magnitude would be expensive if done manually. But when automated using a random test data generator and a data-driven automation approach to set the generator's properties, comprehensive testing becomes a much more reasonable proposition and can significantly increase overall confidence.
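As a sketch of what that data-driven approach might look like (the card data here is just the handful of examples above, not a complete table, and GenerateValidNumber is the hypothetical helper from the earlier Luhn sketch; this fragment would live in a Main alongside it):

// (BIN, length) pairs from the examples above; a real table would be
// driven from external data covering every BIN/length combination.
var cardData = new[]
{
    new { Bin = "34", Length = 15 }, // American Express
    new { Bin = "37", Length = 15 }, // American Express
    new { Bin = "4",  Length = 13 }, // Visa
    new { Bin = "4",  Length = 16 }, // Visa
};

foreach (var card in cardData)
{
    string number = GenerateValidNumber(card.Bin, card.Length);
    // Submit 'number' to the form under test and verify it is accepted.
    Console.WriteLine("{0,-3} {1,2}: {2}", card.Bin, card.Length, number);
}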
This is where my CCMaker 3.1 test data generator can help by randomly generating both valid and invalid credit card numbers. The updated CCMaker test automation library has just been posted to my web site with documentation and examples. If you have any questions, or find any issues with the new library please let me know.
This month’s issue of Testing Experience published my article summarizing the findings of several case studies of exploratory testing both inside and outside of Microsoft. Although some people consider me a harsh critic of exploratory testing, nothing could be further from the truth. When I started my career as a professional tester my approach to software testing was primarily exploratory in nature. I was focused on executing as many negative tests as I could possibly conceive of in search of the most heinous bugs I could find; and I was good at it. My criticism is not of exploratory testing as an approach; however, I do ‘question’ the claim that exploratory testing is “orders of magnitude more productive.” And I am also critical of the argument that we don’t understand exploratory testing if we don’t conform to one notion of the concept (or buy into an ideological doctrine), because I don’t believe there is only one ‘right’ way to perform or think about exploratory testing.
Of course, I know it is unpopular to question the claims of exploratory testing ‘experts,’ but I just happen to be one of those people who question things founded on anecdotal observations without any hard data to substantiate them. I certainly don’t have all the information, but I personally like to be able to back up my position with facts (known at the time) and several verifiable, repeatable data points, so I can answer questions from a defendable position rather than trying to convince or cajole someone with my subjective opinion. (I know a lot of studies show that many Americans base their decisions on their emotional state at the time. But I learned a long time ago that you should never buy the boat you fall in love with, because you will spend more time maintaining her than sailing her.) Also, it’s easier to persuade me that I might be wrong with solid, verifiable information and repeatable data than with emotional rhetoric or personal insults.
I think most people who promote exploratory testing are well intentioned and realize that, in conjunction with other testing approaches, exploratory testing adds value to any testing effort. I also think many practitioners realize that we must not only hone our intellectual capabilities of critical thinking and logical reasoning, but also constantly build our knowledge and skills in the other approaches, methods, and techniques of our professional trade.
At Microsoft, I can’t think of any testing group that does not use exploratory testing as part of its overall strategy. We have learned not to rely on exploratory testing as our primary approach because it simply doesn’t scale as project size and complexity increase, and it is easy for testers to focus too much on out-of-context issues in hopes of finding another bug. As one Principal Test Manager summarized, exploratory testing helps:
- flush out “low hanging fruit” (identify obvious issues very quickly)
- provide welcomed context switching by getting folks to look at other areas of the product
- seed new testing ideas or identify holes (which is great as long as we have a way to preserve those ideas so other testers can learn from them)
But, of course, it was also noted that greater ‘system knowledge’ and an understanding of various other testing techniques and approaches enriched the overall effectiveness of the testers on the teams. My job as a teacher and mentor of software testing is to take really smart people who already know how to think critically about problems and provide them with foundational knowledge of alternative techniques, methods, and approaches, and with the skills specific to the profession of software testing, so they can decide which approach to use depending on the context.
Like other testing approaches, exploratory testing has benefits and limitations: it is more effective at exposing certain categories of issues and less effective at exposing other types of problems. (See my post on the pesticide paradox.) And now we have case studies that begin to help us understand how to utilize exploratory testing as part of an overall testing strategy. Of course, further research could be done in this area, but it is very interesting that the independent studies used in the article reached similar findings and conclusions.
Anyway, I look forward to comments or feedback on the article.
One of my hobbies is shooting CMP matches and long range precision shooting. Besides lots of practice perfecting technique, a big part of precision shooting depends on the ammunition and on studying the ballistic patterns of various loads. All precision shooters custom load their ammunition, and it is not as simple as reading a reloading manual. A variation of 0.001 inch in the seating depth of a bullet, or 0.1 grain of powder, may determine whether the group of shots at a target 600 yards away measures 1 MOA or 6 MOA. So, getting the ammunition to match the rifle requires continually analyzing your shots, making slight adjustments to the load, and repeating; in computer jargon we might call that refactoring. Reloading for precision is a continual optimization process until we find the optimal load. Similarly, one of the things we do in the Engineering Excellence group at Microsoft is continually analyze our internal processes and practices to see how we can help our business groups improve and optimize toward their targets. One of the big things on our plate these days is testability.
In Testing Object-Oriented Systems: Models, Patterns, and Tools, a book I consider one of the most important books on software testing practices, the author Robert Binder defines testability as “The relative ease or difficulty of producing and executing an economically feasible test suite to determine whether the SUT [system under test] (i) conforms to stated requirements and specifications, and (ii) exhibits an acceptably low probability of failure.” This and the several other definitions of testability floating around on the web all generally agree that testability involves:
1.) The ease with which the SUT can be tested
2.) The reasonableness of the cost of testing
So, as testability increases, so does the ease with which our tests can determine whether the SUT satisfies implicit and explicit requirements and exhibits a low probability of failure, at reduced testing cost. This all sounds nice, but unfortunately testability cannot be directly measured; it is a qualitative measure. Although we can’t accurately measure testability, we can sometimes do small things to improve its characteristics and help reduce testing costs, either by reducing the number of tests required to determine whether the SUT satisfies the stated requirements and has a low chance of failure, or by finding ways to test more efficiently through better designs.
In last week’s post I referred to a pseudo code example that was written to illustrate how bugs can linger in code despite a high measure of code coverage. Of course, we should realize that pseudo code is generally a far cry from the real implementation. Pseudo code is simply a model, and there are many ways to implement that model. The advantage of a model is that we can often test it earlier, to identify potential issues before a single line of code is written. In this particular pseudo code sample, there were a couple of things that stood out that could likely impact the testability of an implementation of the model. So, the neurons in my brain started firing with lots of testing related questions.
So, let’s use that example to discuss potential testability issues. The sample was based on a requirement that stated “Student IDs are seven-digit numbers between one million and six million inclusive.” The function is relatively simple: it takes a string passed to the sid parameter and returns a Boolean true or false to the calling function, depending on whether the string satisfies the internal Boolean conditions it is compared against. But this function also calls two other functions: the length() function and the number() function. From the names, I would think the length() function returns a numeric value representing the number of characters in the string passed to the sid parameter. I am also betting the number() function returns a numeric value (i.e., it converts the string to a numeric type such as an integer). The pseudo code example was:
function validate_studentid(string sid) returns boolean
    STATIC TRUEFALSE isOk;
    isOk = true;
    if (length(sid) is not 7) then
        isOk = false;
    if (number(sid) <= 1000000 or number(sid) > 6000000) then
        isOk = false;
    return isOk;
One of the reasons we hire testers with a programming background at Microsoft is that they can help the developer identify potential issues, reduce the probability of failure, and improve testability by stepping through the code during peer reviews, or while designing additional tests to cover untested or under-tested areas of the code exposed by code coverage analysis. So, when I come across a code sample, I generally step through it to
- See if it will work as intended (basic unit test)
- See if there are any potential obvious errors in logic
- Identify tests necessary for branch or conditional coverage (because developers are usually only concerned with block coverage)
- Identify argument values for negative testing that might expose undesirable results (bugs)
So, in this pseudo code example, once I got to the second conditional clause (if (number(sid) <= 1000000 or number(sid) > 6000000) then), the little cranks in my brain began to turn. I thought to myself, why are we checking the length of the string? I mean, if the number can only be between 1,000,000 and 6,000,000, then checking the length of the string is simply redundant.
If we remove the first conditional clause (if (length(sid) is not 7) then), we actually reduce the number of tests from 4 to 3, assuming the compound Boolean expression is short-circuited (short-circuiting is one of several common code optimization techniques). (By the way, the first caveat example in the Wikipedia article on short-circuit evaluation, in which a function used as a Boolean conditional also “performs some required operation regardless of whether the first conditional evaluates true or false,” is simply poor architectural design and very, very likely to be problematic.) The 3 tests for condition (and basis path) coverage that exercise the true and false outcomes of every Boolean conditional expression are listed in the table below.
|Test|Conditional 1: number(sid) <= 1000000|Conditional 2: number(sid) > 6000000|Expected Result|
|Any value between 1000000 and 6000000|false|false|true|
|Any value > 6000000|false|true|false|
|Any value < 1000000|true|(short-circuited)|false|
Of course, even testing several samples from the equivalence partitions may not expose the bug in this code, because the bug here is a typical boundary error. (In a previous post I explained the basic fault model behind many boundary issues. In a nutshell, boundary bugs are generally caused by incorrect relational operators or magic numbers in code.) Without recognizing that we also need to test the boundaries (999999, 1000000, 1000001, and 5999999, 6000000, 6000001), we could easily overlook the error in the pseudo code.
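For illustration, those boundary checks might look like the following asserts against a hypothetical C# implementation (the ValidateStudentId sketch shown further below; assumes using System.Diagnostics):

// Boundary tests for a hypothetical C# implementation of the pseudo code.
// Per the requirement, 1000000 and 6000000 are valid (the range is inclusive).
Debug.Assert(!ValidateStudentId("0999999")); // just below the lower boundary
Debug.Assert(ValidateStudentId("1000000"));  // lower boundary; the pseudo code's
                                             // '<= 1000000' wrongly rejects this
Debug.Assert(ValidateStudentId("1000001"));  // just above the lower boundary
Debug.Assert(ValidateStudentId("5999999"));  // just below the upper boundary
Debug.Assert(ValidateStudentId("6000000"));  // upper boundary
Debug.Assert(!ValidateStudentId("6000001")); // just above the upper boundary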
Another thing that caught my attention was the lack of exception handling. Some people may not consider including exception handling in pseudo code and take it as a given. But as a tester, when I don’t see exception handling in pseudo code during a review, I need to start asking questions so I can better design tests to exercise the exception handling control flow paths that directly impact code coverage measures. Another reason this is an important consideration is that code coverage analysis results indicate exception handlers are generally under-tested. It seems we are really good at finding unhandled exceptions with our negative tests (which is really good), but we do not seem to be as thorough in testing the logical code paths of exception handlers. This is especially true for predicate statements in which multiple Boolean sub-expressions might trigger an exception: we tend to test one of the conditionals, while the other conditional expressions in that statement often go under-tested.
So, we can surmise that the number() function must be converting the string parameter (the sid variable) to a numeric type and returning a number, because the conditional clause compares it to magic numbers (1000000 and 6000000). But if we entered a string containing non-numeric characters, my initial thought was that the number() function would throw an exception that is unhandled by the validate_studentid() function.
Then I thought a bit more, and considered that the number() function might swallow the exception and return a 0 or even a -1. Now, there are some arguments in favor of swallowing exceptions, but in general it is not a good idea. In this case it is probably a bad idea, because one of the primary purposes of a separate function is reusability. If the number() function is reused in some other code, or in another part of the code where we need to convert a string to a numeric type regardless of the range (within the range of the data type being converted to), I suspect we would want to throw an exception, and then rethrow the exception in the calling function. Of course, this is where the rubber hits the road, and a professional tester needs to dig in and start asking some hard questions about how the developer is going to handle this situation. If the number() function is not going to be reused, then most modern programming languages include a library call that will convert the string to a numeric type more efficiently than calling a separate function. And maybe in that case we could swallow the exception in the validate_studentid() function and simply return false, as illustrated in the C# code below.
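Something along these lines (a minimal sketch; int.TryParse returns false rather than throwing on malformed input):

// A sketch of swallowing the conversion failure inside the validation
// function: a non-numeric string simply fails validation.
public static bool ValidateStudentId(string sid)
{
    int id;
    if (!int.TryParse(sid, out id))
    {
        return false; // conversion failed; not a valid student ID
    }

    // Student IDs are seven-digit numbers between one million and
    // six million inclusive, which also makes a length check redundant.
    return id >= 1000000 && id <= 6000000;
}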
With the push to drive quality upstream, reduce costs (especially testing costs), and improve testability, I envision that many testers will work alongside our development counterparts to help them prevent defects from getting into the product code base and improve the maintainability of the code. This doesn’t mean that testers will become developers or vice versa; it simply means that testers are (generally) experts in designing tests, and developers are experts in designing solutions that adhere to requirements. Rather than an adversarial relationship, I suspect in the future developers and testers will have a more symbiotic relationship that improves the intrinsic quality of our code bases.
The bottom line of all this: whether you are on a team where testers design white box tests for improved code coverage (control flow testing), or one where testers are engaged in design reviews or peer reviews of code prior to check-in, I hope this gives you some things to think about.