I.M. Testy

Treatises on the practice of software testing

Archive for November, 2009

Thinking About Fly Fishing…

without comments

Originally Published Friday, February 06, 2009

I am an avid fly-fisherman, and I am spending a few of these last winter evenings tying flies in preparation for the new year. The lakes are still too cold so the trout are deep and lethargic, and many of the rivers are closed and too damn cold and swollen anyway. So, now is a good time to restock my fly boxes, reflect on past years, and dream of the upcoming season. While fly fishing is an enjoyable escape from the day-to-day torrent of technology, when I can’t stand in a river and wave a stick I can still relax with a good book that conjures memories of years past or engulfs my mind in an adventurous narrative as if I were there. I don’t really enjoy reading most fiction books, but I do enjoy reading the memoirs of people such as John Gierach. (Perhaps it is the sailor in me that loves a good yarn, or the fact that I can relate personally to the stories.) Anyway, this weekend I acquired a book entitled Fishless Days, Angling Nights by Sparse Grey Hackle that introduced me to a legend in American fly-fishing by the name of Theodore Gordon. I had never heard of Mr. Gordon before this (perhaps that is because I tend to fish a lot more soft hackle flies than dry flies), but I found a quote by him quite interesting. He said, “The great charm of fly fishing is that we are always learning.”

This morning I thought how apropos this statement is to software testing. Of course, as it pertains to the practice of software testing, I would rephrase Mr. Gordon’s statement as, “The great demand of software testing is that we must always learn.”

There are different types of people who fly fish. There are those who buy a fly rod and a pocket full of store-bought flies and head to the water just to catch fish. This group of people doesn’t seem to care much about the techniques of casting, learning how to read water, or understanding patterns of fish or aquatic entomology; they are simply there to try to catch fish using whatever slipshod mooching approach seems to work. Yes, they still catch fish…usually the small, dumb farm-raised trout stocked into regional lakes by the state’s department of wildlife that will bite at anything. On the other end of the spectrum are true purists who fish with cane poles, tie their own flies to match the hatch (sometimes right beside the river), and study trout, regional entomological lifecycles, and the geological formations of a river bed to better pinpoint where the big, smart trout are hiding. Then there is the group in between these extremes with varying degrees of skill and knowledge. How much time a person devotes to both practicing and learning about the sport (and their capacity to learn) will often make a huge difference in both their enjoyment and their effectiveness in catching large, persnickety trout.

From my observations of the testing community over the past few decades, I can see a similar pattern regarding the spectrum of skills and knowledge of people who participate in the practice. In the past, it was not uncommon for some companies to hire ‘clever’ people who were simply good at finding bugs into testing roles. Some companies hired developers who would (oftentimes begrudgingly) take the job as a stepping stone into a developer position. Unfortunately, some people at both extremes of this spectrum often stagnated because they did not learn more about software testing or the technological advances that were happening around them. At one extreme, I suspect that some people thought as long as they were finding behavioral type issues they were providing a benefit to the company because they were ‘good at representing the customer.’ At the other end of the spectrum were the developers in testing roles who failed to realize the challenges in software testing. In both extremes complacency and stagnation usually occur. Of course, there are many other testers between these extremes; some who will go on to become professional testers and have significant impact, and others who will simply belly-ache and whine about how unfair it is or claim how wrong any change is and why change won’t work.

As professional testers we must constantly strive to improve our knowledge of testing, technology, and the systems we are working on. We must also increase our skills and abilities as the demands of the role expand beyond the traditional comfort zone of behavioral testing and ‘playing customer advocate’ by executing ‘tests’ to find bugs at the end of a cycle. The challenges of testing complex systems built around advancing technologies significantly raise the aptitude bar for testers. Emerging practices such as TDD and agile lifecycles designed to drive engineering quality upstream and form closer customer connections are also impacting the role of testing and how testing adds value in the lifecycle (and I don’t think the role of testing in an agile lifecycle is trying to wedge testing between the end of a sprint cycle and the release to customers in order to provide a pseudo-proxy customer buffer…that’s a bottleneck). Reinstituting best practices such as design and code reviews and inspections (when warranted), or developing new approaches or tools to help increase testing effectiveness and reduce costs, also requires greater skills and knowledge among testers.

The formidable challenges of testing software that lie ahead will require highly intelligent critical thinkers who also have an in-depth understanding of the systems they are working on, and who possess the technical aptitude to provide valued input throughout the product cycle. Indeed we work in a very dynamic industry filled with diverse challenges that demand continued learning and greater proficiency in the skills used in our profession.

Written by Bj Rollison

November 18th, 2009 at 8:09 pm

The Minefield Myth (Part 2) – The Value of Regression Testing

without comments

Originally Published Friday, January 30, 2009

Last week I discussed the fallacy of the minefield analogy misrepresented by some people to suggest that regression testing is uninteresting or unlikely to reveal new or important information. Their premise is that executing the same test is similar to walking in someone’s footsteps through a minefield. While this argument seems logical on the surface, this interpretation of the minefield analogy trivializes the importance of regression testing. However, there may be specific instances where regression testing (especially the execution of redundant, poorly designed, basic scripted test cases that simply reuse hard-coded data or mindlessly follow prescriptive steps to retest unchanged or unaffected areas of code) may not provide great value to the organization. Some of those situations include:

  • retesting scripted, or simple procedural programs
  • retesting programs that are relatively static (unchanging)
  • retesting programs with no internal or external dependencies
  • rapid testing approaches where the test strategy is to simply sample the program behavior in order to provide a quick assessment and find some bugs

So, everything being equal, I would tend to agree that walking through someone else’s footsteps in a minefield may be somewhat redundant. But, there is a huge difference between redundant testing and effective regression testing. Redundant testing has zero probability of revealing new or valuable information. But, well-designed tests are not all equal, and in iterative and agile software development lifecycles some portion of an effective regression test suite has a reasonable probability of exposing new information. For example, regression testing is typically useful where:

  • the underlying code base is changing (new features, refactoring, or bug fixes)
  • revision control processes are poor or incorrectly used
  • a change in one module affects other modules in software developed using object oriented and procedural programming paradigms
  • the complex system has other internal and external dependencies
  • the software is regulated by some governing authority or law (Sarbanes-Oxley, FDA, FAA, etc.)
  • the system is highly critical
  • an established baseline is required or important
  • the code base is legacy code where new functionality can unintentionally destabilize older code
  • the code base contains lots of ‘spaghetti’ code that is simply patched together
  • the results of a well-designed regression test suite help instill a sense of confidence in the decision makers

In these contexts, an effective regression testing strategy can not only help identify important and potentially destabilizing changes in expected functionality more quickly, but can also increase overall confidence in the product, especially in key areas.

Overcoming some common misconceptions of regression testing

One common misconception of regression testing strategies is that all tests are equal and all documented (manual or automated) tests go into the regression test suite. But, the simple fact is that not all tests are equal, and not all tests need to be re-executed repeatedly (every build) throughout the development lifecycle. For example, a test to check for proper tab order on a dialog or property sheet doesn’t need to be re-executed on each new build unless the UI elements have changed on that dialog or property sheet. So, in most situations this type of test is probably a poor candidate for inclusion in a regression test suite. (As a side note, if the UI elements are changing every build, I wouldn’t even test tab order until there was at least some semblance of UI stabilization.) Slovenly adding redundant test cases, or tests that don’t provide great value to the overall testing effort, to the regression test suite simply to increase code coverage or to artificially inflate the number of tests for a ‘feel-good’ effect generally only results in an overloaded test suite that rapidly grows out of control and becomes a management nightmare.

Another common misconception of regression testing strategies is the suggestion that unit tests are sufficient for retesting code churn. Of course, this is simply a foolhardy and idealistic assumption. Unit tests are a type of smoke test performed by developers. The primary purpose of unit testing is to verify a method or function does what it is supposed to do, and may include ancillary negative tests as well. However, unit tests generally do not include comprehensive data coverage, data permutations, combinatorial analysis of parameters, etc. (which is why API testing is important in a well-rounded test strategy). Also, unit tests are usually executed in a “clean-room” environment using stubs or mock-objects, although in some cases unit tests are rerun on private builds before the code is checked into the main build tree as a form of low level regression tests. While there have been significant advances in both static and dynamic analysis tools to increase the robustness of unit tests, if unit testing were the answer for effectively evaluating code churn I suspect a lot of testers would be out of a job and testing would simply be a rudimentary process of behavioral-type acceptance testing performed by non-technical ‘end-user-like’ individuals poking about the UI looking for errant anomalies and subjectively evaluating their perception of ‘usefulness’ of a product.

Another common misconception of regression testing strategies is that regression tests are less valuable because of their ineffectiveness in identifying or exposing ‘new’ bugs. This is a myopic view of testing that focuses on the current version of the product, and on ‘bug-finding’ as the main purpose of testing. Of course, value is very subjective, and industry reports suggest that less than 15% of the bugs reported during a product development lifecycle are detected via regression testing. But, if even half that number are critical issues that would cause the company to pay upwards of $100K to release a patch (or even worse, the issue leads to a major calamity) then regression testing (especially highly automated regression testing) is most probably worth the time and effort. Of course, the cost of designing effective regression test cases is quite high and executing those tests requires valuable resources. So, we should not just consider the legal or financial liability issues to the company. We should also consider the product shelf-life as a factor. An effective suite of well-designed regression test cases will not only help provide a baseline assessment of the current version, but a significant portion of the test suite will/should be reused during maintenance or sustained engineering of that product for upwards of 10 years, and some subset of the tests in that suite will also be reused on the next release or version of the product. There are additional benefits to well-designed regression test cases (or any test case for that matter) in that they preserve knowledge and help eliminate or at least reduce tribal knowledge and hero worship mentalities.

Some effective regression testing strategies

The regression test suite is essentially a set of test cases that provides a baseline measure of expected or important functionality, and verifies that previously fixed bugs do not recur. Based on those assertions, test cases included in the regression test suite must be carefully selected to prevent overloading or rerunning unimportant tests during a regression test pass. I generally recommend 4 types of test cases for inclusion in a regression test suite:

  1. Test cases designed to verify critical functional attributes and capabilities of the software program
  2. Test cases designed to validate baseline functionality/behavior specified in project requirements or user acceptance criteria
  3. Test cases designed to verify functional anomalies that are found and fixed during the software development lifecycle
  4. Test cases designed to evaluate collateral or dependent areas associated with a code fix

Even with these targeted tests the regression test suite can grow quite large. Within the regression test suite, categorize the test cases by the feature or component that each test directly targets, and sub-categorize them to identify dependent modules or features. Associating test cases in the regression test suite with the primary features or components they are designed to evaluate allows the tester to prioritize the regression test pass to focus initially on the modules in which code churn occurred, followed by dependent or interrelated modules, and finally the other remaining tests in the suite. Also, prioritize test cases within each category based on criticality. Prioritization of tests is another mechanism we can use to help us better identify an appropriate set of tests based on the constraints of available time and resources.
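The ordering described above can be sketched in a few lines of C#. This is only an illustration under assumed names: the RegressionTest class, its Feature, DependsOn, and Criticality properties, and the RegressionPlanner helper are all hypothetical and not part of any real test framework. It assumes a lower Criticality number means a more critical test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one regression test case (illustrative names only).
public sealed class RegressionTest
{
    public string Name { get; set; }
    public string Feature { get; set; }          // feature the test directly targets
    public List<string> DependsOn { get; set; }  // dependent modules or features
    public int Criticality { get; set; }         // assumption: 1 = most critical
}

public static class RegressionPlanner
{
    // 0 = targets a churned feature, 1 = depends on a churned feature, 2 = everything else.
    public static int Rank(RegressionTest test, ISet<string> churned)
    {
        if (churned.Contains(test.Feature)) return 0;
        if (test.DependsOn.Any(d => churned.Contains(d))) return 1;
        return 2;
    }

    // Order by rank first, then by criticality within each rank.
    public static List<RegressionTest> Prioritize(
        IEnumerable<RegressionTest> suite, ISet<string> churned)
    {
        return suite
            .OrderBy(t => Rank(t, churned))
            .ThenBy(t => t.Criticality)
            .ToList();
    }

    public static void Main()
    {
        var suite = new List<RegressionTest>
        {
            new RegressionTest { Name = "help-links", Feature = "Help",
                DependsOn = new List<string>(), Criticality = 1 },
            new RegressionTest { Name = "editor-open", Feature = "Editor",
                DependsOn = new List<string> { "Parser" }, Criticality = 2 },
            new RegressionTest { Name = "parser-edge", Feature = "Parser",
                DependsOn = new List<string>(), Criticality = 3 },
            new RegressionTest { Name = "parser-core", Feature = "Parser",
                DependsOn = new List<string>(), Criticality = 1 },
        };

        // Assume code churn occurred only in the Parser feature for this build.
        var churned = new HashSet<string> { "Parser" };

        foreach (RegressionTest test in Prioritize(suite, churned))
        {
            Console.WriteLine(test.Name);
        }
    }
}
```

With the sample data, the two Parser tests run first (most critical first), then the Editor test that depends on Parser, then the unrelated Help test. When time is short, the pass can simply be cut off partway down the ordered list.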

Finally, automate, automate, automate! In order to gain the full benefit of an effective regression test suite (similar to the build verification/acceptance (BVT/BVA) test suite), automate as many regression test cases as is reasonably possible. But, I am referring to well-designed automated test cases; not just a bunch of simple, elementary UI script monkeys. While some regression test cases will be very targeted prescriptive tests designed to verify specific functionality, most well-designed test cases include variability in the execution while still evaluating the hypothesis, or purpose of the test case. Also, some test cases in the regression test suite will be executed through the UI layer, but in applications where the functional code is separated from the UI layer some regression test cases are more effectively executed below the UI or through an abstraction layer.

Similar to other testing approaches, regression testing is an organizational investment, and its value to the team must be considered as an investment and not done simply because it seems like the right thing to do. In some cases it may not make much sense to invest heavily in regression testing, but in cases where a missed regression in functionality results in fines, legal costs, or potential loss of hundreds of thousands of dollars in post-production costs I suspect regression testing is a critical component of the overall testing strategy to validate a baseline and instill confidence.

Written by Bj Rollison

November 18th, 2009 at 8:06 pm

The Minefield Myth (Part 1)

with 2 comments

Originally Published Monday, January 19, 2009

At university I studied anthropology. Several courses I took surveyed folklore and its relevance in modern society. Many people mistakenly believe that most folklore (folktales, legends, myths, ballads, etc.) is purely fictional and simply fanciful tales. However, folklore is usually based on some grain of truth, or is used to instill societal or religious mores and values. For example, social scientists have found that many ancient civilizations have folklore regarding a massive “flood” in the distant past which wiped out huge populations of people. Did this actually occur? Well, we don’t know for certain, but geological evidence does suggest that at one time coastal waters did rise significantly. Was this caused by cyclical change in the earth’s temperatures or by a series of earthquakes causing tsunamis to ravage coastal villages? We don’t know; but the folklore may indicate that at some point many societies suffered a devastating catastrophe caused by rising waters. Was the story embellished over time…certainly. Another example is the “Cinderella” story. There are over 450 versions of the “Cinderella” story around the world. The story is about overcoming adversity and oppression, and avoiding self-pity and selfishness. Basically, it is much more than a Disney animation; in traditional folklore it has been passed down through the generations to reinforce societal values.

The first time I read about a minefield analogy was in the context of sampling. Later, Brian Marick used a similar analogy to suggest repeating tests (regression testing) is not likely to reveal new bugs. Marick’s analogy is perpetuated by Bach, Kaner, and others who tend to diminish the value of regression testing (especially automated testing) because we are simply traversing a minefield by following a previously cleared path.

The Marick minefield analogy is simply an alternate perspective of Beizer’s pesticide paradox which states, “Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffectual.” Basically, no single approach to software testing is effective in identifying all categories of defects, and we must use many approaches in software testing and vary our tests. In that context I absolutely agree with the analogy.

However, a basic problem of Marick’s minefield analogy as it is often misrepresented is that it seems to treat the software under test as a static, unchanging field of easily exposed mines.

If you were hired as a consultant to come in and perform a rapid evaluation of a software product using a sampling approach such as exploratory testing, then Marick’s minefield analogy is a wonderful strategy. In that context re-running a test provides no new value and has little probability of exposing new information. However, for the rest of us who work in iterative software development lifecycles (including agile lifecycles) building complex systems with interdependent components the minefield analogy may not be as useful.

For example, in complex systems with interdependent modules we know that a change in one module can adversely affect other modules that have some dependence on that module. So, a change in one module can impact the functional behavior of other modules. In layman’s terms, activating a mine while traversing one path through the minefield may reactivate an already cleared mine in another part of the minefield, or even plant a new mine in a previously traversed path.

In iterative development lifecycles, the minefield is in constant flux (at least until the code complete stage, but even then the code is changing as issues are being addressed). In iterative lifecycles features are being added, changed, and possibly removed during the process. Depending on the length of your product lifecycle the changes can be massive. The PDC release of Windows 95 ‘looked’ very different compared to the final release. The build verification/acceptance test suite for Windows 95 was a relatively static baseline regression test suite that continued to find ‘regression’ problems up to the final weeks of the project due to code churn.

Also, not all mines are equal! Some mines are quite easy to detect while others are very hard to find, which is why systematic probing is still used by professionals to clear latent minefields. Similarly, an exploratory approach to testing software will easily reveal some bugs very quickly, but without ‘systematic probing’ we could just as easily overlook other types of issues.

There are also different types of mines which may be activated differently, so traversing a minefield with a size 10 boot may not activate the mine, but someone with a size 12 boot, or who weighs more than the previous person, may in fact activate the mine. Likewise, traversing the same path through software using different data or applying a more systematic analysis of a path may reveal interesting information or expose anomalies that were not previously discovered. For example, throwing simple ASCII characters at a text input control is not likely to expose any bugs (or, restated, it is likely to show us a clear path through the minefield). However, when we take that same exact path using Unicode characters, or Unicode surrogate pair characters, we are very likely to expose problems not revealed previously.
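One reason surrogate pairs tend to shake out problems is that code which counts characters with string.Length is really counting UTF-16 code units. The following is a minimal sketch of that difference using the .NET StringInfo class; the SurrogateDemo class and its method names are made up for illustration, and U+20000 is just one convenient code point outside the Basic Multilingual Plane.

```csharp
using System;
using System.Globalization;

public static class SurrogateDemo
{
    // Number of UTF-16 code units, which is what string.Length reports.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Number of text elements (roughly, user-perceived characters).
    public static int TextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        string ascii = "abc";

        // U+20000 lies outside the Basic Multilingual Plane, so UTF-16
        // encodes it as a surrogate pair (two char values).
        string withSurrogate = "ab\U00020000";

        // ASCII input: both counts agree, so a length check looks correct.
        Console.WriteLine("{0} code units, {1} text elements",
            CodeUnits(ascii), TextElements(ascii));

        // Surrogate pair input: the counts differ by one.
        Console.WriteLine("{0} code units, {1} text elements",
            CodeUnits(withSurrogate), TextElements(withSurrogate));

        // The pair starts at index 2 of the string.
        Console.WriteLine(char.IsSurrogatePair(withSurrogate, 2));
    }
}
```

Both strings look like three characters to a user, but the second one reports a Length of 4. A text input control that enforces a maximum length, truncates, or reverses its input by code unit takes the same path as before and can still misbehave with this data.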

In part 2 I will discuss regression testing and specific situations where regression testing is very valuable.

Written by Bj Rollison

November 18th, 2009 at 8:04 pm

Data-Driven Testing

without comments

Originally Published Sunday, January 04, 2009

I am generally not a big fan of static data in test automation, but being a pragmatic person, I know there are clearly times when using data-driven testing is just plain common-sense. For example, data-driven testing is an effective automation approach when designing ‘black-box’ tests for testing an API.

Data-driven testing is a common approach to test automation where static test data is passed to application parameters and the expected result (which is usually also read from static data) is compared against the actual result. This automation approach is effective when the actual result compared to some expected result can be resolved as a Boolean outcome. In other words, if the actual and expected results match the outcome is true and the test passes; otherwise the outcome is false and the test fails. (Of course, if something occurs during the test where there is no actual result then that particular test is usually logged as indeterminate.)
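As a rough sketch of that three-way outcome, the helper below compares an actual Boolean result with the expected one and reports Indeterminate when the call under test produces no result at all. The Outcome enum and DataDrivenOutcome class are hypothetical names, and treating any thrown exception as "no actual result" is an assumption made for the example rather than a rule.

```csharp
using System;

public enum Outcome { Pass, Fail, Indeterminate }

public static class DataDrivenOutcome
{
    // 'call' wraps the invocation of the code under test for one data row.
    public static Outcome Evaluate(Func<bool> call, bool expected)
    {
        bool actual;
        try
        {
            actual = call();
        }
        catch (Exception)
        {
            // Assumption: an exception means no actual result was produced,
            // so the test can neither pass nor fail.
            return Outcome.Indeterminate;
        }

        return actual == expected ? Outcome.Pass : Outcome.Fail;
    }

    // Stand-in for a call that blows up before returning a result.
    private static bool AlwaysThrows()
    {
        throw new InvalidOperationException("simulated failure");
    }

    public static void Main()
    {
        Console.WriteLine(Evaluate(() => "abc".Length == 3, true));   // Pass
        Console.WriteLine(Evaluate(() => "abc".Length == 3, false));  // Fail
        Console.WriteLine(Evaluate(AlwaysThrows, true));              // Indeterminate
    }
}
```

Keeping Indeterminate separate from Fail matters when results are tallied, because a test that never produced a result says nothing about the behavior it was meant to check.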

Of course, the key to effective data-driven testing is the data! If we don’t identify the most appropriate data to use in the test then the test case may have holes and we might overlook important information or miss critical anomalies. If we have too much redundant data then we may be simply running unnecessary tests (yes, even with test automation redundant testing is not an efficient use of resources).

Let’s say we had to test an API method such as:

public bool IsValidNetBiosName(string name)

where the return value is true if the string argument passed to the name parameter is a valid NetBIOS name on the Windows operating environment; otherwise it returns false.

With a data-driven testing approach we could use a simple CSV file that contained the string arguments and the expected result for each string passed to the name parameter. A partial sample of the CSV data file would be:

a,true
validname,true
validnamexxxxxx,true
invalidnamexxxxx,false
invalidnamexxxxxxxxxxxxxxxxxxxxxxx,false
,false
null,false
xx\x,false
xx/xx,false
x:x,false
xxx*xx,false
x?xx,false
xxxx",false
;,false
xxx|xxx,false

(NOTE: null is a special case in which we need to convert the string “null” to a null in the test code, and the row above the null row is an empty string. An empty string and null are two different things and both must be tested in this case.)
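To make the null versus empty string distinction concrete, here is one way a single row of the data file might be parsed. The CsvRow class and its Parse method are hypothetical helpers, and the sketch assumes every row has the form data,expected with no commas inside the test data itself.

```csharp
using System;

public static class CsvRow
{
    // Splits "data,expected" and maps the literal text "null" to a null reference.
    // An empty first field stays an empty string.
    public static (string Data, bool Expected) Parse(string line)
    {
        string[] parts = line.Split(',');
        string data = string.Equals(parts[0], "null", StringComparison.OrdinalIgnoreCase)
            ? null
            : parts[0];
        return (data, bool.Parse(parts[1]));
    }

    public static void Main()
    {
        var emptyRow = Parse(",false");
        var nullRow = Parse("null,false");

        Console.WriteLine(emptyRow.Data == null);    // False: empty string, not null
        Console.WriteLine(emptyRow.Data.Length);     // 0
        Console.WriteLine(nullRow.Data == null);     // True
    }
}
```

The cost of this convention is that the four-character name "null" can no longer be expressed as ordinary test data, and neither can a name containing a comma. If either case matters, the data file needs a different marker or a quoting scheme.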

Next, we need to read the CSV file into our automated test, and perhaps the easiest way I have found to read a text or CSV file in C# is with the File.ReadAllLines method. The ReadAllLines method opens a text file, reads each line of text as an element in a string array, and then closes the file. Once we have an array of all lines in our data file, we simply need to parse each element in the string array into test data and/or expected result, and then compare the actual result against the expected result as illustrated in this example.

    // Requires "using System;" for StringComparison and Console.
    // 'api' is an instance of the class that exposes IsValidNetBiosName.

    // Read each line in the entire CSV file into a string array
    string[] testDataArray = System.IO.File.ReadAllLines("myTestData.csv");

    // Iterate through each line in the test data file
    foreach (string test in testDataArray)
    {
        // Split each line into an array where the elements are the
        // test data and the expected result
        string[] testElement = test.Split(',');
        string testData = testElement[0];
        string expectedResult = testElement[1];

        // Special case for passing a null to the API parameter
        if (string.Equals(testData, "null", StringComparison.OrdinalIgnoreCase))
        {
            testData = null;
        }

        // Compare the return value against the expected result
        string actualResult = api.IsValidNetBiosName(testData).ToString();
        string result = string.Equals(actualResult, expectedResult,
            StringComparison.OrdinalIgnoreCase) ? "Pass" : "Fail";

        // Log the outcome for this row of test data
        Console.WriteLine("{0},{1},{2}", testData ?? "null", expectedResult, result);
    }

This is a rather simple example, but data-driven testing is effective for unit testing, API testing, and can even be used in automated GUI testing (although data-driven automation may only have limited applicability in GUI automation). I am a firm believer in the KISS principle when it comes to developing automated tests, and the ReadAllLines method is perhaps the easiest and most efficient way to read in a data file for data-driven testing. Of course, data-driven testing doesn’t solve all problems. Chan Chaiyochlarb has a good post on some pitfalls to watch out for. But, in the right context, data-driven testing can be one approach used in automated testing.

Written by Bj Rollison

November 18th, 2009 at 8:03 pm

The Ultimate Desktop Reference

with 4 comments

Originally Published Wednesday, December 24, 2008

I have a library of books and white papers on software testing, engineering processes and management, and software development that I have read and reference quite often. For new testers I generally recommend A Practitioner’s Guide to Software Test Design by Lee Copeland, and How to Break Software: A Practical Guide to Testing by James Whittaker. There are 5 books I highly recommend (not including How We Test Software at Microsoft which I co-authored and also highly recommend).

In my current role as a teacher, trainer, and mentor of new testers the 2 books that are constantly on my desktop are Testing Object-Oriented Systems: Models, Patterns, and Tools, by Robert V. Binder, and Software Testing Techniques, 2nd edition by Boris Beizer. Not that I don’t frequently reference other books, but to me these are the quintessential books on the foundational knowledge of software testing techniques and methodologies for intermediate to advanced testers with a strong technical background.

But, the booklet that I would keep in my shirt pocket if I tested products on a day-to-day basis would be Josh Poley’s Black Book. Josh’s Black Book is the ultimate desktop reference for software testers (and developers). While this book is primarily intended to aid those who work on projects developed in C/C++, it has loads of information that is valuable to any tester working on just about any technology. From decimal and named entities of ISO characters to error codes for DOS, VB, JScript, HTTP, and of course Windows Errors, this book is jam-packed with great information and quick reminders for both developers and testers.

Written by Bj Rollison

November 18th, 2009 at 7:57 pm

Posted in General Testing Topics

Prescriptive vs. Descriptive ‘Scripted’ Tests

with 2 comments

Originally Published Tuesday, December 16, 2008

Something that raises red flags in my brain is hard-coded strings or test data in either a manual test or an automated test. Yes, I know that there are times when a test must be very prescriptive and use specific data and follow specific procedures, but I am absolutely amazed how often I see examples of test cases that are so prescriptive in the detail of execution that they completely take any thought out of executing the test. While it can well be argued that the execution of that test might very well be a brain-dead activity, I would also argue that the person who wrote such a test also lacks creativity and generally has no clue of how to actually design a test.

We did a simple experiment on test design and execution. The purpose was to see if we could design a ‘scripted test’ that provided the tester with greater freedom, cognitive engagement, and deductive reasoning.

The simulation in this experiment was a simple web page in which the user entered a stock ticker symbol (test data), pressed the "Get Quote" button, and compared the displayed result against the expected price at that time (using a real-time 3rd party stock quote monitoring system). The test was a positive test, but was written 3 different ways as illustrated below.

The first test was very prescriptive and was written as follows:

Purpose: Verify the web page displays the most recent quote for a valid stock ticker symbol registered on a major stock exchange

Steps:

    1. Enter "MSFT" in the Stock symbol text box
    2. Press the "Get Quote" button on the web page

Verify: The displayed quote matches the real-time quote.

Given this test over 95% of the subjects simply entered MSFT and looked for a result. Some did not even appear to compare the result against the real-time quote.

We modified the steps in the test as follows and used a second study group.

Steps:

  1. Enter a valid stock ticker symbol in the Stock symbol text box (e.g. "MSFT")
  2. Press the "Get Quote" button on the web page

In this second session more than 75% of the test subjects still only entered MSFT and looked for a result. On a later day in the week, we asked the same group to run the test again. Once again over 75% entered MSFT as the test data.

So, we modified the steps in the test once more as follows and used a third group in the experiment.

Steps:

  1. Enter a valid stock ticker symbol in the Stock symbol text box from a list of available stock ticker symbols at
    <Link to NYSE>
    <Link to NASDAQ>
    <Link to S&P>
    <Link to London stock exchange>
    <additional links>
  2. Press the "Get Quote" button or press the Enter key

In the third part of the experiment over 95% of the third group clicked the links and selected a stock ticker symbol at random. Some testers copied and pasted the ticker symbols from the linked web pages into the Stock symbol text box, but the majority simply entered the symbol via the keyboard. Some (less than 5%) of the participants simply entered MSFT. (Which is not really surprising since they work there!) What was more interesting was that when the same groups were given this test at a later time over 95% of the testers selected a different link and 99% selected a different stock ticker symbol.

This third part of the experiment is essentially the same test (it tests the same hypothesis) but uses a descriptive ‘scripted’ test approach. A more descriptive test can still achieve the stated purpose, and it provides 2 important benefits.

  • The purpose of the positive test (verify the web page displays the most recent quote for a valid stock ticker symbol registered on a major stock exchange) is achieved without hard-coding specific test data or specific results to check against. This means the tester has to use basic deductive reasoning in order to validate the results of the test.
  • The breadth of test data used in the test significantly increased (even if the test was executed by the same person). This increases the variability of each successive test and gives the tester great freedom in selecting the data to use in each test and in how to interact with the system under test.
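The same idea carries over to an automated version of this test. What follows is only a minimal sketch, not the test used in the experiment: the `TickerSelectionSketch` class and its short symbol list are hypothetical stand-ins for the exchange listings linked in the test steps, and the point is simply that the test data is chosen at run time rather than hard-coded.

```csharp
using System;

public static class TickerSelectionSketch
{
    // Hypothetical sample data; a real test would draw its symbols from
    // the exchange listings referenced in the test steps.
    private static readonly string[] Symbols = { "MSFT", "IBM", "GE", "KO", "XOM" };

    // Picks one symbol at random so that successive runs vary the test data.
    public static string PickSymbol(Random random)
    {
        return Symbols[random.Next(Symbols.Length)];
    }

    // Exposes a copy of the sample data so a caller can check a selection.
    public static string[] AvailableSymbols()
    {
        return (string[])Symbols.Clone();
    }
}
```

A test would log the selected symbol so that a failure can be reproduced with the same data.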

Whether the "Get Quote" button or the Enter key is pressed is tangential to the purpose of the test, so in this case it is not important what action the tester takes to send the request to the web service to get the stock quote; he or she simply has to trigger that event.

I suspect that one reason why many ‘scripted’ tests are very prescriptive in nature is because they are written from the "watch me" perspective. In other words, the test is crafted from "this is what I did, so that is how I will write my test…word for word." I also suspect that in many of these cases the tester really doesn’t have a clear purpose of what he or she is trying to prove or disprove; that tester is simply writing a script or developing an automated test to satisfy some thoughtless process or increase some magic number.

Watching a tester perform a set of steps and then recording that same set of steps in either a manual or an automated test does not constitute test design; it is a brain-dead activity. Test design is not simply watching a person perform a set of actions and ‘scripting’ that into a ‘test.’ And test design doesn’t mean reacting to the results of one test and thinking of another test ‘on-the-fly.’ Designing robust tests is a separate activity from test execution. Designing a robust, descriptive scripted test that enhances the effectiveness of the testing effort requires incredible creativity in order to achieve its desired objective.

So, the next time someone tells you that scripted tests are too restrictive, impede the freedom of the tester, or limit your creativity, I would suggest to you that that perspective is rather narrow-minded and limited to a vision in which ‘scripted’ tests are all highly prescriptive in nature and result in a set of brain-dead steps. Conversely, any professional tester realizes that test design is a very creative process involving the application of your cognitive and analytical skills to help you design a test that aids you in proving or disproving your deduced hypothesis from pluralistic perspectives within the context of the situation, and to ultimately enhance the effectiveness of your testing effort.

Written by Bj Rollison

November 18th, 2009 at 7:55 pm

Posted in Testing Practices

How We Test Software At Microsoft

with 7 comments

Originally Published Saturday, December 06, 2008

This past year has been quite busy for me. Too busy. Besides trying to keep up with my busy teaching schedule, driving some key initiatives and collaborating on others, and planning new course development for SDETs, I presented at 11 conferences around the world, wrote a few magazine articles, and developed a new software test automation program at the University of Washington. Somewhere in the midst of all that I co-authored a book with Alan Page and Ken Johnston that is now available to order, and should be on bookstore shelves within a week.

Collectively, we have more than 3 decades of experience in various roles and business groups around the company. Coupled with insights and experiences from the many other testers (past and present) at the company, the book is filled with great ideas and examples of some of the testing processes and procedures used around the company.

But it is not just another book of how to test software. This book provides a lot of insight into Microsoft, illustrates some of our best practices (and also reveals some of our faux pas), and answers the question (albeit indirectly) we get all the time: “How do you test software at Microsoft?”

Written by Bj Rollison

November 18th, 2009 at 7:48 pm

Posted in Testing Practices

Test Automation: Temporary Test Files

with 2 comments

Originally Published Tuesday, December 02, 2008

Occasionally an automated test needs to create a temporary file during its execution. The problem is that often this file is left behind on the system, or even worse, stored in some obscure directory on a server. I say worse because those files will be discovered by someone approximately 9.25 months from the time they were created, and that person will spend about 6.5 hours trying to figure out who they belong to before realizing they are no longer needed and can be blown away.

The test designer must decide whether or not to leave test artifacts once created. My general rule of thumb is that if the test run is a system level (or dirty room) test, it is probably ok to leave any artifacts from previous tests lying about on the local machine. But if the test run is an integration or component level (or primarily clean room) test, then in order to preserve the clean room environment (in other words, to eliminate or control the number of unknown environmental variables) I probably want to remove any test artifacts that are created during the execution of a test.

There are many ways to create a file and delete a file in an automated test. But one of the easiest ways is to use File class members in C# to create a file that deletes itself automatically. An overload of the File.Create method accepts a value from the FileOptions enumeration, which provides additional options for FileStream objects, including the DeleteOnClose member that automatically deletes the file when it is no longer in use (in practice, when the FileStream that created it is closed).

FileStream fs = File.Create(path, bufferSize, FileOptions.DeleteOnClose);

Of course, if you wanted to design in the option to delete the file at the end of a test based on some Boolean decision, you could simply create the file using the File.WriteAllText(), File.WriteAllLines(), or File.WriteAllBytes() methods and then call File.Delete() only when cleanup is wanted.
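A minimal sketch of that Boolean-controlled cleanup might look like the following. The `ConditionalCleanupSketch` class, the `cleanUp` flag, and the file contents are illustrative assumptions, not part of any test framework.

```csharp
using System;
using System.IO;

public static class ConditionalCleanupSketch
{
    // Writes a temporary test file, then deletes it only when cleanUp is true.
    // Returns true if the file still exists after the method finishes.
    public static bool WriteThenMaybeDelete(string path, bool cleanUp)
    {
        File.WriteAllText(path, "temporary test data");

        // ... the body of the test would read or modify the file here ...

        if (cleanUp)
        {
            File.Delete(path);
        }

        return File.Exists(path);
    }
}
```

A clean room run would pass `true`, while a run that wants to keep artifacts for failure analysis would pass `false`.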

Another problem with temporary test files is that the test designer frequently hard-codes a clever name such as “test.txt” to name the temporary file in the test case itself. Then of course, the next test fires off and that test also has a hard-coded file name which also happens to be “test.txt” that will either overwrite the first file, or in the worst case append to it. Again, the tester can easily create a random string generator to generate random strings for use as file names, or use another method that is already built into C#.

Another useful method for creating temporary file names is found in the Path class. The Path class member Path.GetRandomFileName() will generate a random file name. This random file name also includes a random extension which may not be desired, so you can use the Path.ChangeExtension method to automatically generate a randomly named file and change the extension to the desired extension.

string filename = Path.ChangeExtension(Path.GetRandomFileName(), "txt");

Now we can create a temporary test file at a desired location (say, the My Documents folder) on the local machine with a random name, and that file will be deleted automatically when its FileStream is closed at the end of the test.

FileStream fs = File.Create(Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
    Path.ChangeExtension(Path.GetRandomFileName(), "txt")),
    bufferSize, FileOptions.DeleteOnClose);
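For completeness, here is a self-contained sketch of the same pattern wrapped in a `using` block so the stream is always closed. The `TempTestFileSketch` class, the 4096-byte buffer size, and the caller-supplied directory are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class TempTestFileSketch
{
    // Creates a randomly named .txt file in the given directory, writes to it,
    // and returns true if the file still exists after the stream is closed.
    public static bool ExistsAfterClose(string directory)
    {
        string path = Path.Combine(
            directory,
            Path.ChangeExtension(Path.GetRandomFileName(), "txt"));

        // Leaving the using block closes the stream, which is the point at
        // which FileOptions.DeleteOnClose removes the file.
        using (FileStream fs = File.Create(path, 4096, FileOptions.DeleteOnClose))
        {
            byte[] data = Encoding.UTF8.GetBytes("temporary test data");
            fs.Write(data, 0, data.Length);
            fs.Flush();
        }

        return File.Exists(path);
    }
}
```

Because the deletion is tied to the stream, any verification that needs the file on disk has to happen inside the `using` block.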

Written by Bj Rollison

November 18th, 2009 at 7:45 pm

Posted in Test Automation

Training is Controversial…Really?

without comments

Originally Published Monday, November 24, 2008

I just returned from a business trip to Israel. I was a long time on the road (a week at EuroStar followed by a trip to Israel to teach at our 2 R&D centers there). So, I really lucked out because I got the opportunity to go sailing this past Saturday and unwind a bit. I checked the day before and all the boats were reserved, but since the sky was a bit gray many people cancelled their reservations. Having lived in the PNW for the past 15 years or so I have become quite accustomed to gray skies, so it didn’t bother me in the least.

Wow…sailing in the Med, and sailing a Performance Club 420. It has been a long time since I was in a 14’ racing boat, and even longer since I was in one that we launched from the beach in 1 – 2’ waves. It was wet, it was wild, and it was fast. Saturday brought back a lot of memories from my childhood, and it also reminded me of all the various things I learned about sailing as a teenager. These days I mostly sail aboard my Cooper 416. She is a heavy cruiser at 42’ long and weighing in at 30 thousand pounds; she is very stable, turns like a dead elephant, is generally very forgiving, and most of all…she is comfortable. Small racing boats are light (about 250 lbs), nimble, wet, and not so forgiving when you make a mistake. In brisk winds and breaking seas you must be very vigilant, act quickly, time your tacks, and utilize every bit of skill and knowledge to keep her upright and prevent pitch-poling or swamping when running downwind and returning to the beach.

In all honesty, sailing itself isn’t really all that hard. Any chowder-head can eventually learn how to work lines and get a sailboat moving in some odd direction. But knowing how to sail well and how to sail in a variety of circumstances takes a lot of skill and knowledge. Sailors are at the mercy of Mother Nature, so they must understand such things as weather patterns, and geographical influences on the wind and tidal flows. Sailors must also have in-depth knowledge of navigation and navigational techniques, and boat handling in different contexts such as light air or heavy seas. Sailors (not sunny weather day sailors) must also understand math and physics, and theoretical concepts such as Bernoulli’s principle. While some sailors may not fully understand Bernoulli’s principle, they understand the concept every time they trim the sails for maximum performance. Those who do understand it can easily explain to novices the physics of why a Marconi rigged sailboat can sail faster into the wind compared to running with the wind from dead astern or on a broad reach, or why and how adjusting the rake of the mast affects sailing angle.

As a child my father taught me that if I really wanted to be good at something it wasn’t enough to just do it. Practice is important, but if I wanted to really understand something I also had to understand how and why things work. He taught me that the more I understood how something works from both a theoretical and practical perspective, the better equipped I would be to apply critical thinking skills to various situations, to face new challenges, and also potentially come up with alternate ideas and approaches because I could think through the situation both logically and abstractly.

So, you can imagine that I was somewhat surprised when I came across a posting on a software testing distribution list in which someone suggested that teaching testers various techniques and methods commonly used in our profession is controversial. Considering studies indicate a significant number of testers have never received formal training in software testing, and anecdotal evidence suggests less than 10% have read more than 2 books on the specific subject, it boggles my mind to try to understand how anyone who considers themselves to be a professional in this discipline could even contemplate the notion that training testers in the foundational knowledge of our profession and its ‘systems’ is controversial.

Of course, if your entire perspective of testing is simply bug finding, and you are easily amused by parlor tricks that expose inane issues and blindly accept wild hyperbole without empirical evidence or a logical explanation, then perhaps actually studying software testing practices or computer engineering, and learning about professional practices used in the discipline, might be controversial.

Clarke’s third law – “Any sufficiently advanced technology is indistinguishable from magic.”

Written by Bj Rollison

November 18th, 2009 at 7:43 pm

Posted in General Testing Topics

Boundary Testing Isn’t Guessing at Numbers!

with one comment

Originally Published Tuesday, November 04, 2008

At a recent conference a speaker posed a problem in which a field accepted a string of characters with a maximum of 32,768 bytes, then asked the audience what values they would use for boundary testing. Immediately some of the attendees unleashed a flurry of silly wild ass guesses (SWAG) such as “32,000,” “64,000,” and, of course, what attempt at guessing would be complete without someone yelling out “how about a really large string!” One person asked whether it was bytes or characters. A reasonable question, but the speaker then began talking about double byte characters (DBCS). (Double byte is, in technological time, a relatively antiquated character encoding technology since most modern operating systems process data as Unicode.)

So, while some folks in the audience continued to shout out various SWAGs, I was still pondering why anyone in their right mind would artificially constrain a user input to such a seemingly ridiculous magic number within the context of computer processing and programming languages. Programming languages allow specific ranges of numeric input. Most strongly typed languages such as the C family of languages have explicit built in or intrinsic data types that include signed and unsigned ranges. For example, an unsigned short holds 2^16 values, or 0 through 65,535, and a signed short also holds 2^16 values but the range is -32,768 through +32,767. Since the speaker didn’t indicate what programming language was used in this magical field, the only logical conclusion a professional tester can rationally deduce is that 32,768 is a magic number, or in other words a “hard-coded” constant value embedded somewhere in the code.
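Rather than relying on memory for these limits, a tester can ask the runtime. The sketch below uses the C# `short` and `ushort` types as stand-ins for the 16-bit types discussed here; the `ShortRangeSketch` class is a hypothetical name, and the cast helper shows what happens when the speaker's value of 32,768 is forced into a signed 16-bit variable.

```csharp
using System;

public static class ShortRangeSketch
{
    // Reports the minimum and maximum values of the 16-bit integer types.
    public static string Describe()
    {
        return string.Format(
            "ushort: {0}..{1}; short: {2}..{3}",
            ushort.MinValue, ushort.MaxValue, short.MinValue, short.MaxValue);
    }

    // Casts an int to a signed 16-bit value without overflow checking,
    // so out-of-range values wrap instead of throwing.
    public static short CastToShort(int value)
    {
        return unchecked((short)value);
    }
}
```

`CastToShort(32767)` returns 32767, while `CastToShort(32768)` wraps to -32768, which is exactly the kind of behavior a boundary test one above the maximum is designed to expose.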

Asking questions is important! But asking a bunch of context-free questions or throwing out random guesses is usually not the most efficient or productive use of one’s time. Asking specific rational questions or making logical assertions based on knowledge and understanding is important, and is generally more productive, especially when testing the boundary conditions of input or output values in software. Boundary testing is a technique that focuses on linear input or output values that are fixed, or fixed-in-time, and used for various computations or Boolean decisions (branching) within the software. Similar to most testing techniques, boundary testing focuses on exposing one category of issues based on a very specific fault model, and is an extremely efficient systematic approach to effectively expose that particular category of issues. In particular, boundary testing is useful in identifying problems with:

  • improperly used relational operators
  • incorrectly assigned constant values
  • and computational errors that might cause an intrinsic data type to either overflow or wrap especially when casting or converting between data types (proper identification of the data type and knowledge of the minimum and maximum ranges is critical)
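To make the first item in that list concrete, here is a small illustration of an improperly used relational operator. Everything in it is hypothetical: `MaxLength` and both validation methods are invented for the example, and the faulty variant simply uses `<` where `<=` was intended.

```csharp
using System;
using System.Collections.Generic;

public static class RelationalOperatorSketch
{
    public const int MaxLength = 251; // hypothetical upper limit

    // Intended rule: accept lengths from 1 through MaxLength inclusive.
    public static bool IsValidLength(int length)
    {
        return length >= 1 && length <= MaxLength;
    }

    // Faulty variant: '<' was typed where '<=' was intended.
    public static bool IsValidLengthFaulty(int length)
    {
        return length >= 1 && length < MaxLength;
    }

    // Runs both versions over the boundary values (minimum and maximum,
    // each with -1 and +1) and returns the values where they disagree.
    public static int[] Disagreements()
    {
        int[] boundaryValues = { 0, 1, 2, MaxLength - 1, MaxLength, MaxLength + 1 };
        var result = new List<int>();
        foreach (int value in boundaryValues)
        {
            if (IsValidLength(value) != IsValidLengthFaulty(value))
            {
                result.Add(value);
            }
        }
        return result.ToArray();
    }
}
```

The two versions differ only at the value 251, so a guess such as 200 or 300 would pass against both and the fault would go unnoticed.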

I previously wrote about approaches to help the tester identify potential boundary conditions, and how to design tests to adequately analyze those specific boundary values. As I previously stated, boundary testing involves the systematic analysis of a specific value. For example, a long file name on the Windows platform (both the base file name and the extension) should not exceed 255 characters. For file types that use a default 3-character extension the most interesting boundary values are 1 character (minimum base file name length), 251 characters (maximum base file name length assuming a standard 3-character extension plus the dot that separates it from the base name), and 255 characters (with or without an extension, to test what occurs with a base file name equal in length to the maximum base file name plus a standard 3-character extension). (Of course, if the default extension is 1 character, or 2 characters, or 4 characters, etc., then the maximum base file name without extension needs to be recalculated.) Now, let’s see why specific values are important and critical to accurately analyze boundaries.

On Windows XP I used Notepad to test file name boundaries with a default 3-character extension. Of course the minimum -1 value is an empty string, and minimum and minimum +1 are saving a file with a 1-character and 2-character file name respectively. Next I entered a base file name of 250 characters (maximum -1) and 251 characters (maximum allowed assuming a default 3-character extension) and these file names were saved to the system with the default extension. Then I entered a 252-character file name and I got the expected error message indicating the file name is too long. But what about my boundary of 255 characters maximum? (IMPORTANT – boundary values are not just at the edges of the extreme ranges of values; there could be sub or supra boundary values within a range of values that may occur at the edges of equivalence class ranges, or specific values in special or unique equivalence class subsets.) So, I wondered what would happen if I entered just a base file name of 255 characters (which is the maximum length of a file name assuming an extension is also part of that file name)? Interestingly enough, on Windows XP the operating system saved a file with 255 characters, but it did not have any extension, which means that there was no application associated with the file. The same occurred with a 254-character base file name, and when I tried the maximum +1 of the overall complete file name range I was again presented with the same message I got with a 252-character base file name indicating the file name was too long.
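The lengths used in that walkthrough can be derived rather than guessed. The sketch below is only an illustration of the arithmetic: the `FileNameBoundarySketch` class is a hypothetical helper, and the 255-character limit and the one character reserved for the dot are taken from the description above.

```csharp
using System;

public static class FileNameBoundarySketch
{
    // Maximum length of a complete file name (base name, dot, and extension).
    public const int MaxFileNameLength = 255;

    // Longest base name that still leaves room for the dot and the extension.
    public static int MaxBaseNameLength(int extensionLength)
    {
        return MaxFileNameLength - (extensionLength + 1);
    }

    // Base name lengths to test: each boundary with its -1 and +1 neighbors.
    public static int[] BaseNameBoundaryLengths(int extensionLength)
    {
        int maxBase = MaxBaseNameLength(extensionLength);
        return new[]
        {
            0, 1, 2,                             // minimum -1, minimum, minimum +1
            maxBase - 1, maxBase, maxBase + 1,   // around the maximum base name
            MaxFileNameLength - 1, MaxFileNameLength, MaxFileNameLength + 1
        };
    }
}
```

For a 3-character extension this yields 0, 1, 2, 250, 251, 252, 254, 255, and 256; for a 4-character extension the middle group shifts to 249, 250, and 251.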

Fortunately, the above issue was fixed in Windows Vista. But, as sometimes occurs in complex systems, one fix occasionally leads to a different (but related) issue in the same functional area, which is why regression testing is typically an effective testing strategy. So, when I ran my ‘regression tests’ on Windows Vista I quickly discovered the system would not save a file with only a base file name of any number of characters greater than 251 via Notepad. But, as I ran the specific boundary tests I realized something very important! When I entered a base file name of 252 characters I received the following error message.

[Image: error message displayed for a 252-character base file name]

And when I attempted the test with a base file name composed of any number of characters greater than 252 I received the following error message.

[Image: error message displayed for a base file name longer than 252 characters]

Now, those of you who are paying attention realize these 2 messages are different. Of course, in either case a file is not saved to the system, which is what I expect; however, there is a strange anomaly here. One might notice the first message prepends the drive letter to the string of 252 characters, and the second message does not. But the important question doesn’t really have anything to do with the message text per se. In this case the professional tester should ask, “why is there an apparent conditional branch in the code that shunts control flow one way for a base file name of 252 characters and a different path for a base file name greater than 252 characters?”

Of course, if we just guessed, or tested ‘a really large string of characters,’ we might have never exposed this anomaly, which occurs only at the maximum + 1 length of a base file name (assuming a default 3-character extension). Interestingly enough, if a highly skilled, technically savvy tester had designed white box tests for decision testing or path analysis then I suspect he or she could have very easily found this anomaly with even greater efficiency and exposed it earlier in the cycle.

The point here is that boundary testing is simply not random guessing, wild speculation, or simple parlor tricks. The technique of boundary value analysis requires in-depth knowledge of what the system is doing behind the user interface, careful analysis of the system and its data to accurately determine the specific boundary conditions, and a rigorous analysis of linear values immediately above and below each identified boundary value. Testers must be able to properly identify the specific and interesting boundary values based on in-depth knowledge of the system, an understanding of what is happening beneath the user interface, and experience. Then we can perform a more systematic analysis of any identified boundary conditions and potentially increase our probability of identifying real anomalies caused by this specific fault model. Boundary value analysis is a prime example of where good enough is simply not good enough in our discipline…we must be technically spot on!

Written by Bj Rollison

November 18th, 2009 at 7:40 pm