I.M. Testy

Treatises on the practice of software testing

Archive for the ‘Testing Practices’ Category

Vista Bug – Incredible Disappearing Files

with 4 comments

Originally Published Monday, August 20, 2007

If you like obscure bugs, then I think you’ll love this little gem!

USB flash drives are wonderful little gadgets, and I have several of them to store various files. Tonight I was using a flash drive to move some files around between machines when I noticed a slight problem. It seems when I deleted one of the files on the flash drive all the other files in the file list disappeared! But, wait…I know I only had one file highlighted, so what happened to the other files? Fortunately, a quick press of the F5 function key refreshed the window and my remaining files reappeared.

This defect manifests itself in various ways, but here is the easy way to reproduce it:

  • Insert your favorite flash drive and open an explorer window.
  • Resize the explorer window’s horizontal axis to a point where the tree list pane just begins to collapse
    (The size of the tree list pane is not especially critical to this problem, but this step makes it easier to reproduce the defect)

image

  • Right click in the file list pane and select New -> Text Document from the context menu
  • Rename the file with a long file name (string length must exceed the width of the file list pane)

image

  • Press the F5 function key to refresh the window
    (Notice the file name extends beyond the window rather than being truncated by the ellipsis)

image

  • Right click in the file list pane and select New -> Text Document from the context menu

image

  • Abracadabra! No files! (Also notice there is no scrollbar.)
  • Press the F5 function key again…and abracadabra….the file(s) reappear!

As noted above, there are multiple ways to reproduce this particular bug, but the root cause is the same regardless of whether the files preexist, whether you highlight multiple files, whether you highlight a single file and delete it, etc.

It does not require a flash drive to replicate this defect. But, I was using a flash drive when I encountered this problem and you can probably imagine my initial reaction (somewhere between surprise and horror) when the remaining files on the flash drive seemed to have disappeared…lost forever!

Now, this is not an especially nasty bug and it is rather obscure. But if you have a lot of files with long file names, and you don’t maximize explorer views then I bet your F5 function key might get a good workout simply because there appear to be so many ways to reproduce this problem.

First, Windows taught us the 3 finger salute. And now with Windows Vista (which I still think is way cool) we have a 1 finger (F5 function key) salute!

Have fun playing with this one!

Written by Bj Rollison

November 13th, 2009 at 9:13 pm

Posted in Testing Practices


The Code Coverage Metric is Inversely Proportional to the Criticality of the Information It Provides.

with 2 comments

One of the best aspects of my current role is the opportunity to interact with so many talented, highly skilled, and extremely intelligent testers at Microsoft and other companies around the world. Last week I was teaching a new group of SDETs at Microsoft, and during our discussion of code coverage (the metric) and code coverage analysis (the process of analyzing areas of untested or unexercised code) Alex Kronrod (an intern from UC Berkeley who attended the class) stated "so basically what you’re saying is the code coverage measure is inversely proportional to the amount of information it provides." Now, I don’t know whether or not there is exact proportionality in the code coverage metric and the information provided by the measure itself, but I thought about it a moment and thought to myself, "Wow, what a great perspective!"

Code coverage is a frequently sought after measure in software testing. Code coverage is an important metric and it should not be ignored; however, as a measure it must not be abused or over-rated, nor should we attempt to correlate code coverage as a direct measure of quality. While many teams strive for higher percentages of code coverage at the system level (which is good), the code coverage metric simply tells us if statements, blocks of statements, or conditional expressions have been exercised. Low measures of code coverage may sometimes result from software complexity and lack of testability or from testing ineffectiveness, but are generally indicative of a software project in peril (with regards to risk). Higher percentages of code coverage certainly help reduce perceived overall risk, but the code coverage measure by itself doesn’t necessarily tell us HOW the code was exercised, and it doesn’t provide useful information about the areas of the code that have not been exercised other than what percentage of the code is 100% at risk. (If we don’t test it, we can’t qualitatively say anything about it, so risk must be assumed to be 100%.)

Let’s examine the following simple search algorithm to better understand how increased measures of code coverage provide less valuable information regarding testing effectiveness. This algorithm searches for a particular character in a string of characters and returns the index position of the character if found; otherwise it returns 0.

   private static int CharSrch(string s, char c)
   {
     int i = 0;
     int retVal = 0;
     char[] cArray = s.ToCharArray();

     // Advance until the character is found or the end of the array is reached
     while ((i < cArray.Length) && (cArray[i] != c))
     {
       i++;
     }

     // i is still inside the array only if the loop stopped on a match
     if (i < (cArray.Length))
     {
       retVal = i;
     }

     return retVal;
   }

Using Visual Studio Team System to measure block coverage and executing a test in which s = "" and c = ‘c’ the code coverage measure is only 72.73% for the CharSrch method as illustrated in the figure below. In this example it is easy to understand why the relatively low code coverage measure is giving us valuable information (perceived risk is great, overall confidence is low). Clearly we have more testing to do!

CodeCoverageTest3

Again using Visual Studio Team System to measure block coverage and executing a test in which the search string is "abc" and the character to search for is ‘c’ the code coverage measure jumps up to 90.91% for the CharSrch method as illustrated in the figure below. Using just the code coverage measure as an indication of test effectiveness we might feel much more confident and perceive our exposure to risk is greatly reduced, and the algorithm is doing the right thing! But, we are still not at 100% (which is easy for this example), so we need just one more test to achieve that magic number.

CodeCoverageTest2

In a third test, in which the search string is "a" and the character to search for is ‘c’, the resultant code coverage is again 90.91% as illustrated in the figure below. By merging the code coverage results in Visual Studio Team System we can achieve 100% block coverage from Test 1 (s = "a" and c = ‘c’) and Test 2 (s = "abc" and c = ‘c’). If unit tests were written in such a way as to check for an output of retVal == 0 for Test 1, and an output of retVal != 0 for Test 2, then both tests pass. Overall, my perceived risk is relatively low, and my confidence is relatively high as compared to the first test based on the code coverage measure. But, did we miss something?

CodeCoverageTest1

Although the percentage of block coverage is relatively high (OK…100% is the max), the information provided by the measure itself is actually less valuable because it may have actually failed to detect the defect in which the CharSrch method returns a value of 0 if the character is not found, and also returns a value of 0 if the search character is the first character in the string.
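To make the missed defect concrete, here is a minimal sketch: a Python port of the CharSrch method that deliberately keeps its behavior. The name char_srch and the extra input "cab" are illustrative assumptions and are not part of the original tests.

```python
def char_srch(s: str, c: str) -> int:
    """Python port of CharSrch, keeping the original behavior (defect included)."""
    i = 0
    ret_val = 0
    # Advance until the character is found or the end of the string is reached.
    while i < len(s) and s[i] != c:
        i += 1
    # i is still inside the string only if the loop stopped on a match.
    if i < len(s):
        ret_val = i
    return ret_val


# The two merged tests both pass, and together they exercise every block.
assert char_srch("a", "c") == 0      # character not found
assert char_srch("abc", "c") != 0    # character found at index 2

# A case the coverage number never asked for: a match at index 0
# returns the same value as "not found".
print(char_srch("cab", "c"), char_srch("a", "c"))
```

Both calls in the last line return the same value, which is exactly the ambiguity that 100% block coverage did not reveal.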

This simple example is not meant to discount the overall value of code coverage as a software metric. However, as professional testers we must realize that high levels of code coverage do not directly relate to quality, and code coverage is only an indirect indication of test effectiveness. From my perspective, the most important measure with regards to code coverage is not how much has been exercised, but the percentage of code that has been unexercised by our testing. That is the purpose of code coverage analysis (which is a great segue for a follow up blog post).

Written by Bj Rollison

November 13th, 2009 at 9:07 pm

And More on Testing Mp3 Files and the Boundary Testing Debate

with one comment

Originally Published Wednesday, March 07, 2007

Over the past 3 days I have learned more about Mp3 file encoding and decoding than I have since the technology was introduced. I don’t spend time downloading files from the Internet to burn CD’s, I don’t own an iPod or Mp3 player, or a digital video recorder. So, prior to this I haven’t really paid attention to this technology and was quite ignorant of the various tools available and their capabilities. But, I must say it is pretty fascinating from a technology standpoint even though I am not an audiophile or videophile.

I still disagree with Pradeep’s assertion regarding boundary testing and the notion of no fixed boundaries, but I respect Pradeep’s expertise in the area of Mp3 technology. An Aussie gentleman by the name of Dean Harding pointed out my incorrect assumption regarding bitrate encodings and explained that the LAME encoder does allow a freeformat option in "expert" mode to produce a fixed bitrate in one-kilobit increments between 8 kb/s and 640 kb/s. (Thanks for serving up the pie, Dean.) However, of 30+ common decoders I discovered only 4 that support freeformatted Mp3 files, even if the encoded bitrate is less than 320 kb/s. Only one decoder (WinAmp MAD) is capable of decoding files above 600 kb/s.

So, (other than me having to eat a big helping of humble pie) where does that leave us in the specific debate about boundary testing, and Pradeep’s question "As a tester have you ever seen a boundary?" To that, I shall adamantly reply "yes": there are specific boundary conditions in software. Some are easy to find, some are not so easy. A tester’s ability to correctly identify a boundary value is heavily influenced by his/her in-depth domain and ‘system’ knowledge. For example, using the knowledge of Mp3 encodings I have learned over the past 3 days, let’s go back and review what tests I would design based on Pradeep’s original description of the audio decoder that played an Mp3 file within the range of 24 kb/s to 196 kb/s.

Since 196 kb/s is not a standard Mp3 encoding supported by ISO standards let’s assume the Mp3 player used either a Cdex, LAME, I3dec, or WinAmp MAD decoder. Using this as a reference, and some recently acquired domain knowledge I would design a set of initial tests using the following sample test data (files encoded with the specified criteria).

  1. 23, 24, 25 kb/s – Specified minimum value and minimum -1, and minimum +1 values to analyze relational operators used to artificially constrain the encoding range to a low end of 24 kb/s.

  2. 195, 196, 197 kb/s – Specified maximum value and maximum -1 and maximum +1 values to analyze relational operators used to artificially constrain the encoding range to a high end of 196 kb/s.

  3. 16 kb/s – this is the next ISO standard encoding bitrate below the specified minimum, so although the decoder does not support a file encoded at 23 kb/s (min -1 value) I would still want to check at the next lower standard value.

  4. 224 kb/s – this is the next ISO standard encoding bitrate above the specified maximum (same reason as explained above).

  5. 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 (see #6), 160, 176, 192 kb/s – these are the typical ISO standard Mp3 encodings within the specified range, so we should assume that all these must work properly because there is a high probability of decoding files using these bitrates. Since there are not many of them, test each one.

  6. 143, 144, 145 kb/s – the 144 kb/s bitrate seems to be an interesting value that "sticks out" more than others, and so I may also want to analyze the values around that particular value for any other anomalies.

  7. Generate several randomly encoded files in the following ranges: between 24 kb/s and 127 kb/s, between 128 kb/s and 143 kb/s, and between 145 kb/s and 196 kb/s, to gain confidence the decoder can decode non-standard encoded files within the specified range without having to test all 174 or so possibilities.
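The seven items above can be collected into a single list of bitrates to encode. The following is only a rough sketch of that bookkeeping: the function name initial_bitrates, the number of random samples per range, and the seeded random generator are illustrative assumptions, and nothing here actually encodes a file.

```python
import random

SPEC_MIN, SPEC_MAX = 24, 196  # specified decoder range in kb/s


def around(value):
    """Return the values immediately below, on, and above a boundary."""
    return [value - 1, value, value + 1]


def initial_bitrates(rng, samples_per_range=3):
    """Collect the bitrates (kb/s) from items 1 to 7 into one sorted list."""
    tests = []
    tests += around(SPEC_MIN)                     # item 1
    tests += around(SPEC_MAX)                     # item 2
    tests += [16, 224]                            # items 3 and 4
    tests += [32, 40, 48, 56, 64, 80, 96, 112,
              128, 144, 160, 176, 192]            # item 5
    tests += around(144)                          # item 6
    for low, high in [(24, 127), (128, 143), (145, 196)]:   # item 7
        tests += rng.sample(range(low, high + 1), samples_per_range)
    return sorted(set(tests))


# A fixed seed keeps the random picks repeatable between test runs.
print(initial_bitrates(random.Random(2007)))
```

Seeding the generator is a design choice: the non-standard bitrates are still arbitrary, but a failing value can be reproduced on the next build.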

These are not the only tests I would execute; however, they would be the first set of tests I would design and execute to make sure the code at least does what it is supposed to do. Any failure in the above cases means our basic program functionality has some serious flaws. Once I established the program does what it is supposed to do (including handling expected errors gracefully), then I would begin exploring other possibilities including rigorous falsification/negative testing.

Pradeep indicated files encoded at 96 and 128 kb/s crashed the system. These are not boundary conditions (unless the developer did something totally unreasonable), and unfortunately since we can only assume files encoded above and below 96 and 128 kb/s played correctly we will never really know the cause of this problem (unless Pradeep did some root cause analysis and will share those findings). However, a failure with 128 kb/s is really a red flag to me because this happens to be the most prevalent bitrate for encoding Mp3 files. As a tester I would really want to know why unit testing or build acceptance testing, etc. didn’t at least hit the most probable encoding format (the happy path) before throwing crap code over the wall for Pradeep to test.

I hope the reader takes away a few lessons from all this (besides the obvious one of not going off half cocked especially if you lack expertise in the specific context (e.g. Mp3 encodings)). For example,

  • In-depth knowledge of the domain space (including the data set, how the data is encoded, and how the data set can and cannot be manipulated in code both correctly and incorrectly), industry standards, and how the domain space interacts with the system are critical for greater test effectiveness
  • The less we know about the domain, the data set, and the system interaction, the less effective is the application of specific techniques to identify specific classes of defects
  • Boundary testing is simply one technique (systematic procedure to solve one type of a complex problem). The boundary value analysis technique is designed to identify a specific class of defects involving incorrectly specified constant values, incorrect use of data types, or casting between data types, artificially constraining data types, and incorrect usage of relational operators. It is not effective to identify other classes of defects.
  • Boundary conditions do not exist only at the extreme physical ranges; there can also be multiple boundary conditions within the overall range. (A good example is the Unicode repertoire. The Unicode BMP spans from U+0000 to U+FFFF, but within this range there are several important boundaries one must take into consideration when using Unicode data depending on the purpose of the test and the application under test (e.g. Private use area, surrogate pair area, etc).)
  • Understanding how to decompose the data set into equivalence class partition subsets exposes boundary conditions we might not otherwise consider
  • There is a great deal of detail in the code that can expose interesting information for a tester
  • When talking technology, be as specific and precise as possible to avoid ambiguity
  • And perhaps most importantly, one technique or one approach to testing is not sufficient. As testers we must gather and learn to use a great variety of skills and knowledge and approach the problem from multiple perspectives to be most effective in our roles.
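The Unicode point in the list above can be illustrated with a small sketch that enumerates the code points on, just below, and just above the edges of two well-known sub-ranges inside the BMP. The surrogate range (U+D800 to U+DFFF) and the private use area (U+E000 to U+F8FF) are taken from the Unicode standard; the function name and the decision to clamp candidates to the BMP are assumptions made for illustration.

```python
# Sub-ranges inside the BMP (U+0000..U+FFFF), as (name, first, last).
BMP_RANGES = [
    ("BMP", 0x0000, 0xFFFF),
    ("surrogates", 0xD800, 0xDFFF),
    ("private use area", 0xE000, 0xF8FF),
]


def boundary_code_points(ranges, lowest=0x0000, highest=0xFFFF):
    """Return code points on, below, and above each range edge, kept within the BMP."""
    points = set()
    for _name, first, last in ranges:
        for edge in (first, last):
            for candidate in (edge - 1, edge, edge + 1):
                if lowest <= candidate <= highest:
                    points.add(candidate)
    return sorted(points)


for cp in boundary_code_points(BMP_RANGES):
    print(f"U+{cp:04X}")
```

Which of these values matter in practice still depends on the purpose of the test and the application under test.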

OK…now time to get some really bitter coffee to wash down that humble pie :-)

Written by Bj Rollison

November 12th, 2009 at 6:43 pm

More on Boundary Testing and Mp3 Encodings

with 6 comments

Originally Published Tuesday, March 06, 2007

My previous post refuting a conjecture by Pradeep Soundarajan suggesting there are no boundary values in software was a bit harshly worded, and to him and the readers I apologize. Occasionally I get a little overzealous. I am sure Pradeep is a great guy, and I must say his reply to me on his blog was rather cordial given the situation. As I told Pradeep, email and blogs are a poor medium for expressing emotion. MichaelB cautioned me about this before, but my Type A personality sometimes takes over, and it is something I need to work on.

Anyway, although my critique of Pradeep’s conjecture was pretty ruthless, the analysis was accurate and the boundary values Pradeep suggested for testing an Mp3 file are in fact not possible or probable even using the tools he referenced. I am not an expert on Mp3 encodings or decoding technology, but I did know that Mp3 files used standard encoding bit rate formats. A little investigation quickly revealed the bit rates for audio encoding are based on multiples of 8, and the first 32 bits of an Mp3 file contain header information including 4 bits to specify the bit rate index (Layer 1, Layer 2, or Layer 3) and the bit rates as outlined below. (Thanks to Wikipedia some of the specific data I used in my initial rebuttal was incorrect, and I put a single strikethrough through that part of the sentence.)

Layer 1 – 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448
Layer 2 – 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384
Layer 3 – 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320

In his attempt to contest my argument Pradeep said, “Pooh! I am not sure why you don’t know that bit rates can be 33, 41, 57 or any number you want to generate. I recommend you to go through some multimedia test content generation tools like ffmpeg which gives a multimedia tester an edge to generate test content of his choice.”

So, not being an expert I took Pradeep’s suggestion “For those who don’t know tools like ffmpeg, it is impossible. I suggest you explore the boundary of your education on multimedia.” and went home and increased my understanding of Mp3 encodings and the ffmpeg toolset. As I read through some of the API references I found an interesting struct (illustrated below) for bit rate constants. (Now, I am thinking to myself…this is a clue! I am also thinking…hmmm…there might be some real boundary values here!)

   static const int sBitRates[2][3][15] =
   {
     { {  0, 32, 64, 96,128,160,192,224,256,288,320,352,384,416,448},
       {  0, 32, 48, 56, 64, 80, 96,112,128,160,192,224,256,320,384},
       {  0, 32, 40, 48, 56, 64, 80, 96,112,128,160,192,224,256,320}
     },
     { {  0, 32, 48, 56, 64, 80, 96,112,128,144,160,176,192,224,256},
       {  0,  8, 16, 24, 32, 40, 48, 56, 64, 80, 96,112,128,144,160},
       {  0,  8, 16, 24, 32, 40, 48, 56, 64, 80, 96,112,128,144,160}
     },
   // etc...

I also followed up by asking a co-worker who frequently works with ffmpeg to try to encode an Mp3 file with a bit rate of 57 kb/s. Interestingly enough, when he issued the command line parameters we got an error message indicating “Invalid Value.” (I can get a snapshot of the command window, but I really don’t think that is necessary.)
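One way to see why a value such as 57 kb/s has nowhere to go in a table like the one above is to treat each row as a lookup from the 4-bit index to a bit rate. The sketch below copies only the first three rows (the Layer 1, 2, and 3 values listed earlier); the dictionary, the function name bitrate_index, and the choice to skip index 0 are illustrative assumptions and say nothing about how ffmpeg itself validates its input.

```python
# First group of rows from the table above: one row per layer,
# one column per 4-bit bit rate index (values in kb/s).
BITRATES_KBPS = {
    1: [0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}


def bitrate_index(layer, kbps):
    """Return the table index for kbps in the given layer, or None if there is none."""
    row = BITRATES_KBPS[layer]
    # Index 0 holds 0 in the table and does not name a fixed rate, so it is skipped.
    return row.index(kbps) if kbps in row[1:] else None


print(bitrate_index(3, 56))   # a standard Layer 3 rate has an index
print(bitrate_index(3, 57))   # 57 kb/s has no slot in the table
```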

I am still not an expert on Mp3 encodings, but I am fairly certain that Mp3 file decoding algorithms are standardized across the industry. So, let’s just assume for a moment that we can encode an Mp3 file at 57 kb/s, and that file fails to play. Does it really matter? No, because industry hardware simply doesn’t support that encoding, and as long as the Mp3 player didn’t burst into flames there is no business case that would compel someone to try to make it work (at this time). (I am not suggesting that we test only “real-world” scenarios here, but I am suggesting that in-depth domain and system knowledge goes a long way in increasing the efficiency and effectiveness of our testing (and can lead to better identification of boundary values).)

Now, perhaps I am still missing something, and as I expressed previously I am not an expert on Mp3 encodings, or the use of ffmpeg or other tools to encode Mp3 files. So, I have asked Pradeep to share his knowledge with me in this area and teach me how to encode an Mp3 file with a bit rate of 57 kb/s using a commonly used tool such as ffmpeg (there is no doubt someone can write a customized algorithm to do this), and to also let me know of a commercially available Mp3 player that will decode and play that file. (Because if it can be done I would like to learn how simply because I love to learn new things.)

Many people do assume that boundary testing is quite simple. The actual execution of boundary tests is in fact rather simple; however, discerning the boundary values in any complex software is not as simple as looking at some minimum and maximum values and trying one value below and above each boundary condition. Boundary testing is a systematic procedure to solve a specific type of complex problem (specifically the incorrect usage of data types or constant values, artificially constrained data types, and relational operators). Boundary value analysis doesn’t solve all problems, it is not the holy grail, and its efficacy relies on the tester’s ability to understand and decompose the data set effectively. The less the tester knows about the data and how the data is used by the program, the less effective they will be in the application of this technique.

I did not intend my previous post to be construed as a personal attack against Pradeep; I am sure he is a bright guy. But, I am challenging his assertion on boundary testing on its technical merit. I hope he replies here (or on his blog) with an example of how to encode an Mp3 file at 57 kb/s, and I will make sure it is posted (or linked) here because I am certainly curious. (I don’t really like the taste of humble pie, but I will eat it from time to time if it helps me learn.)

Written by Bj Rollison

November 12th, 2009 at 6:34 pm

The Difference Between Professional Testing and Arbitrary Guessing or Wild Speculation

with 4 comments

Originally Published Saturday, March 03, 2007

My friend and teammate Alan knows how passionate I get about certain things on occasion, so he threw me a bone the other day regarding a blog post on boundary testing, or should I say rather a poor attempt to discredit boundary testing and boundary value analysis as a valuable testing technique. The author of the post made an attempt to diminish the value of boundary testing by stating that he has “never seen a boundary!” and asked, “As a tester have you ever seen a boundary?” Maybe that is a trick question, but I am positive the answer is YES! Let me give you a clue…32767 is a boundary value, 65535 is another one, and there are many more. Also, if I artificially constrain an int in a predicate statement using a relational operator such as if (intValue <= 0) then there is another boundary condition that I would certainly want to analyze. (That is what ISTQB, Myers, et al. mean by testing boundary conditions on, below and above the edges of input and output equivalence classes.)
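The two clues above are simply the upper limits of 16-bit signed and unsigned integers, and the predicate adds a boundary at 0. A minimal sketch of that arithmetic follows; the helper names are illustrative assumptions.

```python
def signed_range(bits):
    """Minimum and maximum of a two's complement integer with the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits):
    """Minimum and maximum of an unsigned integer with the given width."""
    return 0, 2 ** bits - 1


def on_below_above(boundary):
    """Values immediately below, on, and above a boundary condition."""
    return [boundary - 1, boundary, boundary + 1]


print(signed_range(16))      # upper limit is 32767
print(unsigned_range(16))    # upper limit is 65535
print(on_below_above(0))     # values to analyze for: if (intValue <= 0)
```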

Unfortunately, the author’s “conjecture that boundary is not static in software” is simply folly, and demonstrates a lack of understanding of how to identify boundary values, or how to adequately analyze boundary values, and simply assumes that boundary testing is as simple as testing at the extreme ranges. In the post the author describes his ‘experience’ boundary testing an audio decoder that will playback MP3 files encoded between 23 kb/s and 196 kb/s. His assumption is that boundary testing would simply entail testing MP3 files at 22 kb/s, 23 kb/s, and 24 kb/s, and 195 kb/s, 196 kb/s, and 197 kb/s encoded bit rates. Not only is this a bad assumption, it is technically impossible.

The bit rates for an MP3 file using mpeg2.0 encoding are 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160 kb/s, and the bit rates using mpeg-1 layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kb/s. Modern MP3 players are capable of decoding files encoded with variable bit rates; however, with the exception of the LAME encoder, files encoded with variable bit rates adhere to the bit rate encoding standards established by ISO/IEC for Layer I, II, and III bit rate indexes (and even the LAME encoder does not allow bit rate increments of 1 kb/s).

Therefore, since no encoding exists for 23 kb/s, 25 kb/s, 195 kb/s, and 197 kb/s, these values are simply impossible, and suggesting tests at these values indicates a lack of domain knowledge and appears to be simple guessing at what to test. Boundary analysis implies testing on, below and above the boundary condition using actual values, not something arbitrary or made up. If the requirements indicated support for mp3 files using a minimum bit rate of 24 kb/s the min – 1 value is 16 kb/s (not 23) and the min + 1 value is 32 kb/s. (Personally, given the limited number of bit rate encodings I probably would have tested a file encoded for each standard bit rate within each bit rate index (Layer I, II, and III), and also random samples of VBR encoded files within ranges specified by the requirements. And yes, include analyzing files encoded at bit rates just above and just below the stated requirement boundaries.)
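Under this post's premise that only the standard bit rates can occur, "min – 1" and "min + 1" mean the neighboring entries in the list of valid values, not the neighboring integers. A minimal sketch of that lookup, using the mpeg2.0 list quoted above (the function name neighbors is an illustrative assumption):

```python
import bisect

# mpeg2.0 bit rates in kb/s, as listed in this post.
MPEG2_KBPS = [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160]


def neighbors(valid_values, boundary):
    """Return the valid values immediately below and above boundary (None at the ends)."""
    values = sorted(valid_values)
    lo = bisect.bisect_left(values, boundary)
    hi = bisect.bisect_right(values, boundary)
    below = values[lo - 1] if lo > 0 else None
    above = values[hi] if hi < len(values) else None
    return below, above


print(neighbors(MPEG2_KBPS, 24))   # the min - 1 and min + 1 values for a 24 kb/s minimum
```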

The blogger also attempts to draw a parallel between the definition of planets in our solar system and boundary values in software. But, even here the author is misinformed. The analogy to planets illustrates antiquated thought based mostly on speculation and controversy. If by chance you are interested in facts with regards to planet classification look here. (BTW…Pluto is no longer classified as a planet, so once again the assumption that Pluto is a “boundary” of some sort is due to inaccurate or imprecise information or simple guessing.) Fortunately, I don’t think too many of us have to deal with developers who bicker over the size of a 32 bit integer value the same way astronomers argue about planet classification.

Now, of course, if developers simply changed data types or relational operators randomly throughout the code on a weekly basis, then I am all bought into the argument that there are no boundary values and boundary value analysis is not a valuable testing technique. But, on this planet (the one in the “reality” universe) where most developers are not morons who constantly change data types or relational operators, or constants then it is quite possible to identify boundary conditions and then carefully analyze the ‘possible’ values immediately above and below that boundary condition. And that is a good thing because historical analysis indicates that more than a handful of defects occur at or near boundary values.

But, testers must be aware that boundary values don’t always exist only at the extreme ranges of data types or other variables. Occasionally, there are boundary values/conditions within the minimum and maximum physical ranges of a variable. That’s specifically why ISTQB describes boundary testing as testing boundary conditions on, immediately below and immediately above the edges of input and output equivalence class subsets. Of course, that assumes the person knows how to decompose data into equivalence class subsets. This is where in-depth technical, system, and professional knowledge and skills separate the professional testers from the amateur guessers.

This is a good example why, for those who are really interested in being a professional tester and want to pursue software testing as a career, we should read a few more books on software testing, and a few less on epistemology, cognitive psychology, and metaphysics. (I am not saying these topics are not interesting or important. But, I suspect that most testers can already think for themselves, learn and understand abstract and complex thoughts and apply logic and reason.)

Written by Bj Rollison

November 12th, 2009 at 6:23 pm

Posted in Testing Practices


Regression Testing Strategies

with 13 comments

Originally Published Wednesday, January 10, 2007

There is a lot written about regression testing, and yet there seems to be a lot of confusion about regression testing as well. Just to make sure we are all on the same page, by regression I am referring to the denotation of the word to indicate a relapse to a less perfect or developed state (American Heritage Dictionary). So, the primary objective of regression testing is to determine whether or not modifications or additions in each new build of a product cause previous functionality to regress to a non-functioning state or error condition. It is important to note the purpose of a regression test suite is not necessarily to expose new defects. The primary purpose of a regression test is to identify changes in behavior from a previously established baseline, which is supported by Beizer’s and Myers’ definitions of regression testing.

However, even on small projects the number of tests required to ensure new builds do not regress or change previous functionality can be quite numerous. So, regression testing demands a strategy in which we limit the number of tests to establish an effective baseline measurement. IEEE 610 states that regression testing is selective retesting. Thus, the key to an effective regression testing strategy is to design a test suite that provides a high degree of confidence without retesting everything. To limit the number of tests in the regression test suite, we must systematically reduce the number of possible tests. So, we must decide which tests to include in a regression test suite.

Deciding what tests to include in the regression test suite

The most effective regression test suites I have seen include two categories of tests. The first category of tests includes high priority tests for commonly expected functionality (e.g. the 20% of the product that 80% of the customers demand or rely on). The second category of tests includes tests for any functional defects that are found and fixed. Found and fixed functional defects are included because fixed defects do occasionally regress, and if a business decision was made previously to fix a defect then we probably want it fixed before we release the product.

Prioritized feature area/functionality buckets

The tests in the regression test suite should also be partitioned into functional areas and each test in each functional partition or bucket should also be prioritized based on risk assessment criteria. If the regression test suite is especially large or time is limited, and the regression suite is partitioned into functional areas (and those areas are mapped to the project files or modules that contain that specific functionality and any dependencies), the regression test pass can execute a limited subset of tests from the regression test suite that strategically target the modules that have changed (and tests for dependent modules as well). Simple directory comparison tools (such as Diff2Dirs), and tools to identify dependencies between modules (such as Depends) are useful in identifying which modules change between builds and to map out dependencies between the modules in each build.
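The selection described above can be sketched in a few lines, assuming the list of changed modules and the dependency map have already been produced by whatever comparison and dependency tools are in use. The module names, test names, and data layout below are hypothetical and exist only to illustrate the idea.

```python
from collections import deque

# Hypothetical dependency map: module -> modules it depends on.
DEPENDS_ON = {
    "ui.dll": ["core.dll"],
    "print.dll": ["core.dll", "fonts.dll"],
    "core.dll": [],
    "fonts.dll": [],
}

# Hypothetical regression suite: each test is mapped to a module and a priority.
TESTS = [
    {"name": "print_preview", "module": "print.dll", "priority": 1},
    {"name": "font_fallback", "module": "fonts.dll", "priority": 2},
    {"name": "open_document", "module": "core.dll", "priority": 1},
    {"name": "toolbar_layout", "module": "ui.dll", "priority": 3},
]


def affected_modules(changed, depends_on):
    """Changed modules plus every module that depends on them, directly or indirectly."""
    # Invert "module -> its dependencies" into "module -> modules that use it".
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


def select_tests(tests, changed, depends_on):
    """Pick tests whose module is affected, highest priority (lowest number) first."""
    affected = affected_modules(changed, depends_on)
    picked = [t for t in tests if t["module"] in affected]
    return sorted(picked, key=lambda t: (t["priority"], t["name"]))


for test in select_tests(TESTS, ["fonts.dll"], DEPENDS_ON):
    print(test["name"])
```

A change to a widely shared module pulls in most of the suite, which is the expected behavior: the subset is only small when the change is well isolated.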

Automate, Automate, Automate

Also, since the regression test suite will ideally be run on each new build, this is one suite of tests that should be 99.999% automated. Similar to the BVT/BAT test suite the purpose of the regression test suite is not necessarily to expose defects; a regression test suite provides baseline measurement of functionality. Therefore, since these are tests that will be run several times during the software development lifecycle and are not necessarily designed to expose new defects the ROI for automation is very high. In fact, any test that cannot be automated is suspect for inclusion in the regression test suite.

These are a few ideas to develop a highly successful automation strategy. What other tactics have you found to be successful?

Written by Bj Rollison

November 12th, 2009 at 10:44 am

Allpairs, Pairwise, Combinatorial Analysis

with 3 comments

Originally Published Wednesday, October 25, 2006

Last week I went to StarWest as a presenter and as a track chair to introduce speakers. Being a track chair is wonderful because you get to interface more closely with other speakers. Anyway…one of the speakers I introduced was Jon Bach. Jon is a good public speaker, and I was pleasantly surprised that he was doing a talk on the allpairs testing technique (also known as pairwise or combinatorial analysis). I wish Jon had dedicated a little more time to the specifics of the technique during his talk and had been generally more aware of the tools and information available for folks to investigate further, but I think he successfully raised general awareness of and interest in pairwise testing as an effective testing technique among the audience.

Pairwise testing is one approach to solving the potential explosion in the number of tests when dealing with multiple parameters whose variables are semi-coupled or have some dependency on variable states of other parameters. For example, in the font dialog of MS Word there are 11 checkboxes for various effects such as superscript, strikethrough, emboss, etc. Obviously these effects have an impact on how the characters in a particular font are displayed, and they can be used in multiple combinations such as Strikethrough + Subscript + Emboss. The total number of combinations of effects is the Cartesian product of the variables for each parameter, or 2^11 = 2048 in this example. This doesn’t include different font types, styles, etc., which are also interdependent. So, you can see how the number of combinations increases rapidly, especially as additional dependent parameters are included in the matrix.

The good news is the industry has a lot of evidence to suggest that most software defects occur from simple interactions between the variables of 2 parameters. So, from a risk based perspective where it may not be feasible to test all possible combinations how do we choose the combinations out of all the possibilities? Two common approaches include orthogonal arrays and combinatorial analysis.

But true orthogonal arrays require that the number of variables is the same for all parameters, which is rarely true in software. It is possible to create "mixed orthogonal arrays" where some combinations of variables will be tested more than once. For example, if we have 5 parameters, and one parameter has 5 variables and the remaining 4 parameters have only 3 variables each, we can see from the orthogonal array selector (available on the FreeQuality website) that the size of the orthogonal array is L25 (which basically means the test case will require 25 tests, still significantly fewer than the 5 × 3^4 = 405 total combinations).

The other approach is combinatorial analysis (often referred to as pairwise or allpairs testing). The most commonly used form applies a mathematical formula to reduce the total number of combinations in such a way that each variable for each parameter is tested with each variable from the other parameters at least once. In the above example, the number of tests would be reduced to 16. (Note: some tools will give slightly different results.) Some tools (such as Microsoft’s PICT) also allow for more complex analysis of variable combinations such as triplets and n-wise coverage.
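To make the idea concrete, here is a minimal sketch of a greedy all-pairs generator in Python. It is not the algorithm PICT or any other particular tool uses, and the parameter names and values are placeholders. It simply keeps choosing the full combination that covers the most value pairs not yet covered, so the number of tests it produces may differ from what a dedicated tool reports.

```python
from itertools import combinations, product
from math import prod


def pairwise_tests(parameters):
    """Greedy all-pairs: repeatedly pick the full combination that covers
    the most value pairs that are still uncovered."""
    names = list(parameters)
    indexes = range(len(names))

    # Every pair of values (from two different parameters) that must appear
    # together in at least one test.
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(indexes, 2)
        for va in parameters[names[a]]
        for vb in parameters[names[b]]
    }

    # The exhaustive set of combinations is the pool we choose from.
    candidates = list(product(*(parameters[name] for name in names)))

    def pairs_of(row):
        return {(a, row[a], b, row[b]) for a, b in combinations(indexes, 2)}

    tests = []
    while uncovered:
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        tests.append(dict(zip(names, best)))
        uncovered -= pairs_of(best)
    return tests


# Placeholder model matching the example above:
# one parameter with 5 variables and four parameters with 3 variables each.
MODEL = {
    "P1": ["a1", "a2", "a3", "a4", "a5"],
    "P2": ["b1", "b2", "b3"],
    "P3": ["c1", "c2", "c3"],
    "P4": ["d1", "d2", "d3"],
    "P5": ["e1", "e2", "e3"],
}

if __name__ == "__main__":
    total = prod(len(values) for values in MODEL.values())
    tests = pairwise_tests(MODEL)
    print(f"{total} exhaustive combinations, {len(tests)} pairwise tests")
```

Because the pool is the full Cartesian product, this brute-force approach only suits small models; it is meant to show what "each variable tested with each other variable at least once" means, not to replace a real tool.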

One problem that is hopefully not overlooked by testers using these tools is that some combinations of variables are simply not possible. For example, in the Effects group of the Font dialog it is impossible to check the Superscript checkbox and the Subscript checkbox simultaneously. Therefore, the tester either has to manually modify the output, or use a tool that allows constraints. Again, this is another situation where Microsoft’s free tool PICT excels. PICT uses a simple BASIC-like language for conditional and unconditional constraining of combinations of variables. PICT also allows weighting variables, seeding, output randomization, and negative testing.
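The effect of a constraint is easy to see in a few lines of Python. This is only a sketch of the idea, not PICT's constraint language: the Superscript/Subscript rule comes from the example above, four of the effect names are the ones mentioned earlier in this post, and the remaining seven are placeholders.

```python
from itertools import product

# Four effects named in the post plus seven placeholder names,
# for a total of 11 checkboxes.
EFFECTS = ["Superscript", "Subscript", "Strikethrough", "Emboss"] + [
    f"Effect{i}" for i in range(5, 12)
]


def is_valid(combo):
    """Constraint: Superscript and Subscript cannot both be checked."""
    return not (combo["Superscript"] and combo["Subscript"])


# Every on/off combination of the 11 checkboxes.
all_combos = [
    dict(zip(EFFECTS, states))
    for states in product([False, True], repeat=len(EFFECTS))
]

# Only the combinations the dialog actually allows.
valid_combos = [combo for combo in all_combos if is_valid(combo)]

if __name__ == "__main__":
    print(len(all_combos), "total;", len(valid_combos), "satisfy the constraint")
```

A tool that understands constraints applies this kind of rule while it generates the reduced set, so the tester does not have to prune impossible rows by hand afterwards.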

I didn’t want this to be a PICT sales job, but alas my bias has influenced this post. So, I will conclude by pointing the readers to the Pairwise Testing website. My colleague Jacek Czerwonka has pulled together great resources on the technique of combinatorial analysis including a list of free and commercially available tools, and white papers supporting the value and practicality of this testing technique.

Written by Bj Rollison

November 11th, 2009 at 11:28 am

Exploratory Testing Examined

with 5 comments

Originally Published Sunday, September 03, 2006

Exploratory testing is a topic often embroiled in unreasonable controversy. I didn’t always understand what all of the hullabaloo is about because the simple fact is that all testers engage in exploratory testing, and have used it in software testing for decades (although the early pioneers of software testing simply called it ‘testing’). Testers used exploratory testing approaches quite successfully long before Cem Kaner coined the phrase “exploratory testing.” And testers certainly engaged in exploratory testing long before a few consultants started exploiting the phrase by promoting exploratory testing as some new-fangled testing strategy and disregarding the need for greater technical knowledge and skills in professional testers.

Some readers may comment that I have expressed disdain for exploratory testing, which is simply not correct. I use exploratory testing approaches all the time (even before it was called exploratory testing) and I realize its value in the correct context. My contention with the subject of exploratory testing has nothing to do with the approach per se. My derision is with the ‘experts’ who divide our community into an ‘us’ versus ‘them’ political debate, separate testers into different ‘schools’ of thought, and proselytize exploratory testing as the holy grail of software testing with the zeal of religious extremists.

But, after responding to some comments from Michael Bolton this week, I sat down and thought about the topic of exploratory testing for a while. (I have never met Michael, but I think we agree on the basic tenets of software testing, and I hope our paths cross sometime in the future so we can share our thoughts.) Michael is an articulate guy, and he said something that clicked and made all the dots connect for me. So, read on and see if the dots between exploratory testing and formal techniques connect for you.

The first dot: Checking assumptions

Michael wrote, “…testing is not merely a rote, mechanical task; that it’s not merely the application of engineering principles; that it’s not simply comparing the product with the spec; or other such simplifications.” Upon reading that I realized that exploratory testing zealots consistently either choose to ignore the value of testing techniques such as boundary value analysis or equivalence class partitioning, or they have never worked on a software testing project that required the testers to actually use their intellectual knowledge or technical skills.

I guess I have been privileged and have never had to work with or manage software testers who are simple-minded droids incapable of thinking for themselves, who blindly use tools or techniques without understanding their purpose or capabilities, and who have a very narrow interpretation of the role of the software tester. On the other hand, many of the ‘successful’ software testers I have worked with and managed possessed the characteristics and traits we commonly look for during the interview process, such as problem-solving and analytical thinking, precision questioning, and the capacity to learn new concepts and apply them. Early in the interview process we often dig around until we find an engineering principle or concept the interviewee is unfamiliar with, and take a few minutes to explain it to them. Later in the interview process a different person on the interview loop will ask the interviewee about that principle or concept to see if they are able not only to remember it, but also to process the knowledge in a way that lets them think of innovative ways to apply it.

I assume that many good testers already have the knowledge and capacity to learn, to think critically, to analyze, to ask relevant questions, and to “understand, diagnose, and solve problems”. Therefore, as a mentor, a teacher, and a software testing professional my goal is to help novice testers unlock their intellectual horsepower and their ability to successfully exploit new ideas such as testing techniques in order to be more effective in their role as a professional software tester, and more efficient with their time.

I don’t assume that a majority of testers lack the capacity to think for themselves or are incapable of logical reasoning. And, contrary to Michael’s assertion, I don’t agree that “…one doesn’t have to be an expert in anything to be successful (i.e. employed) as a professional tester; one doesn’t even need to be expert in testing itself.” First, I don’t equate success with employment. If Michael’s use of the term ‘professional’ implies “a person who earns a living in an occupation frequently engaged in by amateurs” then I concur that our industry has seen its share of self-proclaimed testers who are simply keyboard monkeys or checklist drones. Fortunately, the maturation of the testing discipline is weeding out the amateurs and faux testers. So, when I write or speak of professional testers I am referring to knowledgeable, highly skilled individuals who are “expert in their work”, who “show great skill”, and who “engage in a pursuit or activity professionally.”

The second dot: Techniques mature from exploration

Exploratory testing activists state that heuristics are a key component of exploratory testing. In fact, Kaner and Tinkham state “…exploratory testing is a highly heuristic process that uses heuristics.” So, let’s talk about heuristics. To put it simply, heuristics:

  • serve to indicate or point out; stimulate interest as a means of furthering investigation
  • encourage a person to learn, discover, understand, or solve problems on his or her own, as by experimenting, evaluating possible answers or solutions, or by trial and error
  • of, pertaining to, or based on experimentation, evaluation, or trial-and-error methods
  • Computers, Mathematics. pertaining to a trial-and-error method of problem solving used when an algorithmic approach is impractical
  • Of or relating to a usually speculative formulation serving as a guide in the investigation or solution of a problem
  • A rule of thumb, simplification, or educated guess that reduces or limits the search for solutions in domains that are difficult and poorly understood

Experimentation and investigation often lead to patterns or repeatable steps to replicate a desired outcome in a given situation, or systematic procedures by which a complex or scientific task is accomplished (which happens to be the definition of technique). So, can’t we say the progenitors of software testing hypothesized systematic procedures such as boundary value analysis through experimentation and evaluation? And, don’t we agree that when boundary value testing is applied correctly it accomplishes the complex task of verifying the extreme ranges of data more effectively and more efficiently than continuing to use trial and error or educated guesses? So, when a pattern of systematic procedures emerges from experimentation that proves or disproves a hypothesis within a specific context, our experimentation matures into a technique that can save the tester time and has been proven effective through previous investigation and trial and error.

Functional and structural software testing techniques impart consistent guidelines for highly skilled knowledgeable testers to be more effective, and improve the tester’s efficiency. Techniques are not step by step or rote procedures. In fact, the application of techniques requires a great deal of knowledge and skill.

Educated guessing, trial and error, experimentation, and speculation are tribal knowledge. Attrition of the testers with the knowledge will cause a skill or performance gap in the organization. So, as long as companies continue to hire unqualified, sub-standard keyboard monkeys, the consultants who advocate exploratory testing as the Tao can continue to sell their exploratory testing medicine show. But, I agree with Michael that software testing “is a deeply intellectual and challenging activity.” And unfortunately for the exploratory snake oil salesmen, many companies also understand that, and the caliber of testers hired today is much higher than in years past. The era of hiring the butcher, the baker, and the candle-stick maker to find bugs is over. Companies such as Microsoft, Google, Freddie Mac, and others are targeting knowledgeable, technically skilled individuals as professional testers.

Drawing the line between the dots: Stop fighting and play nicely!

Michael described the testing role quite eloquently stating, “… testing is the process of investigating the product to reveal quality-related information about it, for people to whom that information might matter.” At the end of the day, there are multiple approaches and techniques required to examine or inspect any complex software project to accurately and adequately assess quality attributes and exposure to risk. Simply put, no single approach to software testing is sufficient. Exploratory testing and functional and structural software testing techniques are only some of the tools in the repertoire of the professional tester. They are all valuable when employed in the appropriate context. (BTW…everything has context because nothing occurs in a vacuum.) There is no us versus them, there are not 4 schools of testing, and since everyone does exploratory testing to some degree doesn’t that make everyone an "exploratory tester?" (Of course, if exploratory testing is the only approach to assessing the capabilities and attributes of a software project, then the testing strategy is probably not very mature, and numerous studies prove that approach leaves a lot of holes (untested areas of the product = 100% exposure to risk)).

I think Michael, other notable professionals in the field, and I are exposing the need for highly qualified, smart, and innovative testers in the industry because testing is extremely challenging. We are also commonly aligned in our desire to teach bright and creative individuals additional testing skills to be more effective as professional software testers. In essence, aren’t we just building different walls of a sand castle? Can’t we just get along and play nicely in the same sandbox? Or, do we have to choose sides?

Written by Bj Rollison

November 11th, 2009 at 10:49 am

Posted in Testing Practices

Build Verification Testing (BVT)

without comments

Originally Published Friday, July 07, 2006

Have you ever received a build into testing which was missing files, had the wrong file, or had a file with the incorrect language version? These and other problems associated with the file properties of each new build usually result from an improperly designed build verification test (BVT) suite. The primary purpose of the BVT is to validate the integrity of each new build of a project. Some teams combine the BVT with the Build Acceptance Test (BAT), which is not necessarily a bad approach if the test team owns the BVT. However, I have found in these situations the test team usually places more emphasis on the BAT, which verifies basic functionality of the build, than on the BVT, which actually validates the build, and thus misses critical problems with the build itself.

Ideally, the person or team that builds the project should be responsible for validating each new build before releasing it to build acceptance testing, but the test team should have oversight into what is being checked. It does no good to run a new build through a BAT process only to discover a missing file, or a German help file in a Japanese language version of the project. At a minimum the BVT should check every file in each new build for:

  • Correct version information
  • Correct time and date stamps
  • Appropriate file flags
  • Correct cyclic redundancy check (CRC) keys
  • Correct language information
  • ISO 9660 file naming conventions
  • Viruses
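As one small example of automating such checks, the Python sketch below covers just two of them: that every expected file is present and that its CRC matches. The manifest format (a dictionary of relative path to expected CRC-32) and the function names are assumptions made for illustration; version, language, and file-flag checks would need platform-specific tooling that is not shown here.

```python
import zlib
from pathlib import Path


def crc32_of(path):
    """Return the CRC-32 of a file's contents, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def verify_build(build_dir, manifest):
    """Compare the files under build_dir with a manifest of
    {relative_path: expected_crc}. Returns a list of problem descriptions;
    an empty list means every listed file is present and matches."""
    build_dir = Path(build_dir)
    problems = []
    for relative_path, expected_crc in sorted(manifest.items()):
        target = build_dir / relative_path
        if not target.is_file():
            problems.append(f"missing: {relative_path}")
        elif crc32_of(target) != expected_crc:
            problems.append(f"crc mismatch: {relative_path}")
    return problems
```

A build would be held back from acceptance testing whenever verify_build returns anything other than an empty list.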

Changes in the build can help the test team focus on critical areas, or identify areas of the project that should be revisited. Other information a BVT should provide to the test team includes:

  • New files added to the project build
  • Files removed from the project build
  • Files with binary changes
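A report like this falls out of comparing two snapshots of the build. The Python sketch below assumes each snapshot is a dictionary of file name to checksum (for example, a CRC of the file contents); the function name and the sample file names are hypothetical.

```python
def diff_builds(previous, current):
    """Compare two build snapshots of the form {file_name: checksum}.
    Returns three sorted lists: files added, files removed, and files
    whose checksum (and therefore binary content) changed."""
    previous_names = set(previous)
    current_names = set(current)
    added = sorted(current_names - previous_names)
    removed = sorted(previous_names - current_names)
    changed = sorted(
        name
        for name in previous_names & current_names
        if previous[name] != current[name]
    )
    return added, removed, changed


if __name__ == "__main__":
    # Hypothetical snapshots of two consecutive builds.
    build_100 = {"app.exe": 111, "core.dll": 222, "old.dll": 333}
    build_101 = {"app.exe": 111, "core.dll": 999, "new.dll": 444}
    added, removed, changed = diff_builds(build_100, build_101)
    print("added:", added)
    print("removed:", removed)
    print("changed:", changed)
```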

It might also be a good idea to scan file names for public acceptability. For example, I remember one project in which a clever developer decided to call his new library ‘sexme.dll.’ This file name was released in the shipped product and several customers objected.

Even on large projects with several hundred files, the tests to validate the integrity of a build should not take more than a few minutes using automated processes. The few minutes spent validating each new build are well justified considering the cost of releasing a build into testing and then discovering the build is invalid (even though it may have passed the BAT), because then you have to question the validity of the test effort on that build.

Written by Bj Rollison

November 10th, 2009 at 12:36 am