I.M. Testy

Treatises on the practice of software testing

Archive for November, 2009

Programming Paradigms in Test Automation

Originally Published Thursday, May 14, 2009

Regardless of the personal opinions of a few people, the simple fact is that the demand for software testers who can design and develop effective test automation is increasing. Perhaps one reason for the disdain shown by some folks in the industry is the limitations of the test automation approach they are most familiar with; they sometimes assume those limitations apply to all types of test automation. However, not all test automation approaches are equal, and there are advantages and disadvantages to any approach.

At its core, an automated test case is software code, and just as there are various approaches to developing product software, there are different programming paradigms used to develop test automation, such as:

  1. Record and playback automation
  2. Keyword or action-word driven automation
  3. Scripted automation
  4. Procedural automation
  5. Model based automation

Record and playback automation

The record and playback paradigm simply records sequences of keyboard and mouse events and auto-magically codifies them, usually in some proprietary scripting language, so they can be replayed (executed) over and over again. There are usually severe limitations to this type of automation, and it tends to be extremely fragile, requiring constant massaging (re-recording). Although many record/playback tools allow ‘test developers’ to modify the scripted actions to some extent, and possibly even incorporate simple yes/no oracles, I think many people view the record/playback paradigm as being slightly more useful than trained monkeys in limited situations.

Keyword or action-word driven automation

Keywords or action-words are simple scripts, usually in some tabular format, that ‘describe’ a sequence of ‘actions’ for the computer to perform. Of course, the key to keywords is the underlying architecture of the tool that interprets the keywords and executes the sequence of events. The beauty of keyword driven testing is that it hides the actual code and, similar to record and playback, can be more easily used by business analysts or ‘user domain experts’ hired into testing roles to automate something. I do see the benefit of keyword driven testing in some limited contexts (especially for companies who rely on business analysts/user domain experts for testing software), but let’s be real…these people aren’t automating anything…they are simply filling out a form that is then fed into a tool that performs the actions as prescribed by the listed instructions. The keyword form does nothing by itself, and the only thing a ‘tester’ has to think about is using the correct keywords to sequentially get from point A to point Z for a ‘test.’
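To make the split between the keyword form and the engine concrete, here is a minimal sketch in Ruby. It is not taken from any particular keyword tool: the table contents, the keyword names, and the `build_actions` and `run_keywords` helpers are all hypothetical, and the "actions" only write to a log instead of driving a real UI.

```ruby
# Hypothetical keyword table: each row is [keyword, arguments...].
# This is the 'form' a non-programmer would fill in.
KEYWORD_TABLE = [
  ['open',   'http://news.google.com'],
  ['select', 'edition', 'Canada English'],
  ['click',  'Go'],
  ['verify', 'Canada']
].freeze

# Hypothetical action library: maps each keyword to code.
# A real tool would drive the application here; this stand-in only logs.
def build_actions(log)
  {
    'open'   => ->(url)        { log << "open #{url}" },
    'select' => ->(list, item) { log << "select #{item} in #{list}" },
    'click'  => ->(button)     { log << "click #{button}" },
    'verify' => ->(text)       { log << "verify #{text}" }
  }
end

# The engine: look up each keyword and run the matching action in order.
def run_keywords(table, actions)
  table.each do |keyword, *args|
    action = actions.fetch(keyword) do
      raise ArgumentError, "unknown keyword: #{keyword}"
    end
    action.call(*args)
  end
end

log = []
run_keywords(KEYWORD_TABLE, build_actions(log))
puts log
```

All of the behavior lives in the action library and the engine; the table itself only names a sequence, which is the point made above.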

Scripted automation (imperative programming)

The primary difference between keyword and scripted automation is the tester actually develops the test in a programming language rather than filling in a form with abstracted key words that drive some automation engine. However, similar to keywords, scripted automation tends to use rudimentary statements of basic instructions that manipulate the software to perform a pre-determined sequence of events as illustrated below.

    def test_b_googlenews
      #-------------------------------------------------------------------------
      # Test to demonstrate WATIR select from drop-down box functionality
      #
      # variables
      test_site = 'http://news.google.com'
      puts '## Beginning of test: google news use drop-down box'
      puts '  '
      puts 'Step 1: go to the google news site: news.google.com'
      $browser.goto(test_site)
      puts '  Action: entered ' + test_site + ' in the address bar.'
      puts 'Step 2: Select Canada from the Top Stories drop-down list'
      $browser.select_list( :index , 1).select("Canada English")
      puts '  Action: selected Canada from the drop-down list.'
      puts 'Step 3: click the "Go" button'
      $browser.button(:caption, "Go").click
      puts '  Action: clicked the Go button.'
      puts 'Expected Result: '
      puts ' - The Google News Canada site should be displayed'
      puts 'Actual Result: Check that "Canada" appears on the page by using an assertion'
      assert($browser.text.include?("Canada") )
      puts '  '
      puts '## End of test: google news selection'
    end # end of test_b_googlenews

Most examples of scripted automation appear as codified versions of a set of steps listed in a less-than-adequately designed manual test case using hard-coded arguments for variables, mindless progression between steps, and simple deterministic oracles. Scripted automation is probably most beneficial for automating specific sub-tasks in "computer assisted testing." However, scripted automation is usually too prescriptive, and relies heavily on nothing going wrong during the execution of the test case.

Procedural automation (procedural programming)

In procedural automation the tester also develops a test by writing a series of computational steps to achieve a desired purpose. However, unlike scripted automation, the procedural automation paradigm generally provides better control flow options during the execution of the automated test case, allows for greater complexity in the design, improves reuse and reduces maintenance through modularity, and can employ both deterministic and heuristic oracles.

    // Procedural programming example
    // Requires: using System; using System.Diagnostics;
    //           using System.Windows.Automation;
    static void Main(string[] args)
    {
      // Path to the data file passed as a string argument to the test case
      string pictTestData = args[0];
      // Stopwatch to measure test case duration
      Stopwatch sw = new Stopwatch();
      sw.Start();
      // Launch the AUT
      AutomationElement desktop = AutomationElement.RootElement;
      AutomationElement myAutForm = null;
      Process myProc = new Process();
      myProc.StartInfo.FileName = myConstantAutFileName;
      if (myProc.Start())
      {
        // Polling loop to find AUT window by window property
        int pollCount = 0;
        do
        {
          myAutForm = desktop.FindFirst(TreeScope.Children,
            new PropertyCondition(AutomationElement.AutomationIdProperty,
              myConstantAUTPropertyID));
          pollCount++;
          System.Threading.Thread.Sleep(100);
        }
        while (myAutForm == null && pollCount < 50);

        if (myAutForm == null)
        {
          throw new Exception("Failed to find dialog");
        }

        // Get UI element collection here...
        // Call method to read in test data
        string[] testData = ReadTabDelimitedFile(pictTestData);
        // iterate through each set of test data (data-driven test example)
        foreach (string test in testData)
        {
          // Call method to execute each set of test data and pass the return
          // value to the logging method; Oracle is separate method called
          // from the test method
          LogResultMethod(ExecuteCombinatorialTestMethod(test));
        }

        // close AUT and clean-up
        TimeSpan ts1 = sw.Elapsed;
        // log test case duration...
      }
      else
      {
        // Deal with situation if AUT failed to launch
      }
    }

Procedural automation can be used for anything from API to GUI automated test cases designed to evaluate functionality (computational logic), non-functional areas such as stress, performance, and security, and also behavioral aspects of the software. Using a language similar to the product’s programming language removes abstraction layers, and also enables other members of the team (developers) to easily review test cases.

Model based automation

Model based automation is a relatively new automation paradigm, and its complexity is beyond the scope of this single post. Basically, model based automation involves codifying abstracted state machines and state traversals, and coupling these parts with an automation engine that uses some form of graph traversal logic to drive the system under test between the various states identified in the model. In some sense model based automation is similar to exploratory testing because tests are generally not pre-determined or pre-scripted, what constitutes a single test is really hard to describe, and the oracles generally detect errant behavior (or being in an unexpected state). Personally, I think there is tremendous potential in model based automation, but the industry has just begun to scratch the surface of this automation paradigm and it is still largely misunderstood. This automation paradigm requires more complex skill sets of the person designing the test automation, such as the ability to abstract important machine states as a model and encode system behaviors. For more information about model based automation I recommend taking a look at http://research.microsoft.com/en-us/projects/specexplorer.
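As a rough illustration of the idea (and nothing like a full tool such as Spec Explorer), the Ruby sketch below pairs a tiny state model with a random-walk traversal engine. The media-player model, the `FakePlayer` stand-in for the system under test, and the `random_walk` helper are all hypothetical names invented for this example; a random walk is only one of several possible traversal strategies.

```ruby
# Hypothetical model of a tiny media player: state => { action => next state }.
MODEL = {
  stopped: { play: :playing },
  playing: { pause: :paused, stop: :stopped },
  paused:  { play: :playing, stop: :stopped }
}.freeze

# Stand-in for the system under test; a real harness would drive the product.
class FakePlayer
  attr_reader :state

  def initialize
    @state = :stopped
  end

  def play
    @state = :playing
  end

  def pause
    @state = :paused if @state == :playing
  end

  def stop
    @state = :stopped
  end
end

# Traversal engine: pick any action the model allows in the current state,
# apply it to the system, then compare the observed state with the model.
# The oracle here is simply "the system is in the state the model predicts".
def random_walk(model, system, steps:, seed:, start: :stopped)
  rng = Random.new(seed)
  expected = start
  trace = []
  steps.times do
    action, expected_next = model.fetch(expected).to_a.sample(random: rng)
    system.public_send(action)
    trace << action
    unless system.state == expected_next
      raise "after #{trace.inspect}: expected #{expected_next}, got #{system.state}"
    end
    expected = expected_next
  end
  trace
end

trace = random_walk(MODEL, FakePlayer.new, steps: 20, seed: 42)
puts "walked #{trace.length} transitions without leaving the model"
```

No individual test is scripted here; the sequence of actions comes from the traversal, and a failure is reported as the trace of actions that led to an unexpected state.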

So, which approach is best?

In my opinion, there may be some limited value in record/playback, keyword, and scripted automation in specific contexts; however, a robust automated test case that will run on multiple environments, in multiple languages, and distributed across multiple platforms without rewriting the test for each variation requires well designed tests developed using a procedural or model based automation approach.

Written by Bj Rollison

November 18th, 2009 at 10:21 pm

Posted in Test Automation

Assessing Tester Performance

Originally Published Tuesday, April 28, 2009

Using context-free software product measures as personal performance indicators (KPI) is about as silly as pet rocks!

Periodically a discussion of assessing tester performance surfaces on various discussion groups. Some people offer advice such as counting bugs (or some derivation thereof), number of tests written in x amount of time, number of tests executed, % of automated tests compared to manual tests, and (one of my least favorite measures of individual performance) % of code coverage.

The problem with all these measures is they lack context, and tend to ignore dependent variables. It is also highly likely that an astute tester can easily game the system and potentially cause detrimental problems. For example, if my manager measured my performance by the number of bugs found per week, I would ask how many I had to find per week to satisfy the ‘expected’ criteria. Then each week I would report 2 or 3 more bugs than the ‘expected’ or ‘average’ number (in order to ‘exceed’ expectations), and any additional bugs I found that week, I would sit on and hold in case I was below my quota the following week. Of course, this means that bug reports are being artificially delayed, which may negatively impact the overall product schedule.

The issue at hand is this bizarre desire by some simple-minded people who want an easy solution to a difficult problem. But, there is no simple formula for measuring the performance of an individual. Individual performance assessments are often somewhat subjective, and influenced by external factors identified through Human Performance Technology (HPT) research such as motivation, tools, inherent ability, processes, and even the physical environment.

A common problem I often see is unrealistic goals such as "Find the majority of bugs in my feature area." (How do we know what the majority is? What if the majority doesn’t include the most important issues? etc.) Another problem I commonly see is for individuals to over-promise and under-deliver relative to their capabilities. I also see managers who dictate the same identical set of performance goals to all individuals. While there may be a few common goals, as a manager I would want to tap into the potential strengths of each individual on my team. I also have different expectations and levels of contributions from individuals depending on where they are in their career, and also based on their career aspirations.

So, as testers we must learn to establish SMART goals with our managers that include:

  • goals that align with my manager’s goals
  • goals that align with the immediate goals of the product team or company
  • and stretch goals that illustrate continued growth and personal improvement relative to the team, group, or company goals

(This last one may be controversial; however, we shouldn’t be surprised to know individual performance is never constant in relation to your peer group. )

But (fair or not), for a variety of reasons most software companies do (at least periodically) evaluate their employees’ performance in some manner; the key to success is in HPT and agreeing on SMARTer goals upfront.

Written by Bj Rollison

November 18th, 2009 at 10:07 pm

Posted in Test Management

"Good Enough" Is Not Good Enough!

Originally Published Friday, April 17, 2009

This week I came across a discussion [regarding test design] in which a tester wrote, "…the main goal is having something that is ‘good enough’." Every time I hear a tester utter the phrase "good enough" my head wants to explode!

Wrapping duct tape around a splint on the broken handle on my hoe is good enough to finish the job until I can go buy a new handle. While I may sometimes temporarily improvise a "good enough" solution; I am never truly satisfied with good enough, and I personally aspire to be better than good enough. My father always told me if something was worth doing, I should do it right! He also raised me to always put forth my best effort, and constantly strive to improve myself.

I seriously can’t think of any professional (in any discipline) who considers good enough to ever really be good enough. The "good enough" argument is the ultimate cop out! In my opinion "good enough" epitomizes an unprofessional, apathetic attitude sanctioning mediocrity.

From a job performance perspective I suspect that if we told our employers that we were going to simply design and execute tests that are "good enough" we probably wouldn’t be in a job very long. I certainly would not want people on my team who are satisfied with "good enough;" I want people who want to do their best, and to strive for better!

I spent some time in the US Air Force and we often used the phrase "it’s good enough for government work" to describe slop-shoddy work. So, it amazes me that some people seem to be satisfied by consciously condoning ignominious practices. But, I guess some people are taught to expend just enough effort to be good enough!

In my opinion, good enough may be "good enough for government work" or for individuals who don’t have a vested interest in helping organizations improve, or who don’t really care about improving themselves; but, there is no room for the slovenly "good enough" mentality among professional testers.

Written by Bj Rollison

November 18th, 2009 at 10:03 pm

Test Automation: Look Below the UI for More Effective and Robust UI Automated Test Case Designs

Originally Published Tuesday, April 14, 2009

Last month I wrote about simplistic views of UI test automation in which some people want to pretend that recording for playback or scripting hard-coded actions and data to mimic some human’s interactions at the keyboard is an automated test. Balderdash! Automating a set of sequences or preconceived steps simply for the sake of automating or preparing an environment is perhaps what Kaner et al. mean when they refer to computer assisted testing; however, computer assisted testing is not the same as a well designed automated test. (And yes, computers are very good tools for completely automating some types of tests quite effectively, including the oracle.) We see a lot of computer assisted testing in UI automation projects. I suspect this occurs because people are focused on trying to automate a test the same way they or an end-user would interact with the computer rather than designing the automated test to evaluate an important attribute or capability of the software in order to provide significant information to the project team and add value to the testing process.

Personally, I am not a big fan of UI automation because it is usually done poorly, and it is usually very fragile and needs constant massaging, more so than test automation that runs below the UI layer. Also, I see a lot of misuse of UI automation. For example, I recently came across a comment by one fellow who wrote, “UI Automation is not necessarily meant for testing the UI (though, we use it for that also).” What??? I do understand the need for UI automation in the testing process, and done well it can provide tremendous benefit and free up my time to actually design new tests and think more critically about what has and has not been tested. But, when I automate through the UI my test cases are primarily testing behavioral aspects of the software (end-user scenarios for example) and verifying that UI elements call the appropriate event handlers. While UI automation can be used to test functional capabilities also, it is generally not the best approach for robust functional testing. This is especially true when the automated UI test is over-loaded with excess baggage (manipulating UI elements not directly associated with the purpose of a test). The more baggage a UI test carries, the greater the potential for maintenance nightmares.

For example, not too long ago a tester was performing international sufficiency testing of his component to ensure his feature supported multiple national conventions and custom formats supported by Windows National Language Support (NLS) APIs. He knew the steps to manipulate the national conventions and custom formats required the user to click the Start menu, select Control Panel, then click on the Regional and Language Options control panel applet, click the Customize button, select the appropriate property sheet for the national convention he wanted to customize the setting for, and finally click the OK button on the Customize dialog and the Regional Settings dialog, and verify the results. Lather, rinse, and repeat as necessary!

To make matters more complicated, the sequence of steps to change these settings is slightly different between Windows XP and Windows Vista, and we certainly don’t want to write 2 separate test cases, or branch the test code depending on the operating system in this case. Complexity cultivates complication; especially with UI automation! Fortunately, this fellow also knew that essentially all underlying functionality can be accessed via Windows APIs, and that is exactly the information he was looking for. In this situation I suggested he look at the SetLocaleInfo function, and within minutes he incorporated that function to efficiently resolve his problem, and his automated test was capable of testing his application on any currently supported version of the Windows operating system.

In C# automation, we can use Platform Invocation Services (P/Invoke) to call this Win32 API function from Kernel32.dll, as illustrated below.

    namespace TestingMentor.PInvokeSample
    {
      using System;
      using System.Runtime.InteropServices;

      /// <summary>
      /// This class contains Native Win32 API functions that are marshalled
      /// over for use in C#
      /// </summary>
      class NativeMethod
      {
        /// <summary>
        /// Sets an item of information in the user override portion of the
        /// current locale. This function does not set the system defaults.
        /// </summary>
        /// <param name="locale">the locale identifier of the locale with the
        /// code page used </param>
        /// <param name="localeType">Type of locale information to set.</param>
        /// <param name="localeData">A null-terminated string containing the
        /// locale information to set</param>
        /// <returns>Returns true if successful; otherwise false</returns>
        [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
        public static extern bool SetLocaleInfo(
          uint locale,
          uint localeType,
          string localeData);

        /// <summary>
        /// Sets an item of information in the user override portion of the
        /// current locale. This function does not set the system defaults.
        /// </summary>
        /// <param name="locale">the locale identifier of the locale with the
        /// code page used </param>
        /// <param name="localeType">Type of locale information to set.</param>
        /// <param name="localeData">An integer value representing the locale
        /// information to set</param>
        /// <returns>Returns true if successful; otherwise false</returns>
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool SetLocaleInfo(
          uint locale,
          uint localeType,
          int localeData);
      }
    }

The argument values that we can pass to these functions are enumerated in a separate class similar to the one below

    namespace TestingMentor.NlsInfo
    {
      /// <summary>
      /// Constant values for SetLocaleInfo API
      /// </summary>
      class NlsConstant
      {
        public enum Locale : uint
        {
          Invariant = 0x007F,
          SystemDefault = 0x0800, // use system default for setlocaleinfo
          UserDefault = 0x0400,
          Neutral = 0x0000,
          CustomDefault = 0x0C00, // Vista and later
          CustomUiDefault = 0x1400, // Vista and later
          CustomUnspecified = 0x1000  // Vista and later
        };

        public enum LocaleType : uint
        {
          // VALUE LCDATA TYPES
          CalendarType = 0x00001009,  // type of calendar specifier
          CurrencyDigits = 0x00000019,  // local monetary fractional digits
          CurrencySymbolPosition = 0x0000001B,  // position of positive currency symbol
          FractionalDigits = 0x00000011,  // number of fractional digits
          NativeDigitSubstitution = 0x00001014,  // native digit substitution
          FirstDayOfWeek = 0x0000100C,  // first day of week specifier
          FirstWeekOfYear = 0x0000100D,  // first week of year specifier
          LeadingZeros = 0x00000012,  // leading zeros for decimal
          Measure = 0x0000000D,  // 0 = metric, 1 = US
          NegativeCurrency = 0x0000001C,  // negative currency mode
          NegativeNumber = 0x00001010,  // negative number mode
          PaperSize = 0x0000100A,  // paper size
          TimeFormatSpecifier = 0x00000023,  // time format specifier

          // STRING LCDATA TYPES
          // Valid Unicode characters
          AM = 0x00000028,  // AM designator
          PM = 0x00000029,  // PM designator
          CurrencySymbol = 0x00000014,  // local monetary symbol
          DecimalSeparator = 0x0000000E,  // decimal separator
          DigitGrouping = 0x00000010,  // digit grouping
          ListSeparator = 0x0000000C,  // list item separator
          LongDate = 0x00000020,  // long date format string
          MonetaryDecimalSeparator = 0x00000016,  // monetary decimal separator
          MonetaryGrouping = 0x00000018,  // monetary grouping
          MonetaryThousandSeparator = 0x00000017,  // monetary thousand separator
          NativeDigits = 0x00000013,  // native ascii 0-9
          NegativeSign = 0x00000051,  // negative sign
          PositiveSign = 0x00000050,  // positive sign
          ShortDate = 0x0000001F,  // short date format string
          ThousandSeparator = 0x0000000F,  // thousand separator
          TimeSeparator = 0x0000001E,  // time separator
          TimeFormat = 0x00001003,  // time format string
          YearMonthFormat = 0x00001006   // year month format string
        };

        public enum LocaleData : int
        {
          // LOCALE_ICALENDARTYPE VALUES
          Gregorian = 1, // Gregorian (localized)
          GregorianUS = 2, // Gregorian (always English)
          GregorianMEFrench = 9, // Middle East French
          GregorianArabic = 10,
          GregorianXlitEnglish = 11, // transliterated English
          GregorianXlitFrench = 12, // transliterated French
          Japan = 3,
          Taiwan = 4,
          Korea = 5,
          Hijri = 6,
          Thai = 7,
          Hebrew = 8,
          Umalqura = 23, // Um Al Qura (Arabic lunar) Vista or later

          // LOCALE_ICURRENCY
          PositiveCurrencyPrefixNoSeparation = 0,
          PositiveCurrencySuffixNoSeparation = 1,
          PositiveCurrencyPrefixSeparation = 2, // one character separation
          PositiveCurrencySuffixSeparation = 3, // one character separation

          // LOCALE_IDIGITSUBSTITUTION
          DigitSubstitutionContextBased = 0,
          DigitSubstitutionNone = 1, // use this setting for full unicode support
          DigitSubstitutionNative = 2, // uses digits based on national conventions
                                       // according to LOCALE_SNATIVEDIGITS

          // LOCALE_IFIRSTDAYOFWEEK
          Monday = 0, // LOCALE_SDAYNAME1
          Tuesday = 1, // LOCALE_SDAYNAME2
          Wednesday = 2, // LOCALE_SDAYNAME3
          Thursday = 3, // LOCALE_SDAYNAME4
          Friday = 4, // LOCALE_SDAYNAME5
          Saturday = 5, // LOCALE_SDAYNAME6
          Sunday = 6, // LOCALE_SDAYNAME7

          // LOCALE_IFIRSTWEEKOFYEAR
          FirstDay = 0, // Week containing 1/1 even if single day
          FirstFullWeek = 1, // first full week following 1/1
          FirstWeek = 2, // first week with at least 4 days after 1/1

          // LOCALE_ILZERO
          NoLeadingZero = 0,  // .975
          LeadingZero = 1,     // 0.975

          // LOCALE_IMEASURE
          Metric = 0,
          US = 1,

          // LOCALE_INEGCURR
          ParenthesisSymbolNumber = 0,   // ($1.1)
          NegativeSignSymbolNumber = 1,  // -$1.1
          SymbolNegativeSignNumber = 2,  // $-1.1
          SymbolNumberNegativeSign = 3,  // $1.1-
          ParenthesisNumberSymbol = 4,   // (1.1$)
          NegativeSignNumberSymbol = 5,  // -1.1$
          NumberNegativeSignSymbol = 6,  // 1.1-$
          NumberSymbolNegativeSign = 7,  // 1.1$-
          NegativeSignNumberSpaceSymbol = 8,  // -1.1 $
          NegativeSignSymbolSpaceNumber = 9,  // -$ 1.1
          NumberSpaceSymbolNegativeSign = 10,  // 1.1 $-
          SymbolSpaceNumberNegativeSign = 11,  // $ 1.1-
          SymbolSpaceNegativeSignNumber = 12,  // $ -1.1
          NumberNegativeSignSpaceSymbol = 13,  // 1.1- $
          ParenthesisSymbolSpaceNumber = 14,  // ($ 1.1)
          ParenthesisNumberSpaceSymbol = 15,  // (1.1 $)

          // LOCALE_INEGNUMBER
          Parenthesis = 0,  // (1)
          NegativeSignNumber = 1,  // -1
          NegativeSignSpaceNumber = 2, // - 1
          NumberNegativeSign = 3,  // 1-
          NumberSpaceNegativeSign = 4,  // 1 -

          // LOCALE_IPAPERSIZE
          USLetter = 1,
          USLegal = 5,
          A3 = 8,
          A4 = 9,

          // LOCALE_ITIME
          FormatAM_PM = 0,
          Format24Hour = 1
        };
      }
    }

You see, manipulating the Regional Options settings through the user interface had nothing to do with the purpose of his test; the purpose was to determine whether those changes in the NLS settings were propagated to the application under test, and whether the resultant output displayed correctly. The oracle to verify the output in this case simply read the string from the appropriate control in the application and compared each character code point value with the expected character. For example, one test changed the date format from dd/MM/yyyy to yyyy-MM-dd. The automated oracle verified the year, month, and day values in the correct format and order, and also checked whether the date separator characters at zero-based index 4 and 7 in the string were Unicode value U+002D in this example (or other randomly generated Unicode character value(s)). This automated test was able to test and verify 31 different customizable NLS settings with multiple variables per setting to satisfy basic international sufficiency of this tester’s feature in a fraction of the time it would require a human, and with greater precision. Of course, this assumes that as a tester you have an in-depth understanding of the “system” you are tasked to test, and are capable of designing effective tests from perspectives other than that of the end-user.
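The kind of oracle described above can be sketched in a few lines of Ruby. This is not the tester's actual oracle: the `date_matches_format?` helper and its parameters are hypothetical, it only handles the yyyy-MM-dd layout, and it assumes the digits are ASCII 0-9 (native digit substitution is left out).

```ruby
# Hypothetical oracle for a date displayed in yyyy-MM-dd layout.
# `separator` is the expected code point of the date separator; it defaults
# to U+002D, but a test could pass any other Unicode code point.
def date_matches_format?(text, year:, month:, day:, separator: 0x002D)
  chars = text.chars
  return false unless chars.length == 10

  # Separators sit at zero-based index 4 and 7; compare by code point.
  return false unless chars[4].ord == separator && chars[7].ord == separator

  # Every remaining position must be an ASCII digit (an assumption of this sketch).
  digits_ok = [0..3, 5..6, 8..9].all? do |range|
    chars[range].all? { |c| c.match?(/\A[0-9]\z/) }
  end
  return false unless digits_ok

  # Finally check the year, month, and day values and their order.
  text[0, 4].to_i == year && text[5, 2].to_i == month && text[8, 2].to_i == day
end

puts date_matches_format?('2009-04-14', year: 2009, month: 4, day: 14)
puts date_matches_format?('14/04/2009', year: 2009, month: 4, day: 14)
```

The first call prints true and the second prints false, because the second string still uses the old layout and has no U+002D at index 4. In a real test the `text` argument would be read from the control in the application under test after the NLS setting was changed.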

I try to constantly emphasize that the emerging role of a software tester focuses primarily on analysis and design: analysis of the “system”, the tests, and the results of tests, and the design of effective tests with reasoned purpose and well defined goals. Professional testers provide value by enriching their organization’s intellectual knowledge repository and ultimately resolving hard problems. But, we can’t start to resolve the hard problem of effective UI test automation by perpetuating the medieval mentality that UI automation is merely mindlessly mimicking the clicks and keystrokes through the user interface because we don’t understand how the system works below the surface, or we can’t think intelligently about effective oracles capable of interpreting the results for some of our automated tests. The persistent prophets of pestilence will perpetually pule, but fortunately I see more and more professional software testers stepping up to meet increasingly complex technological challenges head on with increasing success. As I have said before, the only problems we can’t solve are those for which we have not yet devised a solution.

Written by Bj Rollison

November 18th, 2009 at 10:02 pm

The Quality Quandary

Originally Published Friday, March 27, 2009

I often find discussions about quality to be hypothetical, and in fact unless you define your specific context the word itself is nebulous, vague, or simply meaningless philosophical psycho-babble. I have previously posted my opposition to the simplistic notion that quality is value to some person. Sure, most thesauruses list the two words as synonyms, and the context-driven posse constantly regurgitate Weinberg’s quote about "quality is value to some person." Yes, that is one perspective of quality, but only one.

In the past I have taught that quality is not value in the context of the goals of software testing. My definition of value is the purpose or usefulness of a product in satisfying a customer’s needs or wants. My definition of quality was based on tangible aspects of the attributes or capabilities of a product from an engineering perspective. Our definition of software testing is any task designed to evaluate or assess the attributes and capabilities of a software project in relation to implicit or explicit guidelines in order to provide information to the management team (the people who make the business decisions). Those evaluations result in measurements we call quality measurements or criteria (the "essential or distinctive characteristics, properties, or attributes" that are critical for the success of our project) and that is part of the information we present to the decision makers to help them make more informed business decisions. Remember, Weinberg also stated "Thinking about measurability from the beginning is an essential part of creating a well-formed effort."

The other day I met with members of one of our research teams and they were talking about quality in terms of tenet and non-tenet quality; I clarified that they were essentially talking about customer perceptions of quality versus the engineering aspects of quality. Surely, from a holistic point of view, the observation and reliable measurement of both perspectives of quality are important for the success of any organization. Customers usually buy/download software because it helps them satisfy some need or want. The decision of which software product to buy may be based on personal bias, or perhaps the herding instinct, or it may be more rational, based on a comparison of features (measurable attributes and capabilities). Once customers begin to use the software they form their own opinions based on expectations and/or previous experiences, and those opinions provide the company with information regarding customer satisfaction.

So, the next time someone starts talking about quality stop them and ask them if they are talking about the engineering aspects of quality, or the customer perceptions of quality. They are closely related, but different perspectives of the same topic.

Written by Bj Rollison

November 18th, 2009 at 8:28 pm

Posted in General Testing Topics

Exploratory Testing Inside The Box

without comments

Originally Published Friday, March 20, 2009

Much of the information about exploratory testing focuses on testing from an end-user perspective. Pundits of exploratory testing claim the approach is also useful from a white box test design approach, but I have yet to see any practical discussion or examples. But, professional testers use exploratory testing approaches all the time from a white box perspective to explore the code for untested paths. Professional testers learn about areas of the code that are at risk, and reactively design effective tests to evaluate previously untested or under-tested areas of the code.

Let’s use a simple example to get started. Suppose we had to drive from Lynnwood, Washington to Puyallup, Washington without a map (or GPS navigation system). Just as we have ‘clues’ to point us in various directions while performing exploratory testing at the user interface, we have numerous highway signs to help us navigate various routes to complete our journey. And, it is up to us to decide which route to take. The shortest route is I-405 south to SR-167. But, I-405 is always at a stand-still, so another popular route is I-5 to SR-18 east then SR-167 south. Of course, after traversing those routes a couple of times the scenery (and crawling in traffic) gets a bit boring, so we might find additional less-travelled routes. But, regardless of how many times we make the journey or how many different drivers we choose to complete this journey, it is highly unlikely that we will traverse every possible route in any reasonable amount of time. Some routes may not be obvious, such as I-5 south to Seattle, then taking the ferry from Seattle to Bremerton and continuing to SR-310 south to SR-16, then I-5 north to SR-167. And, of course some routes are impossible (or at least so convoluted they would be improbable).

[Image: map]

Fortunately, control flow through even complex algorithms is not as labyrinthine as the state roadways in western Washington. And, just as the department of transportation uses various tools to measure traffic volumes, testers can use path profiling tools to measure frequently traversed paths through the code. We can also use code coverage tools to see what paths have or have not been traversed, and which decisions are made at branching statements. Using code coverage and profiling tools to map control flow through the algorithm, we are able to more thoroughly explore the code. Using our ‘map’ we can learn what paths have not been traversed and even whether or not certain paths through the code are possible at all. After we explore the ‘map’ we can more effectively design additional tests to traverse untested paths through an algorithm. Common structural test design techniques evaluate code statements, code blocks, simple decisions or branches, or multiple Boolean conditional clauses in a single predicate statement. Then, using those test designs we can execute tests either using stubs or mock objects at the unit or component level, or through the user interface, to traverse those paths and reduce overall susceptibility to risk.

I discuss the various techniques commonly used in structural testing in Chapter 6 of our book How We Test Software At Microsoft, and also address the subject here, and here. Of course, the application of structural techniques is usually referred to as code coverage analysis. But, using this simple analogy hopefully other testers can begin to understand how exploratory testing approaches are used not only from the user interface, but also below the GUI at the code level. As Boris Beizer initially stated, "all testing is essentially exploratory in nature," and code coverage analysis (analyzing code coverage results to learn about the code, design additional tests, then execute those tests) also makes great use of exploratory approaches inside the box.

Written by Bj Rollison

November 18th, 2009 at 8:26 pm

Posted in Testing Practices

GUI Test Automation Is Not Child’s Play

with 5 comments

Originally Published Thursday, March 12, 2009

There are many approaches to test automation from unit testing to system level testing through the GUI. Of course, the most often discussed approach is the automation approach that drives the GUI to perform some action; or GUI automation. This also happens to be the most controversial approach to test automation, and is perhaps the hardest type of automation to design and develop. One reason why an automated GUI test fails or doesn’t achieve its potential is due to a lack of understanding of the "system" by the tester, which in turn leads to a poorly designed test from the outset.

This problem is especially obvious when people who may have specialized business knowledge but lack an in-depth understanding of the systems they are working on (non-technical testers) are asked to ‘automate’ something. The automation in this case is usually in the form of using record/playback tools or perhaps creating a rote script to drive a keyword-driven framework, and the ‘test’ usually consists of nothing more than merely mimicking some contrived behavior by the ‘tester.’ In fact, this over-simplified view of test automation is sometimes perpetuated by tool vendors. A manager at one tool vendor said, "By automatically capturing the tester’s process and documenting their keystrokes and mouse clicks, the tester will have more time for the interesting, value-add activities, such as testing complex scenarios that would be difficult or impossible to justify automating."

There are two fundamental problems with the above quote, and with this approach to ‘automation’ (and I use that term loosely in this context). First, while I don’t totally discount the value of record/playback, and in the right context it could very well be the best approach in a specific situation, the general consensus in the industry is that due to the limited capabilities of record/playback type tools this type of automation is simply one level above using a horde of monkeys trained to repeat a set of sequences using the keyboard and mouse. In fact, some might suggest trained monkeys may be a better alternative because bananas are much cheaper than the costs of licensing a tool and then realizing that you have to hire someone to try to patch together some proprietary script in an often vain attempt to build an automation test suite that is beyond the reasonable capabilities of a record/playback automation approach. Unfortunately, in either case the organization is usually left with a horrible mess that nobody wants to clean up. Secondly, it makes the ridiculous assumption that testers are too stupid to automate complex scenarios, or that test automation is a brain-dead, non-interesting, zero-value-add activity.

If simply recording or ‘documenting’ rudimentary scripts that essentially repeat a sequence of ‘hard-coded’ steps over and over again is someone’s idea of well-designed test automation then I would agree that automation is a brain-dead activity. When you automate poorly designed tests, you simply get poorly designed automated tests! And, since I don’t mind calling the kettle black, I will say it…recording a set of actions, or documenting a sequence of steps with hard-coded values to feed into a keyword driven architecture is not test automation! Surely it automates tasks, but well designed automated tests are much more powerful than the production of some crude script that automates the actions performed by someone sitting in front of a computer.

Of course, well designed automated tests require highly skilled professional testers with in-depth knowledge of the systems they are working on, as well as proficiency in programming concepts and languages. Similar to how doctors study anatomy, physiology, pharmacology, immunology, biochemistry, etc., professional testers need to constantly study the various systems they are tasked to test. Developing a well designed automated test is very different than simply using a tool to automatically repeat a sequence of actions. Designing robust tests (automated or not) requires not only incredible creativity and problem-solving skills, but an in-depth knowledge of the system and an understanding of how to manipulate the system programmatically.

But, if you buy into the idea that automation is simple and merely rote recording or documenting some sequence of steps performed by some person then you get exactly that; simplistic repetitive automated actions. Simple automation is simply automated simplicity.

Written by Bj Rollison

November 18th, 2009 at 8:24 pm

Posted in Test Automation

Basic Blocks Aren’t So Basic

with 8 comments

Originally Published Friday, March 06, 2009

In the book How We Test Software at Microsoft I discuss structural testing techniques. Structural testing techniques are systematic procedures designed to analyze and evaluate control flow through a program. These are classic white box test design techniques, although my friend and respected colleague Alan Richardson states in his review of the book that he also employs similar techniques on models and I have to agree with him on that point.

Also, Peter M. sent me mail pointing out a reasonably obvious bug in the code chunks on pages 118 and 119. Both functions are declared as static void, but each has a return statement that returns a value. Somehow this oversight made it through the review process, but of course returning a value from a function declared as static void would cause a compiler error. (Thanks for discovering that bug Peter and letting us know so we can fix it for the 2nd edition!)

Peter also asked for further clarification of how blocks are counted, and why a test that evaluated both conditional clauses in the compound expression as true in the below example (and on page 119) results in 85.71% coverage. Unfortunately, the answer for that is not simple.

Some surprising details…

   1: public static int BlockExample1(bool cond_1, bool cond_2)
   2: {
   3:   int x = 0, y = 0, z = 0;
   4:   if (cond_1 && cond_2)
   5:   {
   6:     x = 1;
   7:     y = 2;
   8:     z = 3;
   9:   }
  10:
  11:   return x + y + z;
  12: }

The above code can be re-written as:

   1: public static int BlockExample2(bool cond_1, bool cond_2)
   2: {
   3:   int x = 0,
   4:   y = 0,
   5:   z = 0;
   6:   if (cond_1)
   7:   {
   8:     if (cond_2)
   9:     {
  10:       x = 1;
  11:       y = 2;
  12:       z = 3;
  13:     }
  14:   }
  15:
  16:   return x + y + z;
  17: }

First, a ‘basic block’ is defined as a set of contiguous executable statements with no logical branches, which seems pretty straightforward. So, based on our definition of basic blocks it appears there are 4 blocks of contiguous statements. However, the conditional clauses on line 6 and line 8 in the BlockExample2 method introduce logical branches which theoretically introduce 2 implicit blocks (e.g. one block when control flow follows the true path, and another block when control flow follows the false path). So, that is essentially how the 6 blocks are determined. But, that’s not the end of the story.

If we pass a Boolean true to both the cond_1 and cond_2 parameters, the block coverage measure for BlockExample1 is 85.71%; however, the block coverage measure for BlockExample2 is actually 100%, as illustrated below.

[Image: coverage]

What? How can this be? BlockExample1 and BlockExample2 are functionally equivalent. Well, to understand this we would really need to dig deeper into compilers and coverage tools. That is well beyond the boundaries of this blog, but the IL does provide some insight.

[Image: msil]

The MSIL for BlockExample1 is on the left and BlockExample2 is on the right. Now, I don’t want to do a deep dive into MSIL, but  those who are really observant can see that for some reason the Visual Studio compiler evaluated a branch in BlockExample1 to false (instruction IL_0008), and then instruction IL_000c compares the 2 values for equality and instruction IL_0015 appears to evaluate the optimized compound conditional expression to true. Compare that to BlockExample2 MSIL which shows the first comparison of 2 values occurs at IL_0009 and the branch is evaluated as true (IL_000f) and the second comparison of 2 values occurs at IL_0014 and again evaluates to true at instruction IL_001a.

But wait…it gets even more confusing. We typically measure structural coverage using the debug build. So, imagine my surprise when I recompiled the code using the retail build settings and again passed true arguments to the cond_1 and cond_2 parameters for BlockExample1 and BlockExample2 and the coverage tool in Visual Studio indicated these methods now only had 4 blocks, and the block coverage measure for both methods was 100% as illustrated below.

[Image: coverage2]

Also, interestingly enough, the compiler optimized the code so both methods had identical MSIL op code instructions, as illustrated below.

[Image: be2]

Steve Carroll (a senior developer in Visual Studio) wrote that we "shouldn’t be too concerned if you can’t exactly identify where all the blocks are. When you turn the optimizer on your binary, block counts are fairly unpredictable. Don’t worry though, the source line coloring will almost always lead you to the parts of the code that you need to worry about targeting to get your coverage stats up."

I agree with Steve when he states block counts are unpredictable when the code is optimized (and different tools that measure block coverage may provide different results). However, I only partially agree with his statement that source line coloring will lead us to the parts of the code we need to test. Maybe it will, maybe it won’t. But, an in-depth analysis of code coverage results by professional testers will help us identify important parts of the code that require further investigation and testing.

So, what does it all mean?

Block testing is useful for unit testing and designing white box tests for switch statements and exception handlers (based on how we can track control flow through source code using a debugger as opposed to through the IL Disassembler). But, as I stated in How We Test Software at Microsoft block testing is the weakest form of structural testing. But, it does provide a different perspective as compared to other structural approaches or techniques and is useful when used by a professional tester in the right context.

But, the important point here is that just as we wouldn’t rely on only one tool to tune the carburetor on an automobile, we certainly wouldn’t rely on only one technique or approach for designing structural tests; and we certainly wouldn’t rely on structural testing as our only approach to testing. This example further reinforces another important point that I make in the book: code coverage is not directly related to quality. Any professional tester can clearly see that although we are able to achieve high levels of coverage with one test, these methods are not at all well tested.

Only a fool would use code coverage metrics to derive some measure of quality, or suggest the implication that high coverage measures equal greater quality. In truth, the value of code coverage is in its ability to help professional testers identify areas of the code that have not been previously exercised and to design tests to evaluate those areas of the code more effectively to help reduce overall risk.

If we don’t execute an area of code then we have zero probability of exposing errors in that code if they exist. However, just because we do execute a code statement doesn’t mean we expose all potential errors. But, it at least increases the probability from 0% and helps reduce risk.
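To make the point about a single high-coverage test concrete, here is a minimal sketch (not from the book) that runs BlockExample1 and BlockExample2 against all four combinations of cond_1 and cond_2. The two methods are copied from the listings above; the class name BlockExampleChecks, the Check helper, and the table of expected values are assumptions added purely for illustration.

```csharp
using System;

public static class BlockExampleChecks
{
    // Copied from the first listing above.
    public static int BlockExample1(bool cond_1, bool cond_2)
    {
        int x = 0, y = 0, z = 0;
        if (cond_1 && cond_2)
        {
            x = 1;
            y = 2;
            z = 3;
        }

        return x + y + z;
    }

    // Copied from the second listing above.
    public static int BlockExample2(bool cond_1, bool cond_2)
    {
        int x = 0, y = 0, z = 0;
        if (cond_1)
        {
            if (cond_2)
            {
                x = 1;
                y = 2;
                z = 3;
            }
        }

        return x + y + z;
    }

    // Hypothetical helper: true when both methods return the expected sum.
    public static bool Check(bool cond_1, bool cond_2, int expected)
    {
        return BlockExample1(cond_1, cond_2) == expected
            && BlockExample2(cond_1, cond_2) == expected;
    }

    public static void Main()
    {
        // (true, true) is the single test discussed above; the other three
        // cases exercise the false outcome of each conditional clause.
        Console.WriteLine(Check(true, true, 6));
        Console.WriteLine(Check(true, false, 0));
        Console.WriteLine(Check(false, true, 0));
        Console.WriteLine(Check(false, false, 0));
    }
}
```

The single (true, true) test already drives the block coverage numbers discussed above, yet it never observes what either method returns when a clause is false; the remaining three cases are what actually check that behavior.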

Written by Bj Rollison

November 18th, 2009 at 8:20 pm

Troubleshooting Test Data with String Decoder

without comments

Originally Published Wednesday, February 25, 2009

I value static test data that is derived from historical failure indicators, or representative of typical end-users. But, of course a problem with static test data is that it only provides a limited set of all possible data, and becomes stale or provides little new information after multiple iterations of the test. So, I am a proponent of using random data in well-designed tests. Of course, recklessly generating random data is just plain dumb and potentially results in numerous false positives. But, when the data set is well defined and decomposed into equivalence class subsets then it is possible to generate random data that is representative of all possible data elements; probabilistic stochastic test data!
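As a rough illustration of random data drawn from equivalence class subsets, the sketch below picks one random value from each subset of a hypothetical integer input. Everything in it is an assumption made for the example: the "age" field, the range boundaries, and the EquivalenceClass and AgeFieldData names. The seed is printed so that a failing value can be regenerated later.

```csharp
using System;

// Hypothetical equivalence class subset: an inclusive integer range plus the
// outcome we expect from the system under test for any member of that range.
public sealed class EquivalenceClass
{
    public string Name { get; }
    public int Min { get; }
    public int Max { get; }
    public bool ExpectedValid { get; }

    public EquivalenceClass(string name, int min, int max, bool expectedValid)
    {
        Name = name;
        Min = min;
        Max = max;
        ExpectedValid = expectedValid;
    }

    // Random.Next treats its upper bound as exclusive, so add 1 to include Max.
    public int Sample(Random rng)
    {
        return rng.Next(Min, Max + 1);
    }
}

public static class AgeFieldData
{
    // Assumed decomposition of an input that accepts ages 1 through 120.
    public static readonly EquivalenceClass[] Classes =
    {
        new EquivalenceClass("valid", 1, 120, true),
        new EquivalenceClass("below range", -1000, 0, false),
        new EquivalenceClass("above range", 121, 1000, false),
    };

    public static void Main()
    {
        // Log the seed so the same data can be reproduced when a test fails.
        int seed = Environment.TickCount;
        Console.WriteLine("seed = " + seed);

        var rng = new Random(seed);
        foreach (EquivalenceClass c in Classes)
        {
            int value = c.Sample(rng);
            Console.WriteLine(c.Name + ": " + value + " (expected valid: " + c.ExpectedValid + ")");
        }
    }
}
```

Because each value is drawn from a defined subset, the expected outcome is known before the test runs, which is what separates this from recklessly generated random data.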

Last week I released an update to the test tool Babel for generating random strings of Unicode characters. Babel is a useful tool for comprehensive positive or negative testing of a textbox and other edit controls, and API parameters that take string arguments. Using probabilistic stochastic test data significantly increases the breadth of data coverage during a test cycle which increases the probability of exposing anomalies in string parsing and other string manipulation algorithms. But, when using characters from across the Unicode spectrum anomalies are usually caused by a specific character code point (or code points for surrogate pair characters), or combinations of characters.

[Image]

Of course, telling a developer that a string composed of the characters ꁲᱚRבּ䍳㄁܁쭤࿦ኳ causes an unexpected error would most likely be met with that classic deer in headlights look followed by some muttering such as "That’s not a real string" and "nobody would ever enter such a string." Oftentimes developers are likely to shun random strings as test data, and managers might claim it is not representative of ‘real’ customer scenarios. So, the professional tester knows that instead of simply arguing in favor of random string testing we must troubleshoot the string to identify the specific character code point or code point combination causing the error. Because while a ‘real’ customer may not likely enter a string of random characters from multiple language scripts, the problem is likely caused by a single character (and sometimes the combination of character code points), and there is some probability of a customer somewhere in the world entering that problematic character! So, as professionals we must find that specific problematic character.

To help professional testers decode each character in a string to its code point value I recently completed a new tool called String Decoder. This test tool is an updated version of my old Str2Val tool (which had some serious problems when converting strings with surrogate pair characters). String Decoder will decode Unicode characters (including surrogate pairs) to their hexadecimal UTF-16 (Big or Little Endian), UTF-8, UTF-7 encoding values, or an integer value (UTF-32).

For example, the Results list in the image displays the UTF-16 Big Endian encoding values for the given string.

Once the specific character code point or combination is identified, the tester can now tell the developer exactly what Unicode character or integer value is causing the anomaly. For example, it is much better to state a Unicode value of U+13BD is causing unexpected functionality as compared to trying to explain how to input the Cherokee letter MU or saying "just enter this character  Ꮍ."

String Decoder can also be used to compare different Unicode transformation format encodings, or convert between Unicode hex values and 32-bit integer values of characters.
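For readers who want to see the idea in code, the following is a minimal sketch of decoding each character of a string to its code point and encoded bytes using standard .NET calls. It is not String Decoder's implementation; the CodePointDump class and its Describe method are names invented for this example, and it assumes the input string contains no unpaired surrogates (char.ConvertToUtf32 throws on those).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CodePointDump
{
    // Returns one line per Unicode code point in the text, showing the code
    // point value and its UTF-16 Big Endian and UTF-8 byte sequences.
    public static List<string> Describe(string text)
    {
        var rows = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            // Combines a surrogate pair into a single code point when present.
            int codePoint = char.ConvertToUtf32(text, i);
            string scalar = char.ConvertFromUtf32(codePoint);

            string utf16 = BitConverter.ToString(Encoding.BigEndianUnicode.GetBytes(scalar));
            string utf8 = BitConverter.ToString(Encoding.UTF8.GetBytes(scalar));
            rows.Add($"U+{codePoint:X4}  UTF-16BE: {utf16}  UTF-8: {utf8}");

            // A surrogate pair occupies two UTF-16 code units; skip the second.
            if (char.IsHighSurrogate(text, i))
            {
                i++;
            }
        }

        return rows;
    }

    public static void Main()
    {
        // "\u13BD" is the Cherokee letter MU mentioned above.
        foreach (string row in Describe("A\u13BD"))
        {
            Console.WriteLine(row);
        }
        // The second line printed is: U+13BD  UTF-16BE: 13-BD  UTF-8: E1-8E-BD
    }
}
```

Feeding a failing string through a routine like this, then retesting with one character removed at a time, is one way to narrow a failure down to the specific code point worth reporting.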

Let me know what you think!

Written by Bj Rollison

November 18th, 2009 at 8:15 pm

Posted in Testing Tools

Random String Generation…Update!

without comments

Originally Published Tuesday, February 17, 2009

One of the biggest challenges in input testing is the sheer number of potential characters and the virtually infinite number of permutations of those characters in different character positions in a string. Even if we know about the myriad of language scripts used throughout the world, manually generating characters from multiple language groups would be excruciatingly inefficient.

Since any modern application should support Unicode characters, we can assert the strings “abcdefg” and “ڄƥ藖꼩昨” are equivalent for most input testing requiring a Unicode string. So, random string test data generation is useful for easily increasing the breadth of test data tested, and also for testing the robustness of the application’s ability to process complex data streams.

Babel is a free test tool, and one of the few random string generators that can generate a string of characters from across the entire Unicode spectrum. Since its initial release in 2006 it has been widely popular. So, I am happy to announce that an updated Babel 2.0 is released! I know this constitutes a shameless plug…but, sometimes it helps to plug tools we’ve made that can benefit other testers or developers.

Unlike many string generators that only produce a string of random ASCII characters, Babel can produce a string of random Unicode characters defined in the Unicode 5.1 specification, including surrogate pair characters (which often expose problems in various text boxes…hint, hint). Additional updates to Babel 2.0 include:

  • Updated to the Unicode 5.1 spec (including new script groups and character code points)
  • Ability to include/exclude combining character code points
  • Ability to include/exclude reserved NetBIOS characters
  • Custom range allows character generation from 0x01 through 0xFFFF
  • Ability to generate strings with a max length of 100,000 characters
  • Improved distribution of characters from the selected language script groups

The following illustration provides a basic flow diagram of how Babel generates random strings. Essentially, one script group is randomly selected from all selected script group nodes, and all code points assigned to that script group are put into a collection. Next, one character is randomly selected from that collection and is appended to a string. This process continues until the string length equals a specified number of characters.

[Image: Babel]

Better distribution of character selection across multiple script groups occurs by preventing the same script group from being selected before at least ½ of the other specified groups are selected. This means that as long as more than one script group node is selected the selected group of characters will be removed from the random selection process until at least half of the other script groups are chosen. This provides a greater distribution as compared to simple random generation.
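The selection process described above can be sketched roughly as follows. This is not Babel's source code: the four script groups and their code point ranges, the ScriptGroupSketch class, and the Generate signature are all assumptions made for illustration, and the "at least half of the other groups" rule is approximated with a small queue of recently selected groups.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ScriptGroupSketch
{
    // Hypothetical script groups with inclusive code point ranges. All ranges
    // are in the Basic Multilingual Plane, so each code point is one char.
    public static readonly string[] Names = { "Latin", "Greek", "Cyrillic", "Cherokee" };
    static readonly int[] First = { 0x41, 0x3B1, 0x410, 0x13A0 };
    static readonly int[] Last = { 0x5A, 0x3C9, 0x44F, 0x13F4 };

    public static string Generate(int seed, int length, out List<int> groupPicks)
    {
        var rng = new Random(seed);
        int groupCount = First.Length;

        // A selected group rests until half of the other groups (rounded up)
        // have been selected; for integers that count equals groupCount / 2.
        int restLength = groupCount / 2;
        var resting = new Queue<int>();

        groupPicks = new List<int>();
        var text = new StringBuilder();
        while (text.Length < length)
        {
            // Only groups that are not currently resting may be selected.
            var eligible = new List<int>();
            for (int g = 0; g < groupCount; g++)
            {
                if (!resting.Contains(g))
                {
                    eligible.Add(g);
                }
            }

            // Randomly select a group, then a code point from that group.
            int group = eligible[rng.Next(eligible.Count)];
            int codePoint = rng.Next(First[group], Last[group] + 1);
            text.Append(char.ConvertFromUtf32(codePoint));
            groupPicks.Add(group);

            // The selected group starts resting; the oldest one is released.
            resting.Enqueue(group);
            if (resting.Count > restLength)
            {
                resting.Dequeue();
            }
        }

        return text.ToString();
    }

    public static void Main()
    {
        int seed = 12345;
        List<int> picks;
        string text = Generate(seed, 10, out picks);

        Console.WriteLine("seed = " + seed + ", length = " + text.Length);
        foreach (int g in picks)
        {
            Console.Write(Names[g] + " ");
        }
        Console.WriteLine();
    }
}
```

With four groups selected, a group that has just been chosen sits out the next two selections, so no script group can dominate a short string the way it might under plain random selection.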

The download also includes the Babel.DLL (and the dependent UnicodeData.DLL) for test automation. The older methods are deprecated and no longer supported. The new methods have been simplified and now include:

public static string Polyglot (int, int, bool, bool, bool, bool, bool)
Returns a string of random Unicode characters in all Unicode script groups based on a specified seed value.

public static string Polyglot (int, bool, bool, bool, bool, bool, out int)
Generates a random seed value, returns a string of random Unicode characters in all Unicode script groups, and passes back the seed value through the out parameter.

public static string Polyglot (int, int, bool, bool, bool, bool, bool, char, char)
Returns a string of random Unicode characters in all Unicode script groups based on a specified seed value.

public static string Polyglot (int, bool, bool, bool, bool, bool, char, char, out int)
Generates a random seed value, returns a string of random Unicode characters in all Unicode script groups, and passes back the seed value through the out parameter.

Get the new release of Babel 2.0!

Written by Bj Rollison

November 18th, 2009 at 8:10 pm

Posted in Testing Tools
