
Localization Testing – Part II

Originally Published Friday, October 30, 2009

It should come as no surprise to anyone that localization testing generally focuses on changes in the user interface, although as mentioned in the previous post these are not the only changes necessary to adapt a product to a specific target market. But, the most common category of localization class bug is usability or behavioral type issues that do involve the user interface. Bugs in this category generally include un-localized or un-translated text, key mnemonics, clipped or truncated text labels and other user interface controls, incorrect tab order, and layout issues. Fortunately, the majority of problems in this category do not require a fix in the software’s functional or business layer. Also, the majority of problems in this category do not require any special linguistic skills in order to identify, and in some cases, an automated approach can be even more effective than the human eye (more on that later).

Perhaps the most commonly reported issue in this category is the “un-localized” or un-translated textual string. Unfortunately, in many cases un-translated strings are also an over-reported problem that only serves to flood the defect tracking database with unnecessary bugs. Translating textual strings is a demanding task, made even more difficult when there are constant changes in the user interface or contextual meaning of messages early in the product life cycle. Over-reporting of un-translated text too early in the product cycle only serves to artificially inflate the bug count, causes undue pressure, and creates extra work for the localization team.

Identifying this type of bug is actually pretty easy. Here’s a simple heuristic: if you are testing a non-English language version in a language you are not familiar with and you can clearly read the textual string in English, it is probably not localized or translated into the target language. The illustration below provides a pretty good example of this general rule of thumb. A tester doesn’t have to read German to realize that the text in the label control under the first radio button is not German.

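[Illustration: a German-language dialog in which the label under the first radio button is not in German]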

There are several reasons un-localized text strings appear in dialogs and other areas of the user interface. For example:

  • Worst-case scenario is that the string is hard-coded into the source files
  • Perhaps localizers did not have enough time to completely process all strings in a particular file
  • Perhaps this is a new string in a file localizers thought was 100% localized
  • Strings displayed in some dialogs come from files other than the file that generates the dialog, and the localization team has not processed that file
  • And, sometimes (usually not often), a string may simply be overlooked during the localization process

Testing for un-localized text is often a manually intensive process of enumerating menus, dialogs, message boxes, forms, and form elements. But, if the textual strings are located in a separate resource file (as they should be), a quick scan of resource files might more quickly reveal un-translated textual strings. Of course, there is little context in the resource file, and I also hope the localization team is reviewing their own work as well prior to handing it over to test.
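A scan of this kind can be sketched in a few lines of C#. This is only a minimal sketch, assuming the English and localized strings have already been loaded into dictionaries keyed by resource ID; the class name, method name, "do not translate" list, and sample resource IDs are all hypothetical, and loading the actual resource files is left out.

```csharp
using System;
using System.Collections.Generic;

public static class UntranslatedStringScanner
{
    // Returns the resource IDs whose localized text is missing or identical
    // to the English text. IDs listed in 'doNotTranslate' (for example,
    // trademarked product names) are skipped.
    public static List<string> FindUntranslated(
        IDictionary<string, string> english,
        IDictionary<string, string> localized,
        ISet<string> doNotTranslate)
    {
        var suspects = new List<string>();
        foreach (KeyValuePair<string, string> entry in english)
        {
            if (doNotTranslate.Contains(entry.Key))
            {
                continue;
            }

            string translated;
            bool missing = !localized.TryGetValue(entry.Key, out translated);
            if (missing || string.Equals(entry.Value, translated, StringComparison.Ordinal))
            {
                suspects.Add(entry.Key);
            }
        }

        return suspects;
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for real resource files
        var english = new Dictionary<string, string>
        {
            { "IDS_OPEN", "Open" },
            { "IDS_SAVE", "Save" },
            { "IDS_PRODUCT", "Contoso Paint" }
        };
        var german = new Dictionary<string, string>
        {
            { "IDS_OPEN", "Öffnen" },
            { "IDS_SAVE", "Save" },             // still English, so it is flagged
            { "IDS_PRODUCT", "Contoso Paint" }  // product name, intentionally unchanged
        };
        var doNotTranslate = new HashSet<string> { "IDS_PRODUCT" };

        foreach (string id in FindUntranslated(english, german, doNotTranslate))
        {
            Console.WriteLine("Possibly un-translated: " + id);
        }
    }
}
```

A string that is legitimately identical in both languages (“OK,” for instance) will also be flagged, so the output is a list of candidates to review rather than a list of bugs.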

Also, here are a few suggestions that might help focus localization testing efforts early in the project milestone and reduce the number of ‘known’ or false-positive un-translated text bugs being reported:

  • Ask the localization team to report the percentage of translation completion by file or module for each test build. Early in the development lifecycle, un-translated text should be reported as a valid bug only in modules that are reported to be 100% complete. Of course, some strings are used in multiple modules, or may be coming from external resources. But, especially early in the development lifecycle, reporting a gaggle of un-translated text bugs is simply “make work.” As the life cycle starts winding down…all strings are fair game for bug hunters!
  • Testers should use tools such as Spy++ or Reflector to help identify the module or other resources, and the unique resource ID for the problematic string or resource. This is much better than simply attaching an image of the offending dialog to a defect report. Identifying the module and the specific resource ID number allows the localization team to effect a quick fix instead of having to search for the dialog through repro steps and track down the problem.
  • Also remember that not all textual strings are translated into a specific target language. Registered or trademarked product names are often not translated into different languages. In case of doubt, ask the localization team if a string that appears un-localized is a ‘true’ problem or not.

Un-localized strings, usually due to hard-coded strings, also tend to occur in menu items. This is especially true of Windows Start menu or sub-menu items hard-coded in the INF or other installation/setup files. For example, the image on the right shows a common problem on European versions of Windows. Many European language versions localize the name of the Program Files folder and the corresponding menu item in the Start menu. But, oftentimes when we install an English language version of software on Windows it creates a new "Programs" menu item (and even a new Program Files directory) rather than detecting the default folder to install to. In the example on the left, the string Accessories is a hard-coded folder name. But, there is another issue as well. This illustrates not only a problem with the non-translated string "Accessories," but also shows one full-width Katakana string for ‘Accessories’ and another half-width string.
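The full-width versus half-width inconsistency is another problem a tester can catch without reading Japanese. The following is a minimal sketch, assuming the menu strings have already been collected; the class and method names are illustrative. It relies on Unicode NFKC normalization (exposed in .NET as NormalizationForm.FormKC), which maps half-width Katakana to the full-width forms.

```csharp
using System;
using System.Text;

public static class KatakanaWidthCheck
{
    // True if the string contains a character from the half-width
    // Katakana range (U+FF65 through U+FF9F).
    public static bool ContainsHalfWidthKatakana(string text)
    {
        foreach (char c in text)
        {
            if (c >= '\uFF65' && c <= '\uFF9F')
            {
                return true;
            }
        }

        return false;
    }

    // True if the two strings are different as stored, but identical
    // after NFKC normalization (that is, they differ only by width or
    // another compatibility form).
    public static bool DifferOnlyByWidth(string a, string b)
    {
        return !string.Equals(a, b, StringComparison.Ordinal)
            && string.Equals(
                a.Normalize(NormalizationForm.FormKC),
                b.Normalize(NormalizationForm.FormKC),
                StringComparison.Ordinal);
    }

    public static void Main()
    {
        string fullWidth = "\u30A2\u30AF\u30BB\u30B5\u30EA"; // full-width Katakana for 'Accessories'
        string halfWidth = "\uFF71\uFF78\uFF7E\uFF7B\uFF98"; // the same word in half-width Katakana

        Console.WriteLine(ContainsHalfWidthKatakana(halfWidth));
        Console.WriteLine(DifferOnlyByWidth(fullWidth, halfWidth));
    }
}
```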

In part 3 I will discuss another often problematic area in localization….key mnemonics.

Localization Testing: Part 1

Originally Published Tuesday, October 27, 2009

When I first joined Microsoft 15 years ago I was on the Windows 95 International team. Our team was responsible for reducing the delta between the release of the English version and the Japanese version to 90 days, and I am very proud to say that we achieved that goal and Windows 95 took Japan by storm. It was so amazing that even people without computers were lined up outside of sales outlets waiting to purchase a copy of Windows 95. The growth of personal computers in Japan shot through the roof over the next few years. Today the Chinese market is exploding, and eastern European nations are experiencing unprecedented growth as well. While the demand for the English language versions of our software still remains high, many of our customers are demanding software that is ‘localized’ to accommodate the customers’ national conventions, language, and even locally available hardware. Although much of the Internet’s content is in English, non-English sites on the web are growing, and even ICANN is considering allowing international domain names that contain non-ASCII characters this week in Seoul, Korea.

But, a lot has changed in how we develop software to support international markets. International versions of Windows 95 were developed on a forked code base. Basically, this means the source code contained #ifdefs to instruct the compiler to compile different parts of the source code depending on the language family. From a testing perspective this is a nightmare, because if the underlying code base of a localized version is fundamentally different than the base (US English) version, then the testing problem is magnified because there is a lot of functionality that must be retested. Fortunately today, much software being produced is based on a single worldwide binary model. (I briefly explained the single worldwide binary concept at a talk in 1991, and Michael Kaplan talks about the advantages here.) In a nutshell, a single worldwide binary model is a development approach in which any functionality any user anywhere in the world might need is codified in the core source code so we don’t need to modify the core code once it is compiled to include some language/locale specific functionality. For example, it was impossible to input Japanese text into Notepad on an English version of Windows 95 using an Input Method Editor (IME); I needed the localized Japanese version. But, on the English version of Windows XP, Vista, or Windows 7 all I have to do is install the appropriate keyboard drivers and font files and expose the IME functionality. In fact, these days I can map my keyboard to over 150 different layouts and install fonts for all defined Unicode characters on any language version of the Windows operating system.
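To illustrate the idea in miniature, here is a hedged sketch (not code from Windows) of a routine in which the locale is passed in as data at run time, so the same compiled binary serves every market instead of being rebuilt with different #ifdef branches. The class name, method name, and sample date are assumptions for illustration; the actual output depends on the culture data available on the machine.

```csharp
using System;
using System.Globalization;

public static class SingleBinaryDemo
{
    // One code path for every locale: the culture is a run-time argument,
    // not a compile-time switch.
    public static string FormatLongDate(DateTime date, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return date.ToString("D", culture);
    }

    public static void Main()
    {
        DateTime date = new DateTime(2009, 10, 27);
        foreach (string cultureName in new[] { "en-US", "ja-JP", "de-DE" })
        {
            Console.WriteLine(cultureName + ": " + FormatLongDate(date, cultureName));
        }
    }
}
```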

The big advantage of the single worldwide binary development model is that it allows us to differentiate between globalization testing and localization testing.  At Microsoft we define globalization as “the process of designing and implementing a product and/or content (including text and non-text elements) so that it can accommodate any locale market (locale).” And, we define localization as “the process of adapting a product and/or content to meet the language, cultural and political expectations and/or requirements of a specific target market.” This means we can better focus on the specific types of issues that each testing approach is most effective at identifying. For localization testing, this means we can focus on the specific things that change in the software during the “adaptation processes” to localize a product for each specific target market.

The most obvious adaptation process is the ‘localization’ or actually the translation of the user interface textual elements such as menu labels, text in label controls, and other string resources that are commonly exposed to the end user. However, the translation of string resources is not the only thing that occurs during the localization process. The localization processes that are required to adapt a software product to a specific locale may also include other changes such as font files and drivers installed by default, registry keys set differently, drivers to support locale specific hardware devices, etc.

3 Categories of Localization Class Bugs

I am a big fan of developing a bug taxonomic hierarchy as part of my defect tracking database as a best practice because it enables me to analyze bug data more efficiently. If I see a particular category of bug or a type of bug in a category that is being reported a lot, then perhaps we should find ways to prevent or at least minimize the problem from occurring later in the development lifecycle. After years of analyzing different bugs, I classified all localization class bugs into 3 categories: functionality, behavioral/usability, and linguistic quality.
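As a rough sketch of how such a taxonomy supports analysis, the three categories can be modeled as an enumeration and tallied so that an over-reported category stands out. The type and method names and the sample data are hypothetical; a real defect tracking database would supply the categorized reports.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum LocalizationBugCategory
{
    Functionality,
    BehavioralUsability,
    LinguisticQuality
}

public static class BugTaxonomy
{
    // Counts the reported bugs in each category.
    public static Dictionary<LocalizationBugCategory, int> CountByCategory(
        IEnumerable<LocalizationBugCategory> reportedBugs)
    {
        return reportedBugs
            .GroupBy(category => category)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    public static void Main()
    {
        // Hypothetical sample of categorized bug reports
        var reported = new[]
        {
            LocalizationBugCategory.BehavioralUsability,
            LocalizationBugCategory.BehavioralUsability,
            LocalizationBugCategory.LinguisticQuality,
            LocalizationBugCategory.BehavioralUsability,
            LocalizationBugCategory.Functionality
        };

        foreach (KeyValuePair<LocalizationBugCategory, int> pair in CountByCategory(reported))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```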

Functionality type bugs exposed in localized software affect the functionality of the software and require a fix in the core source code. Fortunately, with a single worldwide binary development model where the core functionality is separated from the user interface the number of bugs in this category of localization class bugs is relatively small. Checking that the appropriate registry keys are set and files are installed in a new build is reasonably straightforward and should be built into the build verification test (BVT) suite. Other types of things that should be checked include application and hardware compatibility. It is important to identify these types of problems early because they are quite costly to correct, and can have a pretty large ripple effect on the testing effort.

Behavioral and usability issues primarily impact the usefulness or aesthetic quality of the user interface elements. Many of the problems in this category do not require a fix in the core functional layer of the source code. The types of bugs in this category include layout issues, un-translated text, key mnemonic issues, and other problems that are often fixed in the user interface form design, form class, or form element properties. This category of problems often accounts for more than 90% of all localization class bugs. Fortunately, the majority of problems in this category do not require any special linguistic skills; a careful eye for detail is all that is required to expose even the most subtle bugs in this category.

The final category of localization class bug is linguistic quality. Bugs in this category are primarily mis-translations. Obviously, the ability to read the language being tested is required to identify most problems in this category of errors. We found testers spent a lot of time looking for this type of bug, but later found the majority of linguistic quality type issues reported were resolved as won’t fix. There are many reasons for this, but here is my position on this type of testing….Stop wasting the time of your test team to validate the linguistic quality of the product. If this is a problem then hire a new localizer, hire a team of ‘editors’ to review the work of the localizer, or hire a professional linguistic specialist from the target locale as an auditor. Certainly, if testers see an obvious gaffe in a translation then we must report it; however, testers are generally not linguistic experts (even in their native tongue), and I would never advocate hiring testers simply based on linguistic skills nor as a manager would I want to dedicate my valuable testing resources to validating linguistic quality…that’s usually not where their expertise lies, and it probably shouldn’t be.

What’s Next

Since behavioral/usability category issues are the most prevalent type of localization class bug, this series of posts will focus on localization testing designed to evaluate user interface elements and resources. The next post will expose what is often the single most reported bug in this category.

Adding Variability in Test Case Design

Published Tuesday, October 20, 2009

I love autumn! Yes, I am definitely a boy of summer and very much prefer warmer weather; however, there is something special about autumn. This past weekend my daughter, my friends Dongyi and her husband Yuning, and I participated in the Rum Run sailboat fun race with an overnight raft up at Bainbridge Island’s Port Madison. Saturday morning was quite rainy, but the wind was blowing 15 knots with gusts to 25 knots and NOAA weather radio was announcing gale force warnings in Puget Sound. Wow…what a ride! But, it was actually the rather relaxing sail back to my marina on Sunday morning that rekindled the beauty of autumn in my mind. The bright reds, golden yellows, and pastel browns of the foliage seemed to blend into a collage framed by the darkness of the waters of Puget Sound and the snow covered peaks of the Olympic mountains. The beauty of autumn reminds me about change. A sloughing of the old, the cleansing brought about by the pure white snows, eventually followed by the new and fresh growth that blossoms in spring.

Just as the earth goes through variable cycles of rejuvenation, we must also continually update our tests, and (more importantly) the test data we use in our test cases to prevent them from becoming stale. Trees shed their leaves in the autumn and new leaves emerge in the spring, but the tree is fundamentally still the same tree. Similarly, a well-designed test case has a unique fundamental purpose and by changing the variables we can grow the value of that test case. Of course, the cycle of change in test data should be dramatically shorter in duration as compared to the seasonal changes of mother earth.

Here is a simple example of how a well-designed test case using variable test data can increase the value of the information each test iteration provides through increased confidence and also potentially reduce overall risk. In my role at Microsoft I am in a unique position to not only conduct controlled studies, but I can also implement ideas into practice on enterprise level software projects. One experiment I started about 2 years ago involved multiple groups of testers (sessions) located around the world divided into 3 separate control groups. Each control group tested the identical web page that would display the stock price if the user input a valid stock ticker symbol into a single textbox on the page and pressed the OK button. The only difference in the control groups was the instructions to perform a single positive test case with the specific purpose of “ensure any valid stock ticker symbol displays the current stock price for the publicly traded stock specified by its symbol.” The purpose of the study was to determine if different cultural and experiential backgrounds impacted the test data used in a test based on the instructions for a test case. The study collected demographic information on the participants as well as specific inputs applied to the web page. Information on the oracle used by the students was collected anecdotally. Step one in each test was identical because we were not interested in how the tester launched the browser. (Of course this assumes there are other tests that test the multitude of ways to launch a browser and navigate to a URL. Also, if the browser failed to launch the test case is blocked.)

Group 1 was given the most vague instructions for the test case. The instruction was simply:

  1. Launch browser and navigate to [url address]
  2. Enter a valid stock ticker symbol and press the OK button and verify the accuracy of the returned stock price.

The instructions in the test case given to Group 2 were also somewhat vague, but provided a little guidance both on input and oracle.

  1. Launch browser and navigate to [url address]
  2. Enter a valid stock ticker symbol (e.g. “MSFT”)
  3. Press the OK button
  4. Verify the returned stock price is identical to the current stock price listed on the appropriate exchange

Group 3 had similar instructions to Group 2, but the group was given additional guidance as indicated below.

  1. Launch browser and navigate to [url address]
  2. Enter a valid stock ticker symbol from a publicly traded stock listed on any public stock exchange
    Listings of valid stock ticker symbols are on stock exchange web sites such as:
    http://www.nyse.com
    http://www.eoddata.com/Symbols.aspx
    http://www.nasdaq.com
    http://www.londonstockexchange.com
  3. Press the OK Button
  4. Verify the returned stock price is identical to the current stock price listed on the appropriate exchange

Results

The results were mostly not surprising, but rather reinforcing. For example, we expected Group 1 to be rather random, but mostly aligned with ticker symbols they were familiar with. Of course, the majority (90%) of stock ticker symbols entered were MSFT and there was no significant difference across cultural background, locale, experience or educational background. (As this study was conducted at Microsoft I am sure there was some bias as to the symbol entered.) What was most interesting was that testers with no formal training (no previous courses in testing, no CS degree, and fewer than one discipline-specific book read) and with more than 2 years of test experience were approximately 25% more likely to violate the purpose of the test and enter random or completely invalid data as their first action. In other words, instead of executing the required test their initial reaction was to immediately go on a bug hunt.

In group 2, 99% of the participants simply entered the stock ticker symbol “MSFT.” But, what was even more surprising was the fact that on the next day, the same people in that group were given the same exact test, and 95% of them simply reentered MSFT. Perhaps this is laziness, perhaps this is related to the superficial nature of the study, or perhaps this is due to individuals taking the path of least resistance. The percentage of people who entered identical stock ticker symbols on consecutive days was not significantly different between group 1 and group 2.

It should be no surprise that group 3 had the greatest distribution of variable test data applied to the web page. Demographics had no impact on any of the people who were in group 3. The majority of people in group 3 (78%) would select the first stock exchange listed (regardless of what link it was) but there was no significant overlap in the selected stock ticker symbols. When asked to repeat the test on the next day 83% of the participants selected a different link and a different symbol. Of those who selected the same link 97% selected a different stock ticker symbol. On the down side, approximately 4% of the people simply took the path of least resistance and input MSFT as the test data on both days of the experiment.

Conclusion

One of the most common problems I hear about ‘scripted,’ or pre-defined test cases is that they are too prescriptive and not flexible enough to allow the tester to try things. Of course, a well-designed test case is not simply a prescriptive set of steps inputting the same hard-coded test data that is run over and over. So, in this study we made the assumption that a scripted test case that specified “Enter MSFT in the textbox” would simply result in the tester entering “MSFT” without any thinking on the part of the tester. Hard-coding variable test data is oftentimes the worst possible way to design a test case.

Vaguely written test cases added some level of variability, but also seemed to increase the probability of the tester executing context free tests outside the scope of the purpose of the test. In fact, what we found was some testers (approx 2%) simply went on a bug hunt and never actually input a valid stock ticker symbol at all during the session.

A test case that provided only one example that is representative of the type of test data required for the test case produced the least desirable results in this study. I am not sure this would be the case in practice. However, based on this study if I were to outsource execution of a test case similar to that used by group 2 the only thing I could guarantee is that MSFT would definitely be tested numerous times, and the variability of other test data would be extremely limited regardless of the number of testers executing that test or the number of iterations.

When faced with a virtually infinite number of possibilities for input variables as test data used in either positive or negative tests we need to test as many possibilities as possible given the available resources in order to increase test coverage and reduce overall risk. So, one way to increase the coverage of test data while still achieving the specific purpose of the test case is to provide useful resources that help guide the tester while relying on the tester’s creative thinking skills and curiosity to expand the test coverage.

Of course, we can also increase variability of test data and capture the essence of the tester’s creativity using a similar approach in a well-designed automated test case as well. In fact, a similarly designed automated test case enables us to significantly increase the amount of variable test data that is exercised in order to expand test coverage and increase overall confidence.
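One way such an automated test case might look is sketched below. It is a minimal sketch, not the test used in the study: the symbol list is a hard-coded stand-in for a list pulled from an exchange web site, and the two price lookups are hypothetical delegates standing in for the page under test and the oracle. The seed is logged so that a failing iteration can be repeated with the same data.

```csharp
using System;
using System.Collections.Generic;

public static class StockPriceTest
{
    // Runs one iteration: picks a random valid symbol, then compares the
    // price reported by the page under test with the price from the oracle.
    public static bool RunIteration(
        IList<string> validSymbols,
        Random random,
        Func<string, decimal> priceFromPage,
        Func<string, decimal> priceFromOracle,
        out string symbol)
    {
        symbol = validSymbols[random.Next(validSymbols.Count)];
        return priceFromPage(symbol) == priceFromOracle(symbol);
    }

    public static void Main()
    {
        // Hypothetical stand-in for a symbol list read from an exchange listing
        var symbols = new List<string> { "MSFT", "IBM", "GE", "KO" };

        // Hypothetical stubs; a real test would query the web page and the exchange
        Func<string, decimal> fakePage = s => s.Length * 10m;
        Func<string, decimal> fakeOracle = s => s.Length * 10m;

        int seed = Environment.TickCount;
        Console.WriteLine("Seed: " + seed);
        var random = new Random(seed);

        for (int i = 0; i < 5; i++)
        {
            string symbol;
            bool passed = RunIteration(symbols, random, fakePage, fakeOracle, out symbol);
            Console.WriteLine("{0}: {1}", symbol, passed ? "PASS" : "FAIL");
        }
    }
}
```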

Randomizing Static Test Data in Automated Tests

Originally Published Sunday, October 11, 2009

A significant percentage of static test data is stored in tabular comma-delimited or tab-delimited formats and saved in Excel spreadsheets. Reading comma or tab-delimited static test data into an automated test is pretty straightforward and there are numerous examples in many programming languages illustrating how to read in these types of test data repositories. Reading in rows of data is the foundation of data-driven automation and definitely has its place in any automation project.

I am a big proponent of stochastic (random) test data generation that is customized to the context, but I also know that sometimes static test data is useful for establishing baselines and more exact emulation of ‘real-world’ customer-like inputs. But, if the automated test is simply passing the same variable arguments to the same input parameters in the same order over and over again the value of subsequent iterations of that automated test using that static data set diminishes rather quickly. So how can we more effectively utilize static test data in our automated tests?

One possible solution is to randomly select an argument from a collection of static variables that is passed to the specific input parameter. The advantage of this approach is that it effectively increases the test data permutations in each iteration of the test case. For example, let’s consider 2 input parameters; one for a given name and one for a surname. In a traditional data-driven approach in which the static test data is read in by rows our test data file might be similar to:

Bob,Smith
John,Johnson
Roger,Williams
Steve,Abbot

This static data file would give us 4 sets of test data, but each time the test data is read into the test case the given names and surnames are always paired the same way.

However, if we read in the given names and surnames into 2 collections, and then randomly select a given name and surname from the appropriate collection to pass to the respective parameter we effectively have 16 possible combinations of static test data to work with. An advantage of this approach is that our ‘collections’ of given names and surnames can contain differing numbers of elements (in which case the number of possible combinations of test data is the Cartesian product of the number of elements in each collection).
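A minimal sketch of this idea follows, using the four given names and four surnames from the sample file above. The class and method names are illustrative, and the seed is logged only so that a particular run can be reproduced.

```csharp
using System;

public static class NameCombiner
{
    // Independently picks one given name and one surname at random.
    public static string PickFullName(string[] givenNames, string[] surnames, Random random)
    {
        string given = givenNames[random.Next(givenNames.Length)];
        string surname = surnames[random.Next(surnames.Length)];
        return given + " " + surname;
    }

    public static void Main()
    {
        string[] givenNames = { "Bob", "John", "Roger", "Steve" };
        string[] surnames = { "Smith", "Johnson", "Williams", "Abbot" };

        // 4 given names x 4 surnames = 16 possible combinations
        Console.WriteLine("Possible combinations: " + (givenNames.Length * surnames.Length));

        int seed = Environment.TickCount;
        Console.WriteLine("Seed: " + seed);
        var random = new Random(seed);

        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine(PickFullName(givenNames, surnames, random));
        }
    }
}
```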

Of course there are many ways to accomplish this. For example, one approach is to continue to use a comma or tab-delimited file format and list given names in one row and surnames in a second row. Another approach is to list the given names and surnames in columns in a spreadsheet and read in each column into a collection of some sort. The latter is the approach I used in developing my PseudoName test data generator tool. I chose this approach for 2 reasons: first, an Excel spreadsheet is a simple yet powerful file format for storing static test data, and second, lists of test data are sometimes better represented in columns rather than rows.
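The first (row-based) approach could be sketched as follows. This is an assumption-laden sketch rather than code from the PseudoName tool: the class and method names are made up, the lines are supplied in memory instead of being read from a file, and the simple split does not handle quoted fields that contain the delimiter.

```csharp
using System;
using System.Collections.Generic;

public static class RowDataReader
{
    // Turns each non-empty delimited line into one collection of values,
    // for example row 0 = given names and row 1 = surnames.
    public static string[][] ParseRows(IEnumerable<string> lines, char delimiter)
    {
        var rows = new List<string[]>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue;
            }

            string[] values = line.Split(delimiter);
            for (int i = 0; i < values.Length; i++)
            {
                values[i] = values[i].Trim();
            }

            rows.Add(values);
        }

        return rows.ToArray();
    }

    public static void Main()
    {
        string[] lines =
        {
            "Bob,John,Roger,Steve",
            "Smith,Johnson,Williams,Abbot"
        };

        string[][] collections = ParseRows(lines, ',');
        Console.WriteLine("Given names: " + collections[0].Length);
        Console.WriteLine("Surnames: " + collections[1].Length);
    }
}
```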

The following code shows one way to read in test data by columns from an Excel spreadsheet.

// <copyright file="datareader.cs" company="TestingMentor">
// Copyright © 2009 by Bj Rollison. All rights reserved.
// </copyright>

namespace TestingMentor.TestTool
{
  using System;
  using System.Collections.Generic;
  using System.Globalization;
  using System.Threading;
  using Excel = Microsoft.Office.Interop.Excel;

  /// <summary>
  /// This class contains methods for reading test data from Excel spreadsheets
  /// </summary>
  public class TestDataReader
  {
    /// <summary>
    /// This method reads all the data elements in the specified number of
    /// columns in the specified Excel spreadsheet containing the test data
    /// and copies the data into a jagged array (one string array per column)
    /// </summary>
    /// <param name="dataFileName">The filename containing the test data</param>
    /// <param name="columnCount">The number of columns in the Excel
    /// spreadsheet to read</param>
    /// <returns>A jagged array containing the data elements for
    /// each column</returns>
    public static string[][] ExcelColumnReader(string dataFileName, uint columnCount)
    {
      Excel.Application excelApp = null;
      Excel.Workbook excelWorkbook = null;
      string[][] testData = new string[columnCount][];

      // Run the Excel calls under the en-US culture and restore the
      // original culture when done
      CultureInfo originalCulture = Thread.CurrentThread.CurrentCulture;
      Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");

      try
      {
        excelApp = new Excel.Application();
        excelWorkbook = excelApp.Workbooks.Open(
          dataFileName,
          0,
          false,
          5,
          String.Empty,
          String.Empty,
          false,
          Type.Missing,
          String.Empty,
          true,
          false,
          0,
          true,
          false,
          false);
        Excel.Worksheet excelActiveWorksheet =
          (Excel.Worksheet)excelWorkbook.ActiveSheet;

        for (int i = 0; i < columnCount; i++)
        {
          // Excel columns are 1-based, so start at column 1
          int columnIndex = i + 1;

          // Row 1 is the column title; test data starts on Row 2
          int rowIndex = 2;
          List<string> columnData = new List<string>();
          object cellValue;

          // Read down the column until the first empty cell
          while ((cellValue = ((Excel.Range)
            excelActiveWorksheet.Cells[rowIndex, columnIndex]).Value2) != null)
          {
            // Value2 is not always a string (numeric cells come back as
            // double), so convert the value rather than cast it
            columnData.Add(
              Convert.ToString(cellValue, CultureInfo.InvariantCulture));
            rowIndex++;
          }

          testData[i] = columnData.ToArray();
        }
      }
      finally
      {
        // Clean up, even if the read failed part way through
        if (excelWorkbook != null)
        {
          excelWorkbook.Close(false, Type.Missing, Type.Missing);
          excelWorkbook = null;
        }

        if (excelApp != null)
        {
          excelApp.Quit();
          excelApp = null;
        }

        // Garbage collection is not pretty, but necessary to release Excel proc
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Thread.CurrentThread.CurrentCulture = originalCulture;
      }

      return testData;
    }
  }
}

I must tell you that performance can be an issue especially if the columns contain a lot of data. For example, to read in approximately 700 elements of test data in 3 separate columns took slightly less than 1 second, and reading in 1800 elements in 3 columns required just over 4 seconds. Unfortunately, I didn’t compare total byte counts, but it is pretty obvious the greater the number of test data elements being read the longer the read operation will take and you certainly will have to take the read time into consideration in your automated test case.
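To take the read time into account, it helps to measure it first. The following is a small sketch using the Stopwatch class; the ReadTimer class and Measure method are hypothetical names, and the Thread.Sleep call is only a stand-in for the real spreadsheet read so that the example runs without Excel installed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ReadTimer
{
    // Runs the supplied read operation once and returns how long it took.
    public static TimeSpan Measure(Action readOperation)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        readOperation();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        // Stand-in for the real read; in a test this lambda would call
        // the column reader on the actual data file instead of sleeping
        TimeSpan elapsed = Measure(() => Thread.Sleep(50));
        Console.WriteLine("Read took {0} ms", elapsed.TotalMilliseconds);
    }
}
```

If the measured time is significant, one option is to read the spreadsheet once at the start of a test run and reuse the in-memory collections for every iteration.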

Reading static test data line by line from a data file while looping through a data-driven automated test case is a useful test design approach in some situations. Reading the data in by columns is another useful approach that allows the test designer to randomize the combinations of static test data values applied to multiple input parameters in multiple iterations of an automated test case.

The Primary Goal of a Tester Should Be To Work Themselves Out Of A Job!

Originally Published Thursday, October 01, 2009

Software is knowledge. Software is the intangible product crafted by a team of people who have pooled their intellectual knowledge to help solve a complex problem and add value to those who use that software. So how does a tester contribute to the intellectual knowledge pool?

I guess we could say that finding and reporting bugs during the software development lifecycle (SDLC) is important knowledge because it helps identify many anomalies prior to release. But, the mere act of finding and reporting bugs is transient knowledge. Reporting bugs in the system does not add any long term or persistent value to the intellectual knowledge pool of a software company. Perhaps even worse, finding the same type of issue repeatedly actually stagnates the intellectual knowledge pool because the team is focused on fixing the same problem over and over again. Of course, finding really interesting and important bugs requires a lot of knowledge and creativity. But, once the bug is fixed the value that bug may have provided in the intellectual knowledge pool evaporates; especially if there is no shared learning experience that occurs as a result of that fixed issue.

One way software testers can significantly contribute to the intellectual knowledge pool is through defect prevention instead of defect detection. Simply put, if we expand our vision of the role of the tester to include problem solving instead of just problem finding, we can open up new challenges, provide overall greater value to our business, and further advance the discipline of software testing. For example, if we were to identify a particular area or category of defects and identify the root causes for that type of problem, then we could implement various strategies or best practices to prevent those types of issues from being injected into the product design from the outset, or at least develop testing patterns or tools to help the team identify many issues in that category sooner in the SDLC. Understanding why certain categories of problems occur and providing best practice solutions within the appropriate context is intellectual knowledge that can be shared with existing and new team members, and can persist to help prevent certain types of problems from recurring in the future. This is intellectual capital in the knowledge pool. Testing tools and test patterns that can be shared and taught to others, and that help identify certain types of issues sooner, can reduce testing costs. This is also intellectual capital in the knowledge pool.

If I am constantly burdened with finding the same types of problems over and over again, then my contribution to the SDLC and the knowledge pool is essentially limited to the bugs I find, and the value of those bugs often depreciates rapidly. Basically, I am simply identifying problems; I am not contributing to solving the problems.

Of course, I don’t think testers will ever work themselves out of a job, and we will always be in the business of identifying issues during the SDLC. But, if I solve one type of problem, then I can move on to face new and more difficult challenges. By solving one problem I get that job off my plate, and I can then move on to the next job. Organizational maturity and professional growth occur through solving increasingly complex problems, not by continually dealing with the same problem.

I think the role of a professional tester is growing beyond that of simple problem identification, and many of us are exploring the more challenging aspects of problem solving. Finding ways to prevent defects or identify issues earlier, and essentially drive quality upstream, are exciting challenges that will advance the practice of software testing and increase the value of our contributions to the intellectual knowledge pool.

Prevention is the Best Medicine

Originally Published Thursday, September 24, 2009

The past 2 weeks have been a bit rough. While in Israel I began to feel a bit congested. By the time I hit Nürnberg, Germany for the 12th International Conference on Quality Engineering in Software Technology I was injecting nose-juice (nasal decongestant) about every 2-3 hours and couldn’t sleep through the night. Fortunately I didn’t speak until Friday, so Monday morning I visited a local Apotheke (pharmacy), described the symptoms, and was presented with some medicinal remedies by the pharmacist. By Wednesday I was much worse, so I tried another pharmacy and was given a different batch of drugs. By Friday morning I was struggling, but managed to present my talk on probabilistic stochastic test data generation using parameterized equivalent partitions and genetic algorithms (which I will discuss in a future post). Unfortunately, I had to cancel another engagement and reschedule my flight home for Saturday. Once home I went to my doctor and was quickly diagnosed with a bacterial infection in my nasal cavities.

Now, I am not telling you this story to seek your sympathy, but to illustrate a point. I had convinced myself that I simply had a slight cold that I could treat with over-the-counter remedies, and perhaps due to my own stubborn nature I refused advice from my friends in Germany to see a physician. In the end, I realized I was simply treating the symptoms and ignoring the root cause of the real problem. So, I sometimes wonder if we are too focused on treating the symptoms of buggy software by focusing our testing efforts on bug detection rather than addressing the real problem and thinking more about bug prevention.

In my opinion, one of the most significant ways we can directly impact the quality of the product and the effectiveness of our teams is not by trying to beat the bugs out of the product after the designers and developers have spent days/weeks injecting bugs into the product, but through partnering with the PMs and developers earlier in the lifecycle to prevent issues from ever getting into the product to begin with. If we continue to think of testing as an after-the-fact process then we might never advance our discipline, and perhaps even worse, we might relegate the role of testers to nothing more than bug-finders.

Defect prevention doesn’t negate or eliminate the need for system level testing, but it could certainly change the role of testers throughout any product lifecycle. Rather than perpetuating an adversarial “don’t trust the developer” attitude I envision testers and developers working in a more symbiotic relationship (Доверяй, но проверяй – Trust, but verify). For example, I think many readers would agree that developers are responsible for unit testing, but I wonder how many testers are proactively engaging their development partners and suggesting ways to improve the effectiveness of their unit tests (without adding significant additional overhead), or participating in code inspections. And, how many testers are engaged in design reviews and prototyping with program managers and designers in an effort to prevent sub-optimal designs, which often lead to a tremendous amount of rework?

The ability to move quality upstream through defect prevention requires different skills and capabilities, but also opens up new and greater challenges for software testers.

“Bug prevention is testing’s first goal.” – B. Beizer

Best Practices – Philosophy vs. Practicality

Originally Published Saturday, September 12, 2009

I have spent the last week in Israel teaching our new SDET course in Herzliya and our Senior SDET course in Haifa. I also did a lot of listening and discussing various issues relating to software testing and the maturation of our discipline; not just here in Israel, but around the world both inside and outside of Microsoft. Now I am sitting at LaLa Land after a relaxing day of sailing in the Med, and reflecting on the past week’s discussions.

One of the topics we discussed was best practices, and that seems appropriate to write about since the concept of “best practices” was recently discussed (again) in an article in the Software Test and Performance magazine by Eddie Correia. Eddie argues “…the notion of “best practices” is not useful. Best for whom? And for what kind of testing?” Actually, this is just a repetition of the same old fustian melodramatic hyperbole of the “context-driven” posse.

Perhaps the philosophical “questioning” of best practices may be interesting for folks who like to run around quoting Aristotle and Plato. However, from a pragmatic point of view this is a rather benign debate for anyone capable of thinking for themselves.

In reality, many different professions recognize the concept of best practices. For example, a best practice in preventative medicine is to rinse a minor abrasion and apply a topical antiseptic ointment. A best practice in plumbing is to wrap Teflon tape in the direction of the threads when fitting pipes. “Eliminating distractions in the operational area” is listed as one of the best practices by the FAA for airfield safety. Do these “best practices” apply in all situations? No, they don’t.

So, why do so many professionals recognize the concept of “best practices”? Because they understand that best practices provide guidelines that are generally more effective in the appropriate context as compared to other approaches. They understand that “best practices” are not rules or rigid standards that must be followed in all circumstances, but general solutions to common problems that can be shared among professionals who might face similar situations. The professionals who understand “best practice” concepts are usually well-trained on other comparable practices for the type of problem they are facing, and know when to apply the best practice within the appropriate situation. They understand that “best practices” don’t simply apply to 1 or 2 limited situations, but have been proven to be generally effective for that particular type of problem.

But, most importantly these professionals (who recognize “best practices”) are extremely knowledgeable about their field and can “act with appropriate judgment” (that’s sapience for the CD crowd), and conversely know when to approach a problem using a different solution.

Fortunately, the argument against “best practices” only stems from a few people who are seemingly more interested in stirring up portentous philosophical debate rather than earnestly discussing the practical advancement of the profession of software testing beyond mysticism and emotionally charged rhetoric. And, that argument really seems to boil down to a rather condescending and incorrect viewpoint that best practices are merely steadfast rules and requirements that must be followed in all situations. I say condescending because this point of view seems to suggest that testers are incapable of analyzing a problem and logically rationalizing the benefits and limitations of various approaches to problems in different situations to reach appropriate decisions on their own.

Personally, I think professionals in the discipline of software testing are highly intelligent, and are quite capable of making smart decisions, and can “act with appropriate judgment” in a wide variety of contextual situations. I also think discussions of best practices are enriched with case studies outlining situations where they may not apply and the alternative approaches that were more effective in those situations. And, I think “best practices” provide a common reference for professionals in that field that can be shared and further developed, and perhaps even give rise to new “best practices” for varying situations.

So, for those of you who believe there is a “one-size fits all” solution that can be applied in every situation I recommend that you don’t subscribe to the concept of best practices. (I would also recommend these people are well supervised and constantly monitored.) But, for the vast majority of professionals in the practice of software testing I suspect you understand the notion of “best practices” is quite useful for pragmatic discussions for advancing the intellectual knowledge pool of our profession and maturing our discipline.

微软的软件测试之道(Microsoft核心技术丛书)

Originally Published Thursday, September 10, 2009

I am really happy to announce that our book has been released in China and is available on the Chinese Amazon site! This was really a monumental effort driven by my friend and colleague Kelly Zhang.

We look forward to the feedback from the Chinese testing community, and we hope this provides our Chinese friends with some additional perspectives on software testing (or at least some interesting stories).

Test Automation ROI (Part II)

Originally Published Wednesday, September 02, 2009

Last week I talked about the silliness of wasting time calculating the return on investment (ROI) of an automation effort on any non-trivial software project, especially if it has an extended shelf-life. As my friend Joe Strazzere commented, “If you need an ROI analysis to convince business management that test automation is a good thing when used intelligently, than you have already lost.”

But, management might need to be educated on the limitations of record/playback, rudimentary hard-coded scripts, and keyword-driven automation efforts, because it is often more appealing for bean counters to invest in low cost tools and continue to rely on non-coding bug finders or domain experts to script out ‘tests’ which do nothing more than repeat some rote set of steps over and over again. But, as E. Dustin, T. Garrett, and B. Gauf wrote in Implementing Automated Software Testing, any serious software automation effort “is software development.” Well-designed automated tests require highly skilled, technically competent, extremely creative, analytical testers capable of designing and developing automated tests using procedural programming paradigms.

We should still apply ROI concepts in test automation, but at a much lower level. Essentially, each tester must evaluate the return on investment of any test before automating it. The most fundamental purpose of an automated test effort is to provide some perceived value to the tester, the testing effort, and the organization. As a tester, the primary reason I automate a test is to:

  • Free up my time,
  • Re-verify baseline assessments (BVT/BAT, regression, acceptance test suites),
  • Increase test coverage (via increased breadth of data or state variability),
  • Accomplish tasks that can’t easily be achieved via manual testing approaches.

For example, the build verification and build acceptance test suites are baseline tests that must be run on each new build; these tests should be 99.9% automated because they free up my time to design other tests. Tests that evaluate a non-trivial number of combinations or permutations are generally good candidates for automation because they increase test coverage. Performance, stress, load, and concurrency tests should be heavily automated because they are difficult to conduct manually.

It is important to note that I am not simply referring to UI type automation. A significant amount of “functional tests” designed to evaluate the computational logic of methods or functions can be automated below the UI layer in software architectures using OOP and procedural paradigms where the business and computational logic is separate from the UI layer.
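To illustrate what a functional test below the UI layer can look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `shipping_cost` stands in for some piece of business logic that a UI form would normally call, and the pricing rule and test cases are invented.

```python
def shipping_cost(weight_kg, express=False):
    """Hypothetical business-logic function that a UI form would call."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    cost = 5.0 + 2.0 * weight_kg
    return cost * 1.5 if express else cost

# Each case: (weight_kg, express, expected cost). The expected value is the oracle.
CASES = [
    (1.0, False, 7.0),
    (1.0, True, 10.5),
    (10.0, False, 25.0),
]

def run_cases(cases):
    """Call the function directly (no UI involved) and collect any mismatches."""
    failures = []
    for weight, express, expected in cases:
        actual = shipping_cost(weight, express)
        if abs(actual - expected) > 1e-9:
            failures.append((weight, express, expected, actual))
    return failures

failures = run_cases(CASES)
print("failures:", failures)
```

Because the test talks to the function rather than to buttons and text boxes, it is unaffected by layout or control changes, and the table of cases can grow to cover many more data combinations than would be practical by hand.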

There are many papers that discuss specific factors to take into consideration when deciding what tests to automate. Unfortunately, there is no single cookie-cutter approach in deciding what tests to automate. Different projects have different requirements and expectations, and, of course, not all tests are equal. One of the best papers I’ve read on deciding what tests to automate is When Should a Test Be Automated by Brian Marick. I like the simplicity in his 3 key questions:

  • How much more will this test cost to automate versus running it manually?
    Some people think that automating a test reduces costs because it eliminates the tester from manually executing that test. Unfortunately, this is not always the case. As I talked about in a previous post, visual comparative oracles are notoriously error prone, requiring the testers to constantly massage the test code and manually verify results anyway. Sometimes paying a vendor to run a test periodically is cheaper than paying an SDET to tweak the test every build. But, if the population of potential test data is large, or the test involves combinatorial testing of non-trivial features, then automating that test case is probably a good investment.
  • What is the potential lifetime of this automated test?
    How many times will this test be re-run during the development cycle and in maintenance or sustained engineering efforts? Can this test be reused in the next iteration of the product?
  • Does the automated test have some probability of exposing new issues?
    I don’t necessarily agree with this question, because many automated tests may not expose new issues but still provide value to the overall testing effort. For example, I wouldn’t expect tests in my regression test suite to expose new defects, because if they do there was a regression in the product. So, I would rephrase this question to ask, “Does this automated test have some probability of exposing new issues, providing additional information that increases confidence, or increasing test coverage?”

A few other questions I ask myself when I am deciding whether to automate a particular test include:

  • What exactly is being evaluated?
    This is perhaps the first question I ask myself. If the test is evaluating functional or non-functional (stress, perf, security, etc.) capabilities then automation may be worthwhile. But, behavioral tests such as usability tests and content testing are generally not good candidates for automated testing.
  • What is the reliability of automating this test?
    I don’t want to have to constantly massage a test in order to get it to run. So, what is the probability this test will throw a lot of false positives or false negatives? How much tweaking will this test require due to code or UI instability?
  • What are the oracles for this test and can they be automated?
    I don’t want to sit in front of a computer and watch software run software. Also, there is a difference between an automated test and a macro (A single, user-defined command that is part of an application and executes a series of commands). There are different types of oracles, and the professional test designer needs to also design the most effective oracle for the test. By the way…if the most effective oracle is a human reviewing the results then that test should probably not be automated using current technologies.

For each test I consider these questions in deciding whether to automate that test. For some tests, I may ask additional questions depending on the context and the business needs of my organization. I don’t use a cookie-cutter template, or try to fill out some spreadsheet to do a cost comparison based on dollar amounts. It’s hard to put a price on value. Instead, I ask myself a few key questions to help me decide if automating a test is worth it to me, my team, and the organization. Is automating a particular test the right thing to do, or am I automating something because it’s challenging, or to increase some magical percentage of automated tests compared to all currently defined tests? The key message here is not to blindly automate everything; use your brain, make smart decisions about whether each test should be automated, and be able to explain how automating that test benefits the testing effort.

Measuring Test Automation ROI

Originally Published Tuesday, August 25, 2009

I just finished reading Implementing Automated Software Testing by E. Dustin, T. Garrett, and B. Gauf. Overall this is a good read that provides some well thought out arguments for beginning an automation project and strategic perspectives for managing one. The first chapter made several excellent points such as:

  • Automated software testing “is software development.”
  • Automated software testing “and manual testing are intertwined and complement each other.”
  • And, “The overall objective of AST (automated software testing) is to design, develop, and deliver an automated test and retest capability that increases testing efficiencies.”

Of course, I was also pleased to read the section on test data generation since I design and develop test data generation tools as a hobby. The authors correctly note that random test data increases flexibility, improves functional testing, and reduces reliance on limited-in-scope, error-prone, manually produced test data.

There is also a chapter on presenting the business case for an automation project by calculating a return on investment (ROI) measure via various worksheets. I have 2 essential problems with ROI calculations within the context of test automation. First, if the business manager doesn’t understand the value of automation within a complex software project (especially one which will have multiple iterations) they should read a book on managing software development projects. I really think most managers understand that test automation would benefit their business (in most cases). I suspect many managers have experienced less than successful automation projects but don’t understand how to establish a more successful automation effort. I also suspect really bright business managers are not overly impressed with magic beans.

Magic beans pimped by a zealous huckster are the second essential problem with automation ROI calculations. Let’s be honest, the numbers produced by these worksheets or other automation ROI calculators are simply magic beans. Now, why do I make this statement? Because the numbers that are plugged into the calculators or worksheets are ROMA data. I mean really, how many of us can realistically predict the number of atomic tests for any complex project? Also, do all tests take the same amount of time, or will all tests be executed the same number of iterations? Does it take the same amount of time to develop all automated tests, and how does one go about predicting a realistic time for all automated tests to run? And of course, how many of those tests will be automated? (Actually, that answer is easy….the number of automated tests should be 100% of the tests that should be automated.)
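To make the point about guessed inputs concrete, here is a rough Python sketch. The formula is a generic ROI calculation assumed for illustration, not the worksheets from the book, and every number fed into it is invented. The function `automation_roi` and its parameters are hypothetical.

```python
def automation_roi(num_tests, runs_per_test, manual_minutes_per_run,
                   dev_hours_per_test, maint_hours_per_test):
    """Generic ROI in hours: (manual effort avoided - automation effort) / automation effort."""
    manual_hours = num_tests * runs_per_test * manual_minutes_per_run / 60.0
    automation_hours = num_tests * (dev_hours_per_test + maint_hours_per_test)
    return (manual_hours - automation_hours) / automation_hours

# Two sets of guesses for the same hypothetical project of 500 tests.
optimistic = automation_roi(500, runs_per_test=50, manual_minutes_per_run=15,
                            dev_hours_per_test=2, maint_hours_per_test=1)
pessimistic = automation_roi(500, runs_per_test=10, manual_minutes_per_run=10,
                             dev_hours_per_test=4, maint_hours_per_test=4)

print(f"optimistic guess:  {optimistic:+.0%}")
print(f"pessimistic guess: {pessimistic:+.0%}")
```

Both sets of inputs are equally defensible before the project starts, yet one yields a strongly positive ROI and the other a negative one, which is why the output of such a calculator says more about the guesses than about the project.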

Personally, I think test managers should not waste their time trying to convince their business manager of the value of a test automation project; especially with magic beans produced from ROMA data. Instead test managers should start helping their team members think about ROI at the test level itself. In other words, teach your team how to make smart decisions about what tests to automate and what tests should not be automated because they can be more effectively tested via other approaches.

In my next post I will outline some factors that testers and test managers can use to help decide which tests to consider automating. Basically, the bottom line here is that an automated test should provide significant value to the tester and the organization, and should help free up the tester’s time in order to increase the breadth and/or scope of testing.