
Automation Anarchy

After a one-year sabbatical from writing, the time has come for me to get back to sharing some ideas and, hopefully, providing some thought-provoking discourse. Both personally and professionally it has been an exciting year with many challenges and new opportunities. Professionally, I am now working on solutions to improve the effectiveness and efficiency of our automated tests (more on that later). Another new opportunity for me has been teaching courses on software testing fundamentals at Duy Tan University in Vietnam. I must admit that sitting on the beach in Hoi An and writing this post suits me very well.

Anyway…back to the testing stuff. Earlier this year I started working on a new project at work that I have dubbed “Automation Anarchy.” As a product matures from one version to the next, the pool of automated tests also continues to expand as legacy tests are ported forward and additional tests are added for new features. And as the number of automated tests increases, so does the number of helper methods in the automation stack.

One problem with this seemingly unending piling on of test code is that the build times and run times of the automation runs continue to creep up. In a world of high-volume automation and quick build cycles (2 or 3 a day on branches), we don’t have the luxury of letting our automation churn for hours. The results needed to decide whether to integrate lower branches up into the main branch must percolate to management within 1 to 2 hours after the build.

Another problem with the non-stop growth of test code is the additional cost of maintaining the code base. Changes in product behavior may require changes in the test code. Some of the automated tests will become obsolete. Poorly designed tests, or tests without a clear purpose, generally cost more to maintain. The most costly tests are those that are unreliable: sometimes they pass, and sometimes they don’t complete and throw a test exception during execution. Every time an automated test fails to run or throws a test exception while executing, that test case must be investigated. This takes time and valuable resources away from other tasks on the tester’s plate.

Duplicate and/or redundant automated tests are sometimes inadvertently added to a test suite when the tester changes and the intent or purpose of an existing automated test is unclear or not well documented. Duplicate and redundant tests do not provide any additional or useful information. The primary purpose of testing is to provide the information decision makers need to make business decisions. Automation can help provide that important information in a more timely manner than manual testing. But if redundant or duplicate tests are not providing valuable information, then they are wasting time; even worse, these tests can make test results ambiguous.

Finally, the proliferation of test code also causes our well-intentioned automation stack to start to decompose. Parts of the automation stack decay, resulting in pockets of dead code. Obsolete tests and helper methods linger in the code base. And sometimes as we dig deeper we find code clones in the automation stack. Sometimes these clones are benign, the result of a tester copying and pasting a chunk of test code, a helper method, or maybe even an entire class. But other times code clones are not only copied but then mutated. Code clones and mutations of code clones are like a cancer within the automation stack. Cloned code obfuscates the code base and also increases maintenance costs.

In the future I will write more about this project and hopefully offer some suggestions to help others prevent automation anarchy from taking over their testing projects.

Automated tests are like dandelions. Carefully cultivated dandelions are used for foods and medicines around the world, and the plant can also be used to help bring nutrients to other plants. But, left unchecked and untended, the dandelion is a prolific weed that will take over and ruin an otherwise beautiful garden.

100% Automation Pass Rates

Well, now that the summer has “mostly” passed and we are “mostly” finished with our latest release, I am finally able to see light at the end of the tunnel. So, time to start blogging again!

Back in May my colleague wrote a post on why 100% automation pass rates are bad. In the past I also questioned whether striving for 100% pass rates in automated test passes was an important goal. I wondered whether setting this goal focused testers on the wrong objective. I was concerned that my managers might assume that a 100% automation pass rate implied the product was of high quality. And I struggled with the notion of investing time in automated tests that are unlikely to find new bugs once they are designed, versus spending that time “testing.”

Automated testing is more pervasive throughout the industry now than in the past. Automated tests running against daily builds are an essential element in an Agile lifecycle. Continuous integration (CI) requires continuous feedback from automated tests. So, let’s dispel some myths and explain why 100% pass rates are not only a good thing, but are an important goal to strive towards. Let’s start by understanding the nature and purpose of many automated test suites.

Even with high-volume automation, the number of automated tests is a relatively small percentage of the overall (explicit and exploratory) testing effort dedicated to any project. Passing this limited subset of targeted tests does not in any way imply the product works as expected. An automated test suite provides information about the status of specific functional attributes of the product for each new build that incorporates changes in the code base. The ability of those tests to provide accurate information is determined by the design and reliability of each test and the effectiveness of the oracle that test uses. Basing the overall quality of a product on a limited subset of automated tests is as foolish as the notion that automated tests will replace testers.

Also, most automated testing suites are intended to provide baseline assessments, or measure various aspects of non-functional behaviors such as performance, bandwidth, or battery usage. For example, my unit tests provide a baseline assessment of a function or method in code, so when I refactor that code at a later date my unit tests validate that method works exactly as before the code change. Likewise, many higher level test automation suites are some form of regression tests similar to lower level unit tests. Regression tests help identify bugs that are introduced by changes in the product code through refactoring, adding or removing features, or fixing bugs. Automated regression test suites are usually not intended or designed to find “new” bugs. Automated regression test suites provide a baseline and can help find regressions in product functionality after changes in the code base.

We should also remember that the purpose of testing is to provide information continually throughout the lifecycle of a product. Testers who are only focused on finding bugs are only providing one aspect of information. I know many books have been written touting that the purpose of testing is to find bugs. However, we should get beyond outdated and limited views of testing and mature the profession towards the more valuable service of collecting and analyzing data through tests and providing decision makers with the information that will help them make the appropriate decisions. Testers should strive to provide information that:

  • assesses the product’s ability to satisfy explicit and implicit goals, objectives, and other requirements – measure
  • identifies functional and behavioral issues that may negatively impact customer satisfaction or the business – find bugs

Automated tests are just one tool that helps the team (especially management) quickly assess a product’s ability to satisfy a limited set of explicit conditions after changes in the code base (especially in complex products with multiple developers and external dependencies). Automated tests enable earlier identification of regressions in the product (things that used to work but are now broken), and also provide a baseline for additional testing.

Automation pass rates below 100% indicate a failure in the “system.” The “system” in this case includes the product, the test code, the test infrastructure, and external dependencies such as Internet connections. An automated test that is failing due to a bug in the product indicates that the product doesn’t satisfy specific functional, non-functional, or behavioral attributes that are important. (I wouldn’t waste time writing automated tests for unimportant things. In other words, if a failure of the automated test would expose a problem that doesn’t have a good probability of being fixed, then I likely wouldn’t waste my time writing that automated test.) So, if the pass rate is something the leadership team looks at every day (assuming daily builds), then there is usually more focus on getting a fix in sooner.

Tests that report a failure due to faulty test code are essentially reporting false positives. A false positive indicates a bug, but in this case the bug is in the test code, not in the product. Bugs in test code have various causes. But whenever a test throws a false positive it eats up valuable time (testers need to troubleshoot, investigate, and fix it) and also reduces confidence in the effectiveness of the automation suite. An automated test suite should be bulletproof…and testers should adopt zero tolerance for faulty or error-prone test code. Ultimately, every test failure (the test fails, aborts, or the outcome is inconclusive) must be investigated, or the team may become numb to failing tests.

Becoming numb to automated test pass rates below 100% is a significant danger. In one case a team overlooked a product bug because the pass rate was consistently around 95% and they had stopped investigating every failure in the automated test suite. That team had become accustomed to the pass rate varying a little due to unreliable tests that sometimes passed and sometimes threw false positives due to network latency. So, when one test accurately detected a regression in product functionality, it went unnoticed because the team had become numb to the pass rate and did not adequately investigate each failure.
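To make the "investigate every failure" rule concrete, here is a minimal C# sketch of a triage report. The TestOutcome enum, the TestResult record, and the PassRateReport class are hypothetical names for illustration, not part of any particular test harness. The point is that the investigation list is every non-passing result, with no "good enough" threshold below which failures are quietly ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical outcome categories; real harnesses use their own names.
public enum TestOutcome { Passed, Failed, Aborted, Inconclusive }

public record TestResult(string Name, TestOutcome Outcome);

public static class PassRateReport
{
    // Percentage (0-100) of results that passed. An empty run counts as 0%,
    // because nothing was actually verified.
    public static double PassRate(IReadOnlyCollection<TestResult> results) =>
        results.Count == 0
            ? 0.0
            : 100.0 * results.Count(r => r.Outcome == TestOutcome.Passed) / results.Count;

    // Every result that did not pass (failed, aborted, or inconclusive)
    // goes on the investigation list; there is no tolerated threshold.
    public static List<TestResult> NeedsInvestigation(IEnumerable<TestResult> results) =>
        results.Where(r => r.Outcome != TestOutcome.Passed).ToList();
}
```

With three passing results and one inconclusive result, PassRate reports 75 and NeedsInvestigation returns the single inconclusive test, so a "mostly green" run still produces a concrete to-do list.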

The bottom line is that teams should strive for a 100% pass rate in functional test suites. I have never met a tester who would be satisfied with less than 100% of all unit tests passing, so we shouldn’t be hypocritical by demanding anything less from our own functional automated test suites.

Sleepy Automated Tests

Here we are almost halfway through the year. Once again Seattle has been unseasonably cool, with fewer than 10 days above 70 degrees so far. The Seattle area is nice, and it is especially beautiful on a warm sunny day. But I am a water-baby at heart and really enjoy consistently warm days. I sometimes long for the island life, when I could walk down to the ocean and jump in for a swim or go surfing, diving, sailing, or just about any other waterborne activity. So, these unseasonably cool temperatures and consistently gray skies (you know it’s bad when you can readily identify 265 different shades of gray) are taking their toll. I would rather be outside doing something, but more often than not I find myself curled up on the couch reading a book, or falling asleep and dreaming of warmer climes.

Some automated test suites also succumb to sleepiness. One of the most common problems with automated tests is synchronizing the automated test with the application or service under test. Automated tests sometimes race through their instruction set faster than the application or service can respond, leading to a false positive (the test reports a failure but there is no bug). Oftentimes testers use a Band-Aid approach to solve the problem by sprinkling Sleep() methods throughout their test code in an attempt to synchronize the test code with the application or service under test. Unfortunately, these Sleep() methods may negatively impact the performance of an automated test and artificially increase the time of the automated test pass.

In general, sprinkling Sleep() statements in code is not recommended. Sleep() methods halt execution of the test code for the specified period of time regardless of whether the system under test is ready or not. Basically, the more a test “sleeps,” the longer it takes that test to run. The time required to run an automated test suite may not seem important. But if you have daily builds and your automated regression test suite takes more than 24 hours, then obviously you are not running your full regression suite on each daily build. If you have a daily build and your automated build verification test (BVT) suite takes 8 hours, then testers are probably spending some amount of time testing on ‘yesterday’s build’, and if they find a bug the developers will likely tell them to try to repro it on today’s build after the BVT is complete. Many of our teams partnered with developers to run a subset of our functional tests as part of a pre-check-in test suite before code fixes are checked in. We agreed that the total time for a pre-check-in test suite, including unit tests, should not exceed 15 minutes. The bottom line is that the time it takes to run an automated test suite is important: the longer a test takes to execute, the longer it takes to get the results.

In various code reviews of test code I have found standalone Sleep() statements of as much as 2 minutes, Sleep() statements that were inadvertently left in the code during test development or troubleshooting, and Sleep() statements of 5 seconds or more inside polling loops. As a general rule of thumb, wrapping a Sleep() call in a polling loop, rather than using a standalone Sleep() statement, is a best practice in test automation. But it is almost always better to increase the poll count (the number of retries of the polling loop) and decrease the Sleep() time than to have a long Sleep() time with a small number of retries, because a short sleep lets the test resume much closer to the moment the system actually becomes ready.

For example, some tests will stop execution for some period of time (often using a magic number pulled right out of the blue) to give the application under test (AUT) time to launch and be ready to respond, to give a service time to start, or to allow for network latency.

        // Launch AUT
        Process myAut = new Process();
        myAut.StartInfo.FileName = autName;
        myAut.Start();

        // Stop the automation for 5 seconds while
        // the AUT launches (or wait for network delays)
        System.Threading.Thread.Sleep(5000);

        // Start executing the test
        // This assumes the system's state is in the
        // expected condition to conduct the test

Of course, this scripted test blindly assumes that the system will be in the proper state after a delay of 5 seconds. If it is not, the test will try to execute anyway and will likely fail miserably. This is where a polling loop can help synchronize your test with the system: it either allows the test to execute as soon as the system is ready to respond, or exits the test if the system takes too long to respond.

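      using System;
      using System.Diagnostics;

      // Hypothetical enclosing test method (the original snippet showed only its body)
      public static void RunAutTest(string autName)
      {
        try
        {
          // Launch AUT
          Process myAut = new Process();
          myAut.StartInfo.FileName = autName;
          myAut.Start();
          WaitForAutReady(myAut);

          // Start executing the test
        }
        catch (Exception msg)
        {
          Console.WriteLine(msg.ToString());
          // handle exceptions
        }
      }

      public static void WaitForAutReady(Process aut)
      {
        // Poll up to 50 times, 100 ms apart (about 5 seconds in total)
        for (int retry = 0; retry < 50; retry++)
        {
          // Process caches values such as Responding and MainWindowHandle,
          // so refresh them before each check
          aut.Refresh();

          // Responding reports true when there is no main window yet,
          // so also wait for one (assumes a GUI AUT)
          if (aut.MainWindowHandle != IntPtr.Zero && aut.Responding)
          {
            return;
          }
          System.Threading.Thread.Sleep(100);
        }

        throw new TimeoutException(
          "AUT takes more than 5 seconds to respond");
      }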

In this example we use a polling loop to wait up to some predetermined amount of time for the AUT or system to get into the desired state, and:

  • the test will resume executing as soon as the system reaches the desired state within the allotted time (in other words, if the AUT is responding within 1 second, the WaitForAutReady method will return and the test will start doing its thing)
  • if the system does not achieve the necessary state within the allotted time the method throws an exception, the test execution stops, and the test case result is blocked or indeterminate (something went wrong during test execution that is preventing the test from determining a pass/fail result).

This is a rather simple example, but in most situations the use of a polling loop is way better than a simple Sleep() statement that stops test execution for some period of time. Polling loops can be used to

  • help synchronize test execution with the system state
  • reduce overall test execution time by allowing the test to run when the system state is ready
  • help troubleshoot race conditions
  • prevent false positives in your test results
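The polling pattern in WaitForAutReady can be generalized. The sketch below is a hypothetical helper (the Poll class and its WaitUntil method are illustrative names, not an existing library API) that takes any readiness condition plus an explicit timeout and poll interval. Keeping the interval short, with more retries, lets a test resume almost as soon as the system is ready instead of sleeping for a fixed worst-case delay.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Poll
{
    // Checks `condition` every `pollInterval` until it returns true or
    // `timeout` has elapsed. Returns true as soon as the condition holds,
    // false if the timeout expires first.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
            {
                return true;                  // system is ready: stop waiting now
            }
            if (clock.Elapsed >= timeout)
            {
                return false;                 // caller decides: throw, or report "blocked"
            }
            Thread.Sleep(pollInterval);       // short nap, then check again
        }
    }
}
```

For example, the AUT check could be written as Poll.WaitUntil(() => { myAut.Refresh(); return myAut.MainWindowHandle != IntPtr.Zero && myAut.Responding; }, TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(100)), with the test throwing or reporting a blocked result when it returns false.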

So, stop putting your automated tests into periodic comatose states, and use the system’s state to determine when a test needs to rest for a few milliseconds to let the system catch up.

Testing with Surrogate Code Points

It has been a very long time since my last blog post; too long. I have been extremely busy this past year and have been doing a lot of juggling. In some cases I tried juggling too many balls and dropped a few. But I have learned quite a bit during my transition from “academia” back into the product groups here at Microsoft, and I have learned a lot about what it means to be a great test lead shipping world-class software. Despite the bumps, I love my new career direction on the Windows Phone team and I finally feel things are coming under control. So, it is time once again to share some of the things I’ve learned and continue to learn in my journey as a software tester.

Let’s start with a discussion of a problem I came across the other day while doing some testing around posts and feeds (uploads and downloads to social networks such as Twitter, Facebook, etc.). Over the years I have frequently mentioned testing with Unicode surrogate code points in strings, and using the Babel string generation tool to help increase test coverage by producing variable test data composed of characters from across the Unicode spectrum.

Surrogate pairs are often problematic in string parsing algorithms. Unlike “typical” characters in the Basic Multilingual Plane (BMP, or Plane 0), which are encoded as a single 16-bit value in UTF-16, a surrogate pair is composed of two 16-bit code units (a high surrogate followed by a low surrogate) that together represent a single character (glyph) outside the BMP. (See definition D75 in Section 3.8, Surrogates, in The Unicode Standard Version 6.1.) Surrogate code points typically cause problems because many string parsing algorithms assume that 1 character/glyph is one 16-bit value, which can lead to character (data) corruption or string buffer miscounts (which can sometimes lead to buffer overflow errors).
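A quick way to see the miscount is to compare .NET's string.Length, which counts UTF-16 code units, with a count of code points and a count of text elements. This is a small sketch: CountCodePoints is a hypothetical helper, while char.IsSurrogatePair and System.Globalization.StringInfo are standard .NET APIs. U+20000 (a CJK Extension B ideograph) serves as a sample character outside the BMP.

```csharp
using System;
using System.Globalization;

public static class SurrogateCounter
{
    // Counts Unicode code points: a valid surrogate pair counts once,
    // and every other UTF-16 code unit (including a lone surrogate) counts once.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
            {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }
}
```

For s = "A\U00020000B", s.Length is 4, while SurrogateCounter.CountCodePoints(s) and new StringInfo(s).LengthInTextElements both return 3. Any code that uses Length as a "character" count is off by one for every surrogate pair in the string.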

As an example of string buffer miscounting, let’s take a look at Twitter. It is generally well known that Twitter has a limit of 140 characters. But when a string of 140 characters contains surrogate pairs, it seems that Twitter doesn’t count them correctly and displays a message stating, “Your Tweet was over 140 characters. You’ll have to be more clever.”

Well, Twitter…I was being clever! I was clever enough to expose an error path caused by a mismatch between the code that counts character glyphs and the code that realizes there are more than 140 16-bit code units.
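To reproduce this kind of mismatch, a tester needs a string that is exactly 140 characters long but longer than 140 UTF-16 code units. A minimal sketch, assuming a hypothetical SurrogateTestData.Build helper (this is not the Babel tool): it appends consecutive supplementary-plane code points, starting at U+20000 by default, using the standard char.ConvertFromUtf32 method.

```csharp
using System;
using System.Text;

public static class SurrogateTestData
{
    // Hypothetical helper: returns `count` characters, each a supplementary-plane
    // code point, so every character is stored as a surrogate pair in UTF-16.
    public static string Build(int count, int firstCodePoint = 0x20000)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (firstCodePoint < 0x10000 || firstCodePoint + count - 1 > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(firstCodePoint), "Range must stay within planes 1-16.");

        var sb = new StringBuilder(count * 2);
        for (int i = 0; i < count; i++)
        {
            // For code points above U+FFFF, ConvertFromUtf32 returns a
            // two-char string (high surrogate + low surrogate).
            sb.Append(char.ConvertFromUtf32(firstCodePoint + i));
        }
        return sb.ToString();
    }
}
```

A 140-character string built this way has a Length of 280, which is the kind of input that separates "count glyphs" code from "count 16-bit values" code.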

Although there is a counting mismatch, at least Twitter preserved the character glyphs for surrogate code points in this string.

Unfortunately, TweetDeck is what I refer to as a globalization-stupid application. TweetDeck doesn’t have a problem with character count mismatches because it breaks horribly when surrogate code points are used in a string.

There is some really wicked character parsing when the string is pasted into TweetDeck. TweetDeck solves the character count problem by blocking any character that is not an ASCII character from the string. (Note: the “W” character is a full-width Latin character U+FF37, not the Latin W U+0057.)

I find it hard to believe that a modern application would limit the range of characters it allows customers to use, especially an application targeted towards users of the World Wide Web.

API Testing–How can it help?

After a rather wet and soggy weekend, I woke up this morning to a beautiful sunny day in Seattle. Despite it being a bit cool, I do enjoy the sunshine so much more than the dreary gray days of a Seattle winter. Most of the leaves have fallen from the trees which makes good mulch for the gardens, but just adds more work to my stack. The good news is that there is snow in the mountains and the ski resorts in the area have opened early this year, so I hope to get in some good ski days.

In the previous post I attempted to explain the subtle differences between unit testing and API testing. It should also be noted that testing at the API layer is different from testing through the GUI. API testing is primarily focused on the functionality of the business logic of the software, and not necessarily the behavior or the “look and feel” from the end-user customer perspective. In fact, the first-tier customer of the API tester is the developer who designs and develops the API. The second-tier customers are the developers who use the APIs to build the user interface or applications that sit on top of the underlying infrastructure of a program. And finally, API testers must also consider the various end-to-end scenarios of end-user customers at the integration level of testing (without a GUI).

This post will discuss why API testing is an important activity in the complete software development lifecycle (SDLC). Teams that have multiple developers and a continuous integration (CI) build process can greatly benefit from API testing. Key benefits of API testing include:

  • Reduced testing costs
  • Improved productivity
  • Higher functional (business logic) quality   

Reduced Testing Costs

It shouldn’t be a surprise to anyone that finding most types of functional bugs early in the SDLC is more cost effective. The primary goal of unit testing, and component/integration levels of testing (API testing) is to flush out as many functional issues in the business logic layer of a software program as early as possible in the SDLC. Driving functional quality upstream not only reduces production costs, but can also reduce testing costs.

API testing can reduce the overall cost of test automation. Automated API tests are based on the API’s interface, so once the API interface is defined, testers can begin to design and develop automated tests. Having a battery of automated tests ready as the functional APIs come online pushes testing upstream, in parallel with development, rather than later in the SDLC. This also enables earlier tester engagement and closer collaboration between testers and developers.

Also, since API interfaces are generally very stable, automated API tests are less affected by changes than GUI-based automated tests. Many testers are familiar with the constant upkeep and maintenance typically associated with GUI-based automated tests. The constant massaging of GUI automation is often a huge cost in a test automation effort and a contributing factor in why so many automation projects fail. Automated API tests generally require a lot less maintenance unless there is a fundamental change in the underlying infrastructure or design of the program.

Another significant way that API testing can reduce testing costs is by refocusing testing. Many test strategies rely heavily on finding functional bugs, typically using exploratory testing through the GUI. But most software produced today is developed in “layers” (see Testing in Layers). A more robust test strategy should focus the bulk of functional testing at the API layer that contains the “business logic” of the program. Of course some functional issues will still be found while testing through the GUI, but the focus of testing at the GUI layer should be on behavioral testing. A test strategy that provides a multi-tiered approach is more effective than the typical approach of throwing a bunch of bodies at the GUI in an attempt to beat out the bugs. A multi-tiered test strategy may even reduce the total testing time by reducing the need to spend long cycles trying to uncover a lot of functional bugs through the GUI.

Improved Productivity

There are different ways to evaluate productivity, but certainly one way is to ensure production keeps moving forward. Continuous integration is a keystone of Agile development projects, and at Microsoft this means daily builds. If the build breaks, production grinds to a virtual halt and forward momentum is blocked until the issue is fixed. A build break negatively impacts the productivity of the entire team. A suite of low level integration tests can help identify potential build breaks especially involving dependent modules before new fixes or features are merged into a higher level branch.

API testing can also improve the productivity of testing. For example, structural testing is a white-box test approach intended to test the structure or flow of a program. If increased code coverage is an important goal, then the most efficient way to improve structural coverage is to identify untested code paths and design and develop API-level tests that tactically target that untested code.

Perhaps the most significant improvement to productivity is gained through teamwork. Building and releasing great software products requires a team effort: a team of people working closely together. Gone are the days of the adversarial relationship between developers and testers. Changes in technologies, changes in customer demands, and changes in how we build software require close collaboration between developers and testers, with testers actively engaged throughout the SDLC and not just at the beginning (picking apart a spec) or at the end (banging out bugs via the GUI pretending to mimic a ‘customer’). A team focused on delivering high quality can greatly increase its overall productivity.

Higher Functional Quality

One of the advantages, and also disadvantages, of testing at the API layer is that you can test the API in ways that are different from how the GUI interacts with it. For example, the Morse Code Trainer has an interface for the methods that parse the dots and dashes and play a system beep of 1 unit of duration for each dot in the stream, and a system beep of 3 units of duration for each dash in the stream. The duration of a unit is based on the WordsPerMinute property value.

    interface ISoundGenerator
    {
      void PlayMorseCode(string morseCodeString);

      int WordsPerMinute { get; set; }
    }

Testing this property at the API level we could “set” a negative integer value to make sure nothing really bad happens. But, a well-designed GUI would never accept an integer value less than 1 (which is painfully slow) nor above 150 (which is ridiculously high). A better design might be to use a drop-down list of values ranging from 5 (the minimum requirement for a basic license) to 20 words per minute (required for the highest level amateur radio operator license). Of course, it may be possible to find functional anomalies while API testing that could not be found via testing through the GUI. But, the important thing an API tester must consider is how a bug found at the component or integration levels of testing adversely affects a scenario, or the customer.
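To illustrate what an API-level test of WordsPerMinute might check, here is a hypothetical MorseTiming class; it is not the Morse Code Trainer's actual code. It assumes the common PARIS timing convention (one unit lasts 1200 / WPM milliseconds), and it assumes the API rejects values outside 1 to 150 by throwing ArgumentOutOfRangeException. The real API might instead clamp or silently accept such values, which is exactly what an API tester would want to find out.

```csharp
using System;

// Hypothetical stand-in for the timing logic behind ISoundGenerator.
public class MorseTiming
{
    private int wordsPerMinute = 5;

    public int WordsPerMinute
    {
        get => wordsPerMinute;
        set
        {
            // Assumed contract: reject values a sensible GUI would never send.
            if (value < 1 || value > 150)
                throw new ArgumentOutOfRangeException(nameof(value), value,
                    "WordsPerMinute must be between 1 and 150.");
            wordsPerMinute = value;
        }
    }

    // PARIS convention: one unit = 1200 / WPM milliseconds (integer, truncated).
    public int UnitMilliseconds => 1200 / wordsPerMinute;
    public int DotMilliseconds => UnitMilliseconds;       // a dot lasts 1 unit
    public int DashMilliseconds => 3 * UnitMilliseconds;  // a dash lasts 3 units
}
```

An API test can then set a negative value directly, something the GUI would never do, and confirm that the call fails cleanly and leaves the previous setting intact.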

API testers work alongside developers. An API tester may also provide input into the initial API design, engage in code reviews before check-ins, and of course write automated tests to test the API (component level) and APIs in end-to-end scenarios (integration level). Having testers engage with developers early and throughout the SDLC helps ensure teamwork and instills the idea that quality is a collaborative effort.

API Testing–Functional Testing Below the User Interface

After a long hiatus from writing I am finally carving out some time to put thoughts to words again. A lot has been going on both professionally and personally. On the personal side I will simply say: never take for granted the time someone you care for has on this earth, and make a habit of spending quality time with that person regularly. On the professional side, things have been crazy busy in a very good way. I have settled in at work (still have lots to learn, as always), squeezed out a day to drive to Portland, OR to speak at PNSQC on random test data generation, and also gave an online presentation discussing API testing best practices for the STP Online Summit: Achieving Business Value With Test Automation. Based on questions from that session I thought I would follow up with a few posts discussing API testing. Let’s start by describing API testing and how it differs from other “levels of software testing.”

Application Programming Interface (API)

The Microsoft Press Computer Dictionary defines API as “A set of routines used by an application program to direct the performance of procedures by the computer’s operating system.” So, referring to the abstract levels of testing, an API can be a unit, but is more likely a component because it is usually “an integrated aggregate of one or more units.”

An API provides value to both the developer and to the customer. For example, an API:

  • provides developers with common reusable functions so they don’t have to rewrite common routines from scratch
  • provides a level of abstraction between the application and lower level ‘privileged’ functions
  • ensures any program that uses a given API will have identical behavior/functionality (for example, many Windows programs use a common file dialog such as IFileSaveDialog API to allow customers to save files in a consistent manner)

Essentially, an API contains the core functionality of a program, or the business logic as some people refer to it. Customers don’t interact with APIs directly. Customers interact with software via the Graphical User Interface (GUI), which in turn interacts with an abstraction layer (e.g. the controller in the MVC design pattern) that interacts with APIs exposed via interfaces.

Testing an API as a Black Box

Some people assume that API testing is a ‘white-box’ testing activity in which the tester has access to the product source code. But in reality, API testing is black-box testing in the truest sense of the approach. API testers make no assumptions about how the functionality is implemented, and are not limited by the constraints or distracting behaviors of a graphical user interface.

As an example, let’s use a program I developed called Morse Code Trainer. As a boy I was really into electronics (HeathKit projects were routinely on my Christmas list), and in order to pursue my amateur radio license I had to learn Morse code, or CW for short. Although Morse code is no longer required to get a ham operator’s license, I think it is not hard to learn (memorize) about 55 sequences of dits and dahs, and in my opinion learning additional languages is good for the brain.

A core bit of functionality in this program is to convert a string of characters (a sentence) to dits (represented as a period character “.”) and dahs (represented as a dash character “-”). The API to do this bit of magic is:

        string AlphaNumericCharacterStringToMorseCodeString(string input)

and it is exposed via the IMorseCodeEncoder interface to the developer who will code the UI and controller.

        interface IMorseCodeEncoder
        {
            string AlphaNumericCharacterStringToMorseCodeString(string input);

            string MorseCodeStringToAlphaNumericCharacterString(string input);
        }

Notice that we don’t see any of the underlying code of how this method actually does its magic.

Let’s assume the developer didn’t do any unit testing and simply threw the code over the proverbial wall for testers to beat on. Since the tester (me) knows the developer (me) didn’t unit test any of the private methods the API under test relies on, the API tester (me) writes a simple test just to see if this code “works” as the developer (me) assured the tester (me) it would. The most basic API test looks very similar to the unit test illustrated below; in fact, this is a unit test the developer should write and execute before chucking a program at testers to bang on. A proper API test would call the method under test from the dynamic link library (DLL), include initialization and clean-up, use a proper test design, have a robust oracle, and of course avoid hard-coded strings.

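    using System;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class MorseCodeEncoderTest   // illustrative test class name
    {
      [TestMethod()]
      [DeploymentItem("Morse Code Trainer.exe")]
      public void GetMorseCodeStreamTest()
      {
        try
        {
          // MorseCodeEncoder_Accessor is the Visual Studio-generated private accessor
          MorseCodeEncoder_Accessor target = new MorseCodeEncoder_Accessor();
          string input = "A QUICK TEST";
          // One space separates letters; two spaces separate words
          string expected = ".-  --.- ..- .. -.-. -.-  - . ... -";
          string actual;
          actual = target.GetMorseCodeStream(input);
          Assert.AreEqual(expected, actual);
        }
        catch (Exception e)
        {
          // Any exception, including an assertion failure, is reported as a test failure
          Assert.Fail(e.ToString());
        }
      }
    }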


Interestingly enough, had the developer (me) run this unit test, the developer would have discovered an unhandled exception. The unit test failed because this API called a method in another class to get a Dictionary, and that Dictionary was created from two string arrays (an array of alphanumeric characters and an array of Morse code sequences). The specific error was a duplicate key in the Dictionary; in other words, a duplicate entry in the alphaCharacterArray string array caused a System.ArgumentException to be thrown for a duplicate key. Because the methods in the MorseCodeLibrary class weren’t unit tested, the API that encodes a string of alphanumeric characters to Morse code failed its most basic unit test.

    public Dictionary<string, string> GetAlphaCharacterToMorseCodeDictionary()
    {
      Dictionary<string, string> AlphaToMorseCodeDictionary = new Dictionary<string, string>();
      // alphaCharacterArray and morseCodeArray are parallel arrays:
      // element i of one maps to element i of the other
      for (int i = 0; i < this.alphaCharacterArray.Length; i++)
      {
        // Add throws System.ArgumentException if alphaCharacterArray contains a duplicate
        AlphaToMorseCodeDictionary.Add(this.alphaCharacterArray[i], this.morseCodeArray[i]);
      }
      return AlphaToMorseCodeDictionary;
    }


But it actually gets worse. Another API in another class, which decodes a string of Morse code to alphanumeric characters, would have failed as well, because it created its Dictionary from the same faulty string array by calling the public method

        public Dictionary<string, string> GetMorseCodeToAlphaCharacterDictionary().

This is actually a good example of two very different bugs with the same root cause. It is also a good example of how it is more efficient to find functional bugs at the unit, component, or integration levels of testing (API testing) than through functional testing via the user interface.
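Because both dictionaries are built from the same pair of parallel arrays, a small check on the data itself would have caught the shared root cause before either API failed. The sketch below assumes the arrays can be handed to a test; the class `MorseTableCheck`, its `FindProblems` method, and the abbreviated sample data are hypothetical and not the program’s actual code. It checks that the arrays have the same length and that neither array contains duplicates, since duplicate characters break the encoding dictionary and duplicate codes break the decoding one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MorseTableCheck
{
    // Hypothetical helper: returns the problems found in the parallel arrays
    // that are used to build both lookup dictionaries.
    public static List<string> FindProblems(string[] alpha, string[] morse)
    {
        var problems = new List<string>();
        if (alpha.Length != morse.Length)
        {
            problems.Add($"Length mismatch: {alpha.Length} characters vs {morse.Length} codes");
        }

        // Duplicate characters break the alpha-to-Morse dictionary...
        foreach (var dup in alpha.GroupBy(a => a).Where(g => g.Count() > 1))
        {
            problems.Add($"Duplicate character: {dup.Key}");
        }

        // ...and duplicate codes break the Morse-to-alpha dictionary.
        foreach (var dup in morse.GroupBy(m => m).Where(g => g.Count() > 1))
        {
            problems.Add($"Duplicate code: {dup.Key}");
        }
        return problems;
    }

    public static void Main()
    {
        // Abbreviated, illustrative data with a mistyped duplicate "E" entry.
        string[] alpha = { "A", "B", "E", "E" };
        string[] morse = { ".-", "-...", ".", ".-.." };

        foreach (string problem in FindProblems(alpha, morse))
        {
            Console.WriteLine(problem);
        }

        // Building the dictionary from the same data reproduces the original failure.
        var dictionary = new Dictionary<string, string>();
        try
        {
            for (int i = 0; i < alpha.Length; i++)
            {
                dictionary.Add(alpha[i], morse[i]);
            }
        }
        catch (ArgumentException e)
        {
            Console.WriteLine("ArgumentException: " + e.Message);
        }
    }
}
```

Running it reports the duplicate E first and then shows Dictionary.Add throwing an ArgumentException for the same entry, so the data check pinpoints the cause that the API-level failure only hinted at.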

Unit vs. API Testing

So, you’re probably asking yourself, “If the above example is really a unit test the developers should run before throwing their code at testers, then how does unit testing differ from API testing?” When testing a single API call, the most significant difference is the thoroughness of test coverage. Most unit tests are rather simple: they are not very complex, they are not comprehensive in test coverage (although a good suite of unit tests should achieve good structural coverage), and they often rely on simplistic oracles.

API tests, by contrast, are usually more comprehensive than unit tests. API tests usually include both positive tests (does it do what it’s supposed to do?) and negative tests (how well does it handle error conditions?). While API tests should strive for a high level of code coverage (structural testing), a more important goal is test coverage. For example, API tests of this same method might include a series of data-driven tests that:

  • test every known alphanumeric character defined in Morse code (the population of this variable is small enough to test every element; if the population of a given variable is large, testers should define equivalence partitions and test an adequate number of samples from the population for confidence)
  • test character casing
  • test pangrams and special signals (e.g., end of message, attention, received, etc.)
  • test boundary conditions (e.g., maximum string length, although 2 billion+ characters seems excessive for a Morse code transmission)
  • test strings with invalid characters or characters that are not defined in Morse code
  • test strings with non-English letters that have Morse code encodings (Ä, Á, Å, Ch, É, Ñ, Ö, Ü)
  • test performance to provide baseline measures of individual methods
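A data-driven version of these tests keeps the cases in a table and runs one checker over every row, so adding coverage means adding data rather than code. The sketch below is hypothetical: `MiniMorseEncoder` is a stand-in written for illustration (letters and digits only, one space between letters, two spaces between words, case-insensitive by assumption), not the Morse Code Trainer implementation, and a `null` expected value marks a row where an ArgumentException is expected. A real suite would read its rows from a data source and cover the remaining bullets (special signals, non-English letters, length limits, performance).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the encoder under test (letters and digits only).
public static class MiniMorseEncoder
{
    private static readonly Dictionary<char, string> Codes = new Dictionary<char, string>
    {
        ['A'] = ".-", ['B'] = "-...", ['C'] = "-.-.", ['D'] = "-..", ['E'] = ".", ['F'] = "..-.",
        ['G'] = "--.", ['H'] = "....", ['I'] = "..", ['J'] = ".---", ['K'] = "-.-", ['L'] = ".-..",
        ['M'] = "--", ['N'] = "-.", ['O'] = "---", ['P'] = ".--.", ['Q'] = "--.-", ['R'] = ".-.",
        ['S'] = "...", ['T'] = "-", ['U'] = "..-", ['V'] = "...-", ['W'] = ".--", ['X'] = "-..-",
        ['Y'] = "-.--", ['Z'] = "--..",
        ['0'] = "-----", ['1'] = ".----", ['2'] = "..---", ['3'] = "...--", ['4'] = "....-",
        ['5'] = ".....", ['6'] = "-....", ['7'] = "--...", ['8'] = "---..", ['9'] = "----.",
    };

    // One space between letters, two spaces between words; input is upper-cased first.
    public static string Encode(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        string[] words = input.ToUpperInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join("  ", words.Select(w => EncodeWord(w)));
    }

    // Throws ArgumentException for any character without a Morse code.
    private static string EncodeWord(string word) =>
        string.Join(" ", word.Select(c =>
            Codes.TryGetValue(c, out string code)
                ? code
                : throw new ArgumentException($"No Morse code for '{c}'")));
}

public static class EncoderDataDrivenTests
{
    // Each row: input and expected output (null means an ArgumentException is expected).
    private static readonly (string Input, string Expected)[] Cases =
    {
        ("A QUICK TEST", ".-  --.- ..- .. -.-. -.-  - . ... -"),
        ("a quick test", ".-  --.- ..- .. -.-. -.-  - . ... -"),   // casing
        ("SOS", "... --- ..."),
        ("0123456789", "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----."),
        ("TEST#", null),                                           // undefined character
    };

    // Runs every row through the same checker and returns the number of failures.
    public static int Run()
    {
        int failures = 0;
        foreach (var (input, expected) in Cases)
        {
            string actual;
            try { actual = MiniMorseEncoder.Encode(input); }
            catch (ArgumentException) { actual = null; }

            bool pass = actual == expected;
            if (!pass) failures++;
            Console.WriteLine($"{(pass ? "PASS" : "FAIL")}: \"{input}\"");
        }
        return failures;
    }

    public static void Main() => Console.WriteLine($"{Run()} failure(s)");
}
```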

More complex APIs, such as the MessageBox.Show method, that have several parameters with variable argument values might benefit from additional testing techniques such as combinatorial testing.
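Combinatorial testing starts from the parameter domains. The sketch below generates the exhaustive set (the Cartesian product), which is the simplest form of combinatorial testing; the parameter values are hypothetical strings loosely modeled on a message box call, not the actual MessageBox enumerations. Because the count grows multiplicatively (3 * 3 * 2 = 18 here), pairwise tools such as Microsoft’s PICT are typically used to reduce the set while still covering every pair of values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinatorialCases
{
    // Returns every combination of one value from each parameter domain
    // (the Cartesian product), i.e. exhaustive combinatorial coverage.
    public static List<string[]> AllCombinations(IReadOnlyList<string[]> domains)
    {
        IEnumerable<string[]> result = new[] { Array.Empty<string>() };
        foreach (string[] domain in domains)
        {
            // Extend every existing prefix with every value of the next parameter.
            result = result.SelectMany(prefix => domain.Select(value => prefix.Append(value).ToArray()));
        }
        return result.ToList();
    }

    public static void Main()
    {
        // Hypothetical parameter domains; illustrative strings, not real enum values.
        var domains = new[]
        {
            new[] { "OK", "OKCancel", "YesNo" },    // buttons
            new[] { "None", "Error", "Warning" },   // icon
            new[] { "Button1", "Button2" },         // default button
        };

        List<string[]> cases = AllCombinations(domains);
        Console.WriteLine($"{cases.Count} test cases");
        foreach (string[] testCase in cases.Take(3))
        {
            Console.WriteLine(string.Join(", ", testCase));
        }
    }
}
```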

Testing API End-to-End Scenarios

Testing a single API is usually considered unit or component level testing in the abstract levels of testing. Some people consider unit and component level testing to be “owned” by the developer. I certainly agree that unit tests must be owned by the developer, and that developers can do a much better job of component level testing. But, I also think this is a key area where API testers can collaborate more closely with developers to increase the effectiveness of the tests, the data used in the test, and even the test design (e.g. data-driven unit testing).

But, I will suggest that the integration level of testing or “testing done to show that even though the components were individually satisfactory, as demonstrated by successful passage of component tests, the combination of components are incorrect or inconsistent“ is the domain of the API tester. Software applications are complex beasts that often rely on sequences of API calls interacting with databases, cloud services, or other background workers. So, although API testing rarely involves testing through the GUI, API testers must also understand how the various APIs will be used to effect various customer scenarios. The only difference is that API testers emulate these scenarios without navigating a graphical user interface.

For example, one scenario is to convert a string of text into dits and dahs and use the system’s beep to “play” the Morse code sequence over the computer’s speaker. So, this program contains a class to convert sequences of dits and dahs into sound: the SoundGenerator class. The interface for the sound functions includes a WordsPerMinute property (with a getter and setter) and the PlayMorseCode API.

    interface ISoundGenerator
    {
      void PlayMorseCode(string morseCodeString);
      int WordsPerMinute { get; set; }
    }


So, although we don’t yet know exactly how the developer and GUI designer will implement the GUI for this program, we can still create a test that takes a string of alphanumeric characters, encodes it into a Morse code string, and then passes the string of dits and dahs as an argument to the PlayMorseCode method. This is a rather simple example of an end-to-end scenario. In more complex applications the API functions/methods would likely be compiled into one or more dynamic link libraries (DLLs), and the tests might rely on mocks, fake servers, and possibly other emulators. Of course, the oracles for this type of API testing are also more complex and generally involve checking multiple outcomes or states.
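The end-to-end scenario can be exercised entirely through the interfaces, with a fake standing in for the speaker. In this hypothetical sketch, `StubEncoder` knows only the letters in PARIS, and `RecordingSoundGenerator` records what it was asked to play instead of beeping. The timing oracle assumes standard Morse timing (dit 1 unit, dah 3, gap within a character 1, between letters 3, between words 7) and the PARIS convention that one dit lasts 1200 / WPM milliseconds; whether the real SoundGenerator uses that timing is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IMorseCodeEncoder
{
    string AlphaNumericCharacterStringToMorseCodeString(string input);
    string MorseCodeStringToAlphaNumericCharacterString(string input);
}

public interface ISoundGenerator
{
    void PlayMorseCode(string morseCodeString);
    int WordsPerMinute { get; set; }
}

// Hypothetical stub encoder that knows only the letters in "PARIS".
public sealed class StubEncoder : IMorseCodeEncoder
{
    private static readonly Dictionary<char, string> Codes = new Dictionary<char, string>
    {
        ['P'] = ".--.", ['A'] = ".-", ['R'] = ".-.", ['I'] = "..", ['S'] = "...",
    };

    public string AlphaNumericCharacterStringToMorseCodeString(string input) =>
        string.Join("  ", input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => string.Join(" ", word.Select(c => Codes[c]))));

    public string MorseCodeStringToAlphaNumericCharacterString(string input) =>
        throw new NotImplementedException();
}

// Fake sound generator: records what it was asked to play instead of beeping.
public sealed class RecordingSoundGenerator : ISoundGenerator
{
    public List<string> Played { get; } = new List<string>();
    public int WordsPerMinute { get; set; } = 20;

    public void PlayMorseCode(string morseCodeString) => Played.Add(morseCodeString);

    // Units: dit 1, dah 3, gap inside a character 1, between letters 3, between words 7.
    public static int DurationInUnits(string morse)
    {
        string[] words = morse.Split(new[] { "  " }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) return 0;

        int total = 7 * (words.Length - 1);
        foreach (string word in words)
        {
            string[] letters = word.Split(' ');
            total += 3 * (letters.Length - 1);
            foreach (string letter in letters)
            {
                total += letter.Sum(element => element == '.' ? 1 : 3) + (letter.Length - 1);
            }
        }
        return total;
    }

    // PARIS convention: one dit lasts 1200 / WPM milliseconds.
    public int DurationInMilliseconds(string morse) => DurationInUnits(morse) * 1200 / WordsPerMinute;
}

public static class EndToEndScenario
{
    // Encode the text, hand the result to the sound generator, and check both outcomes.
    public static bool Run(IMorseCodeEncoder encoder, RecordingSoundGenerator sound,
                           string text, string expectedMorse)
    {
        string morse = encoder.AlphaNumericCharacterStringToMorseCodeString(text);
        sound.PlayMorseCode(morse);
        return morse == expectedMorse
            && sound.Played.Count == 1
            && sound.Played[0] == expectedMorse;
    }
}
```

At 20 WPM the single word PARIS (43 units without a trailing word gap) should take 2,580 ms, which gives the scenario a second outcome to check besides the encoded string.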

API testing focuses on an application’s functional capabilities, whereas testing through a GUI should focus primarily on behavior, usefulness and general ‘likeability.’

Decoding the Secrets in Unicode Strings

At the end of each week, one of the last things I do is open my Junk E-mail folder in Outlook and check whether an email was moved there inadvertently before deleting all the spam letting me know that I’ve won the lottery in Ethiopia, that my long lost relative in Chechnya left me 19 bazillion Euros, or about the countless discount drug offerings. So, as I was going through my Friday evening spam deletion ritual, I noticed a subject line that was a bit unusual. Before you jump to any incorrect conclusions, it wasn’t about appendage enlargement or free internet dating services. The email subject was in Arabic, but included a “box” character at the beginning of the string.


Now, I don’t read Arabic, but I am pretty good at noticing globalization bugs when they are staring me right in the face. The “box” character (actually a glyph) in a Unicode string represents either a Unicode code point that is unassigned (no character is associated with that code point value), or a code point for which the system doesn’t have a font that maps a glyph (the character we see) to it. So, curiosity got the better of me, and I decided to investigate a bit. The first thing I did was copy the email subject line and paste it into Notepad, where I noticed that the “box” glyph did not appear.


A few years ago I developed a utility for decoding Unicode strings, aptly called “String Decoder,” and also wrote a post that discusses the tool. So, I launched String Decoder, copied the Arabic string from Notepad, and pasted it into the String Decoder tool.

The first thing I noticed when reading through the list of Unicode code point values was the value U+FEFF. Now, I happen to know that this particular value is the byte order mark (BOM). This seemed pretty unusual, and I asked myself how a BOM character could get inserted into a string. So, I looked up the character in the Unicode code charts and discovered that, in the Arabic Presentation Forms-B block, U+FEFF is also the zero width no-break space, a usage that has been deprecated. Ah, so the Unicode BOM code point value appearing in the string is not so magical after all!
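String Decoder itself isn’t shown in this post, but its core idea of listing a string’s code points is easy to sketch. The hypothetical `CodePointDump.ToCodePoints` below lists each code point as U+XXXX with its Unicode general category, combining surrogate pairs into one supplementary code point; the Arabic sample word is an illustrative stand-in, not the actual subject line. U+FEFF reports the Format category, which helps explain why it is normally invisible.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CodePointDump
{
    // Lists each code point as "U+XXXX (Category)". Surrogate pairs are combined
    // into one supplementary code point; a lone surrogate makes ConvertToUtf32 throw.
    public static List<string> ToCodePoints(string text)
    {
        var result = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint = char.ConvertToUtf32(text, i);
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(text, i);
            result.Add($"U+{codePoint:X4} ({category})");
            if (char.IsHighSurrogate(text[i]))
            {
                i++; // skip the low surrogate already consumed by ConvertToUtf32
            }
        }
        return result;
    }

    public static void Main()
    {
        // Illustrative sample: a leading U+FEFF followed by the Arabic word "marhaba".
        string subject = "\uFEFF\u0645\u0631\u062D\u0628\u0627";
        foreach (string line in ToCodePoints(subject))
        {
            Console.WriteLine(line);
        }
    }
}
```

The first line printed for the sample is `U+FEFF (Format)`, followed by the five Arabic letters.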

Interestingly enough, the U+FEFF character displays as a “box” glyph only in the subject line in the Junk E-mail folder. When I copied the email message from the Junk folder to my Inbox (or another folder), the code point U+FEFF was treated as a zero width no-break space, so no box glyph appeared. This is because when an email gets shunted into the Junk E-mail folder, “links and other functionality have been disabled.” In other words, it is plain text.

I have previously written about using “real-world” test data for globalization testing, and this is another example of how real-world data can be useful for testing text inputs and outputs to evaluate how unexpected code points in a string are parsed or handled. I think this also bolsters the argument for including some amount of test data randomization in globalization testing, using tools such as the Babel tool, to potentially expose other unexpected characters or sequences of mixed Unicode characters.
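One lightweight way to add that kind of randomization to static strings is to inject invisible or easily mishandled code points, such as U+FEFF, at random positions. The sketch below is hypothetical and is not the Babel tool; the pool of characters is an assumption chosen for illustration, and the seed is passed in so a failing input can be regenerated and logged.

```csharp
using System;
using System.Text;

public static class UnexpectedCharInjector
{
    // Illustrative pool of invisible or easily mishandled code points (all in the BMP).
    public static readonly char[] Pool =
    {
        '\uFEFF', // zero width no-break space / byte order mark
        '\u200B', // zero width space
        '\u200D', // zero width joiner
        '\u00A0', // no-break space
        '\u2060', // word joiner
    };

    // Inserts `count` characters from Pool at random positions without splitting
    // a surrogate pair. The same seed reproduces the same output on the same runtime.
    public static string Inject(string baseText, int count, int seed)
    {
        var rng = new Random(seed);
        var sb = new StringBuilder(baseText);
        for (int n = 0; n < count; n++)
        {
            int pos = rng.Next(sb.Length + 1);
            if (pos > 0 && pos < sb.Length && char.IsLowSurrogate(sb[pos]))
            {
                pos--; // move in front of the high surrogate instead
            }
            sb.Insert(pos, Pool[rng.Next(Pool.Length)]);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string mutated = Inject("Hello world", 3, seed: 42);
        foreach (char c in mutated)
        {
            Console.Write($"U+{(int)c:X4} ");
        }
        Console.WriteLine();
    }
}
```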

More Thoughts on Leadership

I have been in my new role as Test Lead for 6 months now. The experience has been magnified because I am actually leading 2 platform teams: the social networking integration team and the models team. The learning curve has been exponential. In my transition to this role I took advantage of a few HR courses to refresh my knowledge of management principles. I also read quite a few books. Perhaps the single book that did the most to reinforce my ideas of leadership (outlined in this blog post) was The Mentor Leader: Secrets to Building People and Teams that Win Consistently by Tony Dungy. This is a great book for leads/managers and anyone who mentors others.

If you ask any lead, they will likely agree that their success as a lead hinges largely on their team. But if you ask leads what their first priority is, they will likely say shipping a product, or managing testing of some feature area they have been assigned. Yes, ultimately we need to ship a product and do our best to make sure our feature areas are adequately tested in an attempt to improve our customers’ overall experience. There are many ‘managers’ throughout the industry who are good at manipulating ‘resources’ to achieve some desired result or at filling in magic numbers on a balanced scorecard. Balanced scorecards provide some value to a business, but sometimes managers lose sight of what is most important and focus on mundane things that twiddle the numbers to fit the scorecard and hype success. But managing resources to ship a product is different from leading a team of people to achieve, and sometimes exceed, goals and visions.

Leadership is much more than management. A successful leader manages projects by articulating a clear vision, guiding people toward achieving goals, and motivating people by helping them grow. When folks ask me what my first priority is as a Test Lead, I say it is the people on my team. But what does that mean?

Open doorways to dreams!

One of my primary responsibilities as a lead is to help the people on my team grow and expand their scope of influence and impact, not only on my team but ultimately within other teams across Microsoft. Of course, it is always hard to see someone on our team leave for new opportunities, but good leaders understand the career aspirations of the people on the team and work with them to help them achieve those dreams. Leaders find opportunities that help people develop skills that benefit both the project and the person. Leaders should be truly invested and take an active role in helping people on their team grow, even if that means the person will eventually leave the team to find new challenges. Managers fear losing the people on their team; leaders nurture people on their teams and open doorways to dreams and new opportunities. Think of it this way: would you rather join a team in which the manager holds on to people until they burn out, or a team in which the leader has a track record of helping people grow into their next job?

Delegate responsibility not just work!

Like many other leaders, I have many balls to juggle, and I can’t juggle them all alone. So, as leads we must delegate some of the things on our plates. But delegation is more than assigning tasks to people. Delegation is endorsing people on your team who will represent you and be responsible for driving a project that has a broad scope of impact. Of course, delegation also doesn’t mean just throwing ideas out there and seeing what happens. A leader who delegates work will set clear expectations and realistic goals, coach for success, provide guidance on how to build upon success, and perhaps most importantly empower the person to make decisions on their own. When we delegate we should set people up for success, not throw them into the fire of failure.

Encourage risk and accept failure!

Sometimes when people know that I am an avid sailor they will ask me to teach them to sail. I love sharing knowledge and experiences about things I am passionate about with people who are interested in learning. Sometimes people are hesitant to do something because they don’t want to break something or do something wrong. I make it very clear from the start that every inanimate object on the boat is replaceable, and while back-winding a sail means we steered too far into the wind, it can always be corrected. I know many ‘captains’ who yell and shout when a line gets twisted and jams in a sheave, or someone accidentally releases the main halyard while under sail. It’s our reactions to such situations that either provide a positive learning experience or turn the outing into a day of hell on the water. Leaders encourage people to try new things, innovate, and experiment. Leaders also know that sometimes things might not work out perfectly, and they should be willing to protect people from harm (either physical harm on the boat, or professional/political harm at work) and rebuild a person’s confidence when things don’t work out so well.

The burden of blame!

At the end of the day I can’t point fingers and say “so and so didn’t do such and such,” or “if things weren’t so screwed up to begin with we wouldn’t be in this mess.” As a lead I am accountable. If things go wrong, I first look at my own leadership to see whether I failed to set clear expectations, neglected to provide adequate guidance (without hand-holding), or simply “delegated and disappeared.” Ultimately, the responsibility for achieving my team’s goals and objectives is mine. We succeed as a team, or I fail as an individual.

Attitude adjustment!

I sometimes see managers who are grumpy or apathetic. I sometimes hear managers say, “I don’t like this either, but we have to do it to satisfy some other manager or scorecard criteria.” A good leader understands the asks and explains why and how they might provide value to the requestor. I know that my attitude affects the people on my team, and if I appear apathetic towards their ideas, they will likely stop sharing their innovative ideas with me. If I am constantly complaining about something, then my team learns to complain about similar things and we start to look like a bunch of whiners. (Nobody really likes whiners. People might try to appease whiners from time to time, but ultimately they just want the whiners to go away.) Leaders know they are being watched and should always project a positive attitude.

So, after 6 months of being back in the trenches, shipping a product, and facing some tough challenges I will say that I am still loving it!

Dealing with locale/language specific static test data

It has been some time since my last post. This seems to happen every so often lately, not because I don’t have anything to write about, but mostly because I have too many irons in the fire, so to speak; juggling hot irons is never fun, and one is always going to drop. Also, about this time every year I go sailing in the San Juan Islands or the Gulf Islands of British Columbia. This year I went to the San Juan Islands and spent a few days incommunicado anchored in Shallow Bay on Sucia Island. Echo Bay is a great anchorage with sandy beaches (unusual for the PNW), and the famous China Caves to explore.

Another place I have been known to explore from time to time is the Stack Exchange Software Quality Assurance and Testing forum. There are many interesting questions and a great variety of responses that offer a wealth of information or provide different perspectives. Recently a question was posed about how to read in static test data for a specific locale or language. Many regular readers know that I am a strong proponent of pseudo-random test data generation in conjunction with automated testing to increase the variability of test data used in each test iteration and generally improve test coverage. But I also understand the value of static test data in providing a solid baseline, and in some cases enabling access to specific test data in different locales or languages.

For example, suppose I am testing a text editor application and I want to read in a text file in the appropriate language based on the current user’s locale settings in the operating system. In this situation, I could save a text file containing strings or sentences for each target language or locale dialect. Each file would get a unique name formed by prepending the 3-letter ISO 639-2 language code (the complete list of codes is published by the ISO 639-2 Registration Authority) to a common filename that describes the contents, followed by the appropriate extension. For example,

  • ENG[TestData].txt would be English
  • ZHO[TestData].txt would be Chinese
  • DEU[TestData].txt would be German

To get the appropriate text file auto-magically read into the test at runtime, the only thing we need to do is get the current user’s locale using the CultureInfo.ThreeLetterISOLanguageName property in C#.

   1:              string testDataFileName = "testdata.txt";
   3:              CultureInfo ci = CultureInfo.CurrentCulture;
   5:              // Path to server location where static files exist 
   6:              string path = Path.GetFullPath(
   7:                  Environment.GetFolderPath(Environment.SpecialFolder.Desktop));
   9:              // Read file contents
  10:              using (StreamReader readFile = 
  11:                  new StreamReader(Path.Combine(
  12:                      path, string.Concat(
  13:                      ci.ThreeLetterISOLanguageName, testDataFileName))))
  14:              {
  15:                  //parse test data and do test stuff
  16:              }


Notice that we concatenate the 3-letter ISO language name and the filename (including its extension) on line 13, then combine the result with the path to the file location and read the file contents using StreamReader.
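If a language has no data file yet, the code above throws when the StreamReader can’t find the file. A hypothetical helper that falls back to English keeps the test running; the name `TestDataLocator.Resolve`, the lower-case file names (which matter on case-sensitive file systems), and the temp-folder demo are assumptions for illustration, not part of the original approach.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class TestDataLocator
{
    // Builds "<code><baseName>" (e.g. "deutestdata.txt") in the given folder and
    // falls back to the fallback language's file when no file exists for the requested one.
    public static string Resolve(string folder, string languageCode, string baseName,
                                 string fallbackCode = "eng")
    {
        string candidate = Path.Combine(folder, languageCode.ToLowerInvariant() + baseName);
        if (File.Exists(candidate)) return candidate;

        string fallback = Path.Combine(folder, fallbackCode.ToLowerInvariant() + baseName);
        if (File.Exists(fallback)) return fallback;

        throw new FileNotFoundException(
            $"No test data for '{languageCode}' or '{fallbackCode}'", candidate);
    }

    public static void Main()
    {
        // Demo only: create an English data file in a temp folder, then resolve
        // the file for the current user's language.
        string folder = Path.Combine(Path.GetTempPath(), "locale-test-data");
        Directory.CreateDirectory(folder);
        File.WriteAllText(Path.Combine(folder, "engtestdata.txt"), "The quick brown fox");

        string path = Resolve(folder, CultureInfo.CurrentCulture.ThreeLetterISOLanguageName,
                              "testdata.txt");
        Console.WriteLine(path);
        Console.WriteLine(File.ReadAllText(path));
    }
}
```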

But we might need more specialization depending on what we are testing; for example, if we were testing a spell checker for US English versus British (and Canadian) English, or testing both simplified and traditional Chinese. The ISO 639-2 specification does not distinguish between simplified and traditional Chinese, or between US English and British English. In this case we could “make up” a 3-letter designation such as GBR for Great Britain, or CHT for Chinese (traditional).

Or perhaps a better solution would be to use the locale identifiers (LCIDs) that Windows uses to identify specific locales (rather than languages). The solution is identical to the one above, except that instead of the ThreeLetterISOLanguageName property we use the LCID property, as illustrated below.

    string testDataFileName = "testdata.txt";
    CultureInfo ci = CultureInfo.CurrentCulture;

    // Path to server location where static files exist
    string path = Path.GetFullPath(
        Environment.GetFolderPath(Environment.SpecialFolder.Desktop));

    // Read file contents
    using (StreamReader readFile =
        new StreamReader(Path.Combine(
            path, string.Concat(
            ci.LCID, testDataFileName))))
    {
        //parse test data and do test stuff
    }

Of course, now we would need to name our static files with the appropriate LCID decimal number, such as

  • 1028testdata.txt would be traditional Chinese used in Taiwan, and
  • 2052testdata.txt would be for simplified Chinese used in PRC

Personally, I prefer getting the LCID, as it provides greater control and more specificity. But the downside of using LCIDs is that you may end up with multiple files that contain the same contents. For example, although Singapore, Malaysia, and the PRC all use simplified Chinese, there are 3 different LCIDs.
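One way to avoid storing duplicate files is to map several LCIDs onto a single shared file. The mapping below is a hypothetical sketch: the groupings (for example, Singapore sharing the PRC file) are assumptions that depend on what is being tested, and any LCID without an entry simply uses its own file name.

```csharp
using System;
using System.Collections.Generic;

public static class LcidTestDataMap
{
    // Hypothetical grouping: each LCID maps to the LCID whose file holds its data.
    private static readonly Dictionary<int, int> SharedFile = new Dictionary<int, int>
    {
        [2052] = 2052, // zh-CN, simplified Chinese (PRC)
        [4100] = 2052, // zh-SG, simplified Chinese (Singapore)
        [1028] = 1028, // zh-TW, traditional Chinese (Taiwan)
        [3076] = 1028, // zh-HK, traditional Chinese (Hong Kong SAR)
        [5124] = 1028, // zh-MO, traditional Chinese (Macao SAR)
    };

    // Returns e.g. "2052testdata.txt"; unmapped LCIDs keep their own number.
    public static string FileNameFor(int lcid, string baseName = "testdata.txt") =>
        (SharedFile.TryGetValue(lcid, out int shared) ? shared : lcid) + baseName;
}
```

In a test this would be called as `LcidTestDataMap.FileNameFor(CultureInfo.CurrentCulture.LCID)`, so a test running under zh-SG reads the same 2052testdata.txt file as one running under zh-CN.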

There are other properties that allow you to get the culture info for the current user in Windows, and the right property to use ultimately depends on your specific needs. But, CultureInfo class members can easily be used to manage localized static data files or even manage control flow through an automated test that has specific dependencies on a language or a locale setting.


The SDET vs STE Debate Redux: It’s only a title!

Every few months the STE vs. SDET debate reemerges like the crazy outcast relative who comes to visit unexpectedly and sits around complaining about imaginary ailments and reminiscing about how things were in the good ol’ days. We certainly don’t want to be rude to our relatives, so we tolerate their rants while watching the clock and dropping subtle hints about the late hour. But with the ridiculous ‘debate’ between STE and SDET I can be rude: drop it! It’s a baseless discussion without merit. It’s only a title!

In this previous post I explained the business reasons why Microsoft changed the title from STE to SDET. But for some reason people commonly confuse the title with the role or job function. In the good ol’ days our internal job description for STE at level 59 included ‘must be able to debug other’s code’ and ‘design automated tests.’ Almost all STEs hired prior to 1995 had coding questions as part of their interview and were expected to grow their ‘technical skills’ throughout their career. That was the traditional role of the STE.

As I explained in this previous post, we established the title of SDET to ensure that testers at a given level in one organization in the company had skills comparable to those of testers in a different organization. As part of the title change, the company decided that we needed to reestablish the base skill set of our testers to include ‘technical competence.’ Unfortunately, when the career profiles were introduced some managers misinterpreted ‘technical competence’ as raw coding skills and the naive ideology of 100% automation. These same managers now complain that their SDETs don’t excel at ‘bug finding’ and customer advocacy.

On my current team, the program managers are big customer advocates. They run their own set of ‘scenarios’ against new builds at least weekly. My feature area is testing private APIs on our platform. Our primary customers are the developers who consume those APIs, but we also must understand how bugs we find via our automated tests might manifest themselves and impact our customers. So, our team spends quite a bit of time also self-hosting, doing exploratory testing, and we even started a new approach that takes customer scenarios to the n-th degree that we call "day in the life" testing to help us better understand how customers might use our product throughout their busy days. Our product has 93% customer satisfaction.

So, if it’s true that the SDETs on some teams aren’t finding bugs and lack customer focus (and I suspect it is for some teams), then they hired the wrong people onto their test team. If SDETs don’t balance their technical competence with customer empathy, then we have a problem, and I will say it is likely a management problem.

The testing profession is diverse and requires people to perform different roles or job functions during the development process and over the course of their career. Microsoft didn’t eradicate the STE “role”; we simply changed the title of the people we hire into our testing “roles” and reestablished the traditional expectations of people in that role.

Differentiating between STE and SDET in our industry seems nonsensical to me, and I also think this false differentiation ultimately limits our potential to positively impact our customers’ experience and advance the profession. Testers today face many challenges, and hiring great testers (regardless of job title) is about finding people who not only have a passion and drive to help improve our customers’ experience and satisfaction, but can also solve tough technical challenges to advance the craft and help improve the company’s business.