I.M. Testy

Treatises on the practice of software testing

Archive for January, 2011

A Source of “Real-World” Test Data for Globalization Testing

I am generally not a big fan of static test data. I do know that in the proper context static test data can provide some value. But we should be aware of the common problems with files of static test data or (even worse) test data hard-coded in a test case. Some problems with static test data include:

  • Stagnation – static test data may add some initial value, but over time simply reusing the same test data over and over in a test diminishes the value of that test. For example, retesting the same name strings in a first name input textbox is not providing any new information if those ‘static’ names worked in the previous build and the underlying functionality has not changed.
  • Contextual blindness – sometimes we have files of static test data that were identified as “problematic” in one situation (context), so we reuse that “problematic” test data regardless of the context. In 1995 I wrote a white paper on “problematic” double-byte character set (DBCS) encoded characters explaining why each code point was problematic in a given context. For example, a Japanese character with a 0x5C trail byte might be problematic in a filename on an ANSI-based system that parsed strings byte by byte instead of as wide characters, because 0x5C is also the code for the backslash path separator. This is not true on Windows systems where the default encoding is UTF-16. However, some people continue to use obsolete DBCS problem characters, perhaps because they don’t fully understand the underlying contextual differences between ANSI-based encodings and Unicode.
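To make the 0x5C example concrete, here is a minimal Python sketch. The sample characters (表, ソ, 能) are well-known Shift-JIS cases chosen for illustration, not taken from the white paper, and `has_5c_trail_byte` and `naive_split_path` are hypothetical helper names. The sketch assumes a parser that splits a path on the backslash byte without any awareness of lead and trail bytes.

```python
from typing import List

BACKSLASH = 0x5C  # code of the '\' path separator

def has_5c_trail_byte(ch: str, encoding: str = "shift_jis") -> bool:
    """Return True if ch encodes to two bytes and the second byte is 0x5C."""
    encoded = ch.encode(encoding)
    return len(encoded) == 2 and encoded[1] == BACKSLASH

def naive_split_path(raw: bytes) -> List[bytes]:
    """Split on the backslash byte, as a DBCS-unaware parser might."""
    return raw.split(b"\\")

# Show the encoded bytes of a few sample characters.
for ch in "表ソ能":
    print(ch, ch.encode("shift_jis").hex(), has_5c_trail_byte(ch))

name = "表.txt"

# In Shift-JIS the byte-wise split cuts the file name in two.
print(naive_split_path(name.encode("shift_jis")))

# In UTF-16 a parser works on 16-bit code units, so inspect those instead.
units = name.encode("utf-16-le")
code_units = [int.from_bytes(units[i:i + 2], "little")
              for i in range(0, len(units), 2)]
print(BACKSLASH in code_units)
```

The same character that splits a Shift-JIS filename contains no backslash code unit in UTF-16, which is why the string is only “problematic” in the first context.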

Perhaps on the opposite end of the test data spectrum is random test data. Many of you who read this blog or have heard me speak know that I am a big proponent of parameterized random test data generation. Parameterization allows us to better model our test data. Still, even when parameterized random data is crafted to be representative of real data, it is not “real” data.
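As a rough illustration of what “parameterized” means here, the following Python sketch generates random strings from a caller-supplied set of code point ranges, length bounds, and an optional seed. The function name `random_string` and the `VIETNAMESE_LIKE` ranges are assumptions made up for this example; they are not from any tool mentioned in this post.

```python
import random

# Hypothetical parameter set: inclusive code point ranges to draw from.
VIETNAMESE_LIKE = [
    (0x0041, 0x005A),  # A-Z
    (0x0061, 0x007A),  # a-z
    (0x00C0, 0x00C3),  # À Á Â Ã
    (0x1EA0, 0x1EF9),  # Latin Extended Additional letters used in Vietnamese
]

def random_string(ranges, min_len, max_len, seed=None):
    """Build a random string whose characters come from the given ranges.

    A range is picked uniformly first, then a code point within it, so
    small ranges are not drowned out by large ones. Passing a seed makes
    the result reproducible, which matters when a failure must be replayed.
    """
    rng = random.Random(seed)
    length = rng.randint(min_len, max_len)
    chars = []
    for _ in range(length):
        low, high = rng.choice(ranges)
        chars.append(chr(rng.randint(low, high)))
    return "".join(chars)

sample = random_string(VIETNAMESE_LIKE, min_len=5, max_len=10, seed=42)
print(sample)
```

Logging the seed alongside each test result is one way to keep random data from becoming unrepeatable data.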

But there may be a happy medium between static test data and random test data. And, best of all, it is abundantly available. One of the best sources of test data (especially non-English test data) is something most of us already use on a daily basis. The test data source I speak of is social networks.

I have met many wonderful people from around the world both in person and virtually, and stayed in contact with many of them. Last year while keynoting at the first software testing conference in Vietnam (VistaCon 2010) I was privileged to meet my dear friend Thuyen, who helped organize the conference. Since the conference we have stayed in contact via email and Facebook. When she posts on Facebook it is usually in Vietnamese. Since I don’t (yet) read Vietnamese I use Bing Translator to help me figure out the comment.

Last week she had an entry on her Facebook wall that began “Tối nay vô tình nghe trên TV 1 bài hát mà giai điệu…” So, I copied the entry and opened Bing Translator to translate the entry.

image

Many of you will quickly notice the strange anomaly in the translation. I initially thought that this service might be incrementing this numeric value for some reason, but when I changed the number value to 2 the number 2 displayed in the translated string. I tried various other numbers and quickly discovered that 6 incremented to 7, 8 decremented to 7, and 9 decremented to 3. I didn’t see a clear pattern here, so I thought this might be an issue resulting from parsing a particular sequence of characters.

So, I modified different parts of the string (removed words) to narrow down the problem. I found the string “tình nghe trên TV 1 bài hát mà giai điệu” contained the problematic sequence. Removing any single ‘word’ from this string displayed the translated string with the number 1, with one exception. Removing the word “nghe” from the above string resulted in the translation illustrated below.
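I did this narrowing by hand, but the same idea is easy to sketch in Python. The helper names `leave_one_out` and `substitute_word` are hypothetical; the sketch only builds the candidate strings and leaves submitting them to a translation service as a manual (or separately automated) step.

```python
def leave_one_out(text):
    """Return (removed_word, remaining_text) pairs, one per word in text."""
    words = text.split()
    return [
        (words[i], " ".join(words[:i] + words[i + 1:]))
        for i in range(len(words))
    ]

def substitute_word(text, target, replacements):
    """Return one variant of text per replacement, swapping target for it."""
    words = text.split()
    return [
        " ".join(r if w == target else w for w in words)
        for r in replacements
    ]

source = "tình nghe trên TV 1 bài hát mà giai điệu"

# Variants with one word removed, to find which words matter.
for removed, variant in leave_one_out(source):
    print(f"without {removed!r}: {variant}")

# Variants with the number changed, to see how each digit is handled.
for variant in substitute_word(source, "1", ["2", "6", "8", "9"]):
    print(variant)
```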

image

By the way…the Google translation engine doesn’t fare much better. And the results are different between www.google.com/ig and http://translate.google.com.

But, the purpose of this post is not to illustrate this particular bug, but to give you ideas of how we can use social network feeds in our testing. People around the world use social networks and you can find “real world” strings in various languages that you can use as test data in various contexts. Most of the time this ‘test data’ will not likely result in a bug; but sometimes it can reveal interesting issues. Best of all, strings taken from social networks are not some manufactured static or random test data. Using strings copied from social networks is about as “real world” as we can get…this is the “data” from our customers.

Written by Bj Rollison

January 23rd, 2011 at 10:14 am

Sometimes bugs find you

Well, it’s another new year. Like a lot of folks I have spent the last few days reflecting on the past year and contemplating this coming year. I won’t bore you by rambling on about my thoughts, reflections, or ambitions; they are mostly personal. Professionally, I will continue to strive to improve myself in my chosen discipline, seek out new challenges and opportunities, and share my experiences with those who follow my posts. The beginning of the year is a busy time for me: arranging my conference schedule (which I am cutting back on), preparing to teach a software testing course and a software test automation course at the University of Washington, and getting engaged with two key internal projects intended to help improve our internal engineering processes at Microsoft.

Many of you know that I started at Microsoft on the Windows International Test Team, and internationalization (I18N), globalization (G11N) and even localization (L10N) are topics that I have always been interested in. In 2009 I posted a series on localization testing (Part 1, Part 2, Part 3, and Part 4). Last year, I designed and developed an internal class on globalization testing basics for non-globalization experts who want to incorporate globalization test strategies into their test designs in an attempt to find bugs sooner and drive quality upstream. As the world becomes even more connected and the number of customers around the globe grows rapidly, it just makes sense that we need to design software that our customers will value in their (local) context. So, over the next few weeks I will do a series of posts on globalization testing.

But first, I’d like to share a globalization type bug that I found…or should I say that found me. This is a great example of the bugs you might come across using a strategy I used on my previous teams to help drive international sufficiency testing upstream. In this post on international sufficiency testing I explained that one strategy was to get folks on the team to set their default user locale to something other than US-English. This particular bug was initially detected the day after I wrote the post discussing a bug that was found by an SDET taking my globalization testing basics course. To replicate that bug for screen shots, I customized my number format via the Region and Language control panel applet by changing the decimal symbol from the default period character (‘.’) to the small Latin letter d (‘d’). After writing the post, I did not restore the machine to its default state (using the period character as the decimal symbol).

The next day I came into the office, logged onto my computer and noticed a cryptic scripting error message on the desktop. I hadn’t launched any other app yet, so I deduced this was likely caused by some process that starts automatically when logging in. So, I gave the faithful Windows three-finger salute to bring up the task manager and scanned through the list of running processes.

It didn’t take long to associate this error dialog with the communicator.exe process. But, at first I wasn’t quite sure what was causing this error. Later that morning, I reset the user locale settings back to the default settings. After lunch I logged back onto my machine and no error message!

So, being a tester, curiosity got the better of me and I felt compelled to try to replicate the initial anomaly. To make a long story short, it didn’t take too long for me to put the pieces of the puzzle together and figure out the customized decimal symbol was causing the error. But, after troubleshooting a bit, it turned out to be much worse than I thought, because this error didn’t just occur with a letter character; it also occurred when I changed my decimal symbol to a comma character (‘,’), which some international locales use as the decimal symbol. So, I contacted my friend and colleague Alan Page, who just so happens to work on the Communications team. He quickly looked into it for me and announced this problem does not occur in the latest release. (Note to self…update your machine to the latest self host bits.)
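I don’t know what the underlying code defect was in this case, so the following Python sketch is only an assumption about one common way this class of bug arises: one part of a program formats a number using the user’s decimal symbol, and another part parses it as if the symbol were always a period. All function names here are hypothetical.

```python
def format_number(value, decimal_symbol):
    """Format value with two decimals using the user's decimal symbol."""
    return f"{value:.2f}".replace(".", decimal_symbol)

def parse_invariant(text):
    """Fragile parser: assumes '.' is always the decimal symbol."""
    return float(text)

def parse_with_symbol(text, decimal_symbol):
    """Parser that respects the configured decimal symbol."""
    return float(text.replace(decimal_symbol, "."))

# Try the default symbol, a comma, and the letter 'd' used in this post.
for symbol in [".", ",", "d"]:
    formatted = format_number(3.14, symbol)
    try:
        result = parse_invariant(formatted)
        print(f"{formatted!r}: fragile parser returned {result}")
    except ValueError:
        print(f"{formatted!r}: fragile parser failed")
    print(f"{formatted!r}: symbol-aware parser returned "
          f"{parse_with_symbol(formatted, symbol)}")
```

The fragile parser only works with the default period, which is exactly why a test pass run entirely under default US-English settings would never notice the problem.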

What is important here is that this bug did not require a different language version. I didn’t need to know a different written language. This bug was exposed simply by customizing the settings in the Region and Language control panel applet and using the system as a ‘normal user.’ What I did need was a bit of technical knowledge of the system (specifically around National Language Support, or NLS) and how to configure the system to help me expose potential globalization issues. It’s not magic and it’s not rocket science. So, over the next few weeks I plan to share some testing approaches to help other testers find problems similar to this and incorporate globalization testing into their test strategies.

Written by Bj Rollison

January 6th, 2011 at 5:43 pm
