Archive for the ‘Internationalization Testing’ Category
It has been a very long time since my last blog post; too long. I have been extremely busy this past year and have been doing a lot of juggling. In some cases I tried juggling too many balls and dropped a few. But I have learned quite a bit during my transition from “academia” back into the product groups here at Microsoft, and I have learned a lot about what it means to be a great test lead shipping world-class software. Despite the bumps I love my new career direction on the Windows Phone team, and I finally feel things are coming under control. So, it is time now to once again share some of the things I’ve learned and continue to learn in my journey as a software tester.
Let’s start with a discussion of a problem I came across the other day while doing some testing around posts and feeds (uploads and downloads to social networks such as Twitter, Facebook, etc.). Over the years I have frequently mentioned testing with Unicode surrogate code points in strings and using the Babel string generation tool to help increase test coverage by producing variable test data composed of characters from across the Unicode spectrum.
Surrogate pairs are often problematic in string parsing algorithms. Unlike “typical” 16-bit Unicode characters in the Basic Multilingual Plane (BMP, or Plane 0), surrogate pairs are composed of two 16-bit code units that together represent a single character (glyph). (See definition D75, Section 3.8, Surrogates, in The Unicode Standard, Version 6.1.) Surrogate code points typically cause problems because many string parsing algorithms assume one character/glyph is one code point value, which can lead to character (data) corruption or string buffer miscounts (which can sometimes lead to buffer overflow errors).
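To make the miscount concrete, here is a minimal C# sketch (the character is just an illustrative non-BMP code point): string.Length counts 16-bit code units, while StringInfo counts user-perceived characters.

using System;
using System.Globalization;

class SurrogateCountDemo
{
    static void Main()
    {
        // U+10400 (DESERET CAPITAL LETTER LONG I) lies outside the BMP,
        // so UTF-16 stores it as a surrogate pair of two 16-bit code units.
        string s = "\U00010400";
        Console.WriteLine(s.Length);                               // 2 (code units)
        Console.WriteLine(new StringInfo(s).LengthInTextElements); // 1 (character/glyph)
    }
}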
As an example of string buffer miscounting let’s take a look at Twitter. It is generally well known that Twitter has a character limit of 140 characters. But when a string of 140 characters contains surrogate pairs, it seems that Twitter doesn’t know how to count them correctly and displays a message stating, “Your Tweet was over 140 characters. You’ll have to be more clever.”
Well Twitter…I was being clever! I was clever enough to expose an error path caused by a mismatch between the code that counts character glyphs and the code that realizes there are more than 140 16-bit character code points.
Although there is a counting mismatch at least Twitter preserved the character glyphs for surrogate code points in this string.
Unfortunately, TweetDeck is what I refer to as a globalization-stupid application. TweetDeck doesn’t have a problem with character count mismatches because it breaks horribly when surrogate code points are used in a string.
There is some really wicked character parsing when the string is pasted into TweetDeck. TweetDeck solves the character count problem by blocking any character that is not an ASCII character from the string. (Note: the “W” character is the full-width Latin character U+FF37, not the Latin W U+0057.)
I find it hard to believe that a modern application would limit the range of characters it allows customers to use, especially an application targeted towards users of the World Wide Web.
At the end of each week one of the last things I do is open my junk mail folder in Outlook and check to see if an email was moved there inadvertently before deleting all the spam that lets me know that I’ve won the lottery in Ethiopia, or that my long lost relative in Chechnya left me 19 bazillion Euros, or the countless discount drug offerings. So, as I was going through my Friday evening spam mail deletion ritual I noticed a subject line that was a bit unusual. Before you jump to any incorrect conclusions it wasn’t about appendage enlargement, or free internet dating services. The email title was in Arabic, but included a “box” character at the beginning of the string.
Now, I don’t read Arabic, but I am pretty good at noticing globalization bugs when they are staring me right in the face. The “box” character (actually a glyph) in a Unicode string either represents a Unicode code point that is unassigned (it doesn’t have a character associated with that code point value), or the system doesn’t have a font that maps a glyph (the character we see) to that particular Unicode code point. So, curiosity got the better of me, and I decided to investigate a bit. The first thing I did was copy the email subject line, paste it into Notepad, and notice that the “box” glyph did not appear.
A few years ago I developed a utility for decoding Unicode strings, aptly called “String Decoder,” and also wrote a post that discusses the tool. So, I launched String Decoder and copied the Arabic string from Notepad and pasted it into the String Decoder tool.
The first thing I notice when reading through the list of Unicode code point values is the value U+FEFF. Now, I happen to know that this particular value is a byte order mark (BOM). This seems pretty unusual, and I ask myself how a BOM character could get inserted into a string. So, I look up the character in the Unicode Charts and discover that in the Arabic Presentation Forms-B character set this is a special character for a zero width no-break space that has been deprecated. Ah, so the Unicode BOM code point value appearing in the string is not so magical after all!
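For the curious, here is a rough sketch of the kind of decoding the String Decoder tool performs; the sample string below is illustrative, not the actual spam subject line.

using System;

class StringDecoderSketch
{
    static void Main()
    {
        // An illustrative string: a U+FEFF followed by two Arabic letters.
        string s = "\uFEFF\u0645\u0631";
        for (int i = 0; i < s.Length; i++)
        {
            // ConvertToUtf32 combines surrogate pairs into a single code point value.
            int codePoint = char.ConvertToUtf32(s, i);
            Console.WriteLine("U+{0:X4}", codePoint);
            if (char.IsHighSurrogate(s[i]))
                i++; // skip the low surrogate we just consumed
        }
    }
}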
Interestingly enough, the U+FEFF character only displays as a “box” glyph in the subject line in the Junk E-mail folder. When I copied the email message from the Junk folder to my Inbox (or another folder) the code point U+FEFF is treated as a zero width non-breaking space character, so no box glyph appears. This is due to the fact that when an email gets shunted into the Junk E-mail folder “links and other functionality have been disabled.” In other words, it is rendered as plain text.
I also previously wrote about using “real world” test data for globalization testing, and this is another example of how “real-world” data can be useful in testing text inputs and outputs to evaluate how unexpected character code points in a string are parsed or handled. I think this also bolsters the argument to include some amount of test data randomization using tools such as the Babel tool in globalization testing to potentially test for other unexpected characters or sequences of mixed Unicode characters.
It has been some time since my last post. This seems to happen every so often lately; not because I don’t have anything to write about, but mostly due to having too many irons in the fire, so to speak; juggling hot irons is never fun and you are always going to drop one. Also, about this time every year I go sailing in the San Juan Islands or the Gulf Islands of British Columbia. This year I went to the San Juan Islands and spent a few days incommunicado anchored in Shallow Bay on Sucia Island. Echo Bay is a great anchorage with sandy beaches (unusual for the PNW) and the famous China Caves to explore.
Another place I have been known to explore from time to time is the Stack Exchange Software Quality Assurance and Testing forum. There are many interesting questions and a great variety of responses that offer a wealth of information or provide different perspectives. Recently a question was posed about how to read in static test data for a specific locale or language. Many regular readers know that I am a strong proponent of pseudo-random test data generation in conjunction with automated testing to increase the variability of test data used in each test iteration and generally improve test coverage. But I also understand the value of static test data in providing a solid baseline, and in some cases enabling access to specific test data in different locales or languages.
For example, suppose I am testing a text editor application and I want to read in a text file in the appropriate language based on the operating system’s current user locale settings. In this situation, I could save a text file containing strings or sentences for each target language or locale dialect. Each file would get a unique name based on the 3-letter ISO 639-2 language name (the complete list is at http://www.loc.gov/standards/iso639-2/php/code_list.php) prepended to a common filename that describes the contents, plus the appropriate extension. For example,
- ENG[TestData].txt would be English
- ZHO[TestData].txt would be Chinese
- DEU[TestData].txt would be German
To get the appropriate text file auto-magically read into the test at runtime, the only thing we would need to do is get the current user locale using the CultureInfo class’s ThreeLetterISOLanguageName property in C#.
// Requires using System.Globalization; and using System.IO;
string testDataFileName = "testdata.txt";

CultureInfo ci = CultureInfo.CurrentCulture;

// Path to server location where static files exist
// (the share below is a placeholder; the original sample elided this argument)
string path = Path.GetFullPath(@"\\server\share\testdata");

// Read file contents
using (StreamReader readFile =
    new StreamReader(Path.Combine(
        path, string.Concat(
            ci.ThreeLetterISOLanguageName, testDataFileName))))
{
    // parse test data and do test stuff
}
Notice we concatenate the 3-letter ISO language name and the filename (with its extension) in the call to string.Concat, combine that with the path to the file location, and read the file contents using StreamReader.
But we might need more specialization depending on what we are testing; for example, testing a spell checker for the US versus Great Britain (and Canada), or testing simplified Chinese alongside traditional Chinese. The ISO 639-2 specification does not delineate between simplified Chinese and traditional Chinese, or between US English and British English. In these cases we could “make up” a 3-letter designation such as GBR for Great Britain, or CHT for Chinese (traditional).
Or, perhaps a better solution would be to use the Locale Identifiers (LCID) used by Windows to identify specific locales (rather than languages). The solution is identical to the above except instead of calling the ThreeLetterISOLanguageName property we call the LCID property as illustrated below.
// Requires using System.Globalization; and using System.IO;
string testDataFileName = "testdata.txt";

CultureInfo ci = CultureInfo.CurrentCulture;

// Path to server location where static files exist
// (the share below is a placeholder; the original sample elided this argument)
string path = Path.GetFullPath(@"\\server\share\testdata");

// Read file contents
using (StreamReader readFile =
    new StreamReader(Path.Combine(
        path, string.Concat(
            ci.LCID, testDataFileName))))
{
    // parse test data and do test stuff
}
Of course, now we would need to name our static files with the appropriate LCID decimal number, such as:
- 1028testdata.txt would be traditional Chinese used in Taiwan, and
- 2052testdata.txt would be for simplified Chinese used in PRC
Personally, I prefer getting the LCID as it provides greater control and more specificity. But the downside of using LCIDs is that you may end up having multiple files that contain the same contents. For example, although Singapore, Malaysia, and the PRC all use simplified Chinese, there are 3 different LCIDs.
There are other properties that allow you to get the culture info for the current user in Windows, and the right property to use ultimately depends on your specific needs. But, CultureInfo class members can easily be used to manage localized static data files or even manage control flow through an automated test that has specific dependencies on a language or a locale setting.
I am generally not a big fan of static test data. I do know that in the proper context static test data can provide some value. Of course, we should be aware of the common problems with files of static test data or (even worse) hard-coded test data in a test case. Some problems with static test data include:
- Stagnation – static test data may add some initial value, but over time simply reusing the same test data over and over in a test diminishes the value of that test. For example, retesting the same name strings in a first name input textbox is not providing any new information if those ‘static’ names worked in the previous build and the underlying functionality has not changed.
- Contextual blindness – sometimes we have files of static test data that were identified as “problematic” in one situation (context), so we reuse the “problematic” test data regardless of the context. In 1995 I wrote a white paper on problematic double-byte character set (DBCS) encoded characters, explaining why each code point was problematic in a given context. For example, a Japanese character whose trail byte is 0x5C might be problematic in a filename on an ANSI-based system that parsed strings byte by byte instead of character by character (see the sketch after this list). This is not true on Windows systems where the default encoding is the UTF-16 Unicode transformation format. However, some people continue to use obsolete DBCS problem characters, perhaps because they don’t fully understand the underlying contextual differences between ANSI-based encodings and Unicode.
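As a historical illustration, here is a small sketch of why a 0x5C trail byte was dangerous on ANSI-based systems. It uses the Shift-JIS code page (932); note that on .NET Core you would first need to register CodePagesEncodingProvider.

using System;
using System.Text;

class TrailByteDemo
{
    static void Main()
    {
        // KATAKANA LETTER SO (U+30BD) encodes in Shift-JIS as the two bytes 0x83 0x5C.
        Encoding sjis = Encoding.GetEncoding(932);
        byte[] bytes = sjis.GetBytes("\u30BD");
        Console.WriteLine(BitConverter.ToString(bytes)); // 83-5C

        // A parser walking the string byte by byte sees 0x5C and mistakes it for the
        // ASCII backslash (the path separator), corrupting the filename. A UTF-16
        // parser never hits this problem: the code unit is simply 0x30BD.
    }
}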
Perhaps on the opposite end of the test data spectrum is random test data. Many of you who read this blog or have heard me speak know that I am a big proponent of parameterized random test data generation. Parameterization allows us to better model our test data. I know that even parameterized random data can be crafted to be representative of real data, but it is not “real” data.
But there may be a happy medium between static test data and random test data. And, best of all, it is abundantly available. Some of the best (especially non-English) test data comes from sources that most of us already use on a daily basis. The test data source I speak of is social networks.
I have met many wonderful people from around the world both in person and virtually, and stayed in contact with many of them. Last year while keynoting at the first software testing conference in Vietnam (VistaCon 2010) I was privileged to meet my dear friend Thuyen, who helped organize the conference. Since the conference we have stayed in contact via email and Facebook. When she posts on Facebook it is usually in Vietnamese. Since I don’t (yet) read Vietnamese I use Bing Translator to help me figure out the comment.
Last week she had an entry on her Facebook wall that began “Tối nay vô tình nghe trên TV 1 bài hát mà giai điệu…” So, I copied the entry and opened Bing Translator to translate the entry.
Many of you will quickly notice the strange anomaly in the translation. I initially thought that this service might be incrementing this numeric value for some reason, but when I changed the number value to 2 the number 2 displayed in the translated string. I tried various other numbers and quickly discovered that 6 incremented to the number 7, 8 decremented to 7, and 9 decremented to the number 3. I didn’t see a clear pattern here, so I thought this might be an issue resulting from parsing a particular sequence of characters.
So, I modified different parts of the string (removed words) to narrow down the problem. I found the string “tình nghe trên TV 1 bài hát mà giai điệu” contained the problematic sequence. Removing any ‘word’ from this string displayed the translated string with a number of 1, with the exception of 1 word. Removing the word “nghe” from the above string resulted in the translation illustrated below.
The purpose of this post, though, is not to illustrate this particular bug, but to give you ideas of how we can use social network feeds in our testing. People around the world use social networks, and you can find “real world” strings in various languages that you can use as test data in various contexts. Most of the time this ‘test data’ likely will not result in a bug; but sometimes it can reveal interesting issues. Best of all, strings taken from social networks are not some manufactured static or random test data. Using strings copied from social networks is about as “real world” as we can get…this is the “data” from our customers.
Well, it’s another new year. Like a lot of folks I have spent the last few days reflecting on the past year and contemplating this coming year. I won’t bore you by rambling on about my thoughts, reflections, or ambitions; they are mostly personal. Professionally, I will continue to strive to improve myself in my chosen discipline, seek out new challenges and opportunities, and share my experiences with those who follow my posts. The beginning of the year is a busy time for me arranging my conference schedule (which I am cutting back on) and preparing to teach a software testing course and a software test automation course at the University of Washington, and getting engaged with 2 key internal projects intended to help improve our internal engineering processes at Microsoft.
Many of you know that I started at Microsoft on the Windows International Test Team, and internationalization (I18N), globalization (G11N) and even localization (L10N) are topics that I have always been interested in. In 2009 I posted a series on localization testing (Part 1, Part 2, Part 3, and Part 4). Last year, I designed and developed an internal class on globalization testing basics for non-globalization experts who want to incorporate globalization test strategies into their test designs in an attempt to find bugs sooner and drive quality upstream. As the world becomes even more connected and the number of customers around the globe grows rapidly, it just makes sense that we need to design software that our customers will value in their (local) context. So, over the next few weeks I will do a series of posts on globalization testing.
But first, I’d like to share a globalization-type bug that I found…or should I say that found me. This is a great example of the bugs you might come across using a strategy I used in my previous teams to help drive international sufficiency testing upstream. In this post on international sufficiency testing I explained one strategy was to get folks on the team to set their default user locale to something other than US-English. This particular bug was initially detected the day after writing the post discussing a bug that was found by an SDET taking my globalization testing basics course. To replicate the bug for screen shots, I customized my number format via the Region and Language control panel applet by changing the decimal symbol from the default period character (‘.’) to the small Latin letter d (‘d’). After writing the post, I did not restore the machine to its default state (using the period character as the decimal symbol).
The next day I came into the office, logged onto my computer, and noticed the following cryptic scripting error message on the desktop. I hadn’t launched any other app yet, so I deduced this was likely caused by some process that starts automatically when logging in. So, I gave the faithful Windows three-finger salute to bring up Task Manager and scanned through the list of running processes.
It didn’t take long to associate this error dialog with the communicator.exe process. But, at first I wasn’t quite sure what was causing this error. Later that morning, I reset the user locale settings back to the default settings. After lunch I logged back onto my machine and no error message!
So, being a tester, curiosity got the better of me and I felt compelled to try to replicate the initial anomaly. To make a long story short, it didn’t take too long for me to put the pieces of the puzzle together and figure out the customized decimal symbol was causing an error. But after troubleshooting a bit, it was much worse than I thought, because this error didn’t occur only with a letter character; it also occurred when I changed my decimal symbol to a comma character (‘,’), which some international locales use as the decimal symbol. So, I contacted my friend and colleague Alan Page, who just so happens to work on the Communications team. He quickly looked into it for me and announced this problem does not occur in the latest release. (Note to self…update your machine to the latest self-host bits.)
What is important here is that this bug did not require a different language version. I didn’t need to know a different written language. This bug was exposed simply by customizing the settings in the Regional and Language control panel applet and using the system as a ‘normal user.’ So, what I did need to know was a bit of technical knowledge of the system (specifically around the National Language Support or NLS) and how to configure the system to help me expose potential globalization issues. It’s not magic and it’s not rocket science. So, over the next few weeks I plan to share some testing approaches to help other testers find problems similar to this and incorporate globalization testing into their test strategies.
A person who speaks 3 languages is trilingual; a person who speaks 2 languages is bilingual; and a person who speaks one language is an American (or Brit…depending on which side of the pond you are on).
I work at a company with tremendous cultural diversity embodied in very smart people from around the globe. Yet, it seems that when people come to Redmond their cultural uniqueness seems to disappear. Maybe it is an engineering thing, maybe it is an assimilation thing, but whatever it is, it is not a good thing, especially considering that a growing number of our customers come from non-US markets and the way they interact with software is often quite different compared to the US-centric scenarios and personas I so often see used to design and develop our software and services.
Monday afternoon I taught a class on globalization testing basics geared towards SDETs who are not experts in globalization. These testers usually come to the training because they’ve been tagged by their manager to help with globalization testing efforts on their team. But what they learn is that with a little understanding of some basic concepts, and with the aid of a few tools, they can incorporate international sufficiency testing concepts into their test designs (both exploratory and automated) and drive quality upstream (e.g. find some bugs sooner).
One of the topics I discuss in the class is customizing the current user locale settings. I also discussed this in an earlier post this year. Customizing the national convention settings for the current user locale includes things like custom date and time format pictures, as well as number and currency formats. Of course, to really understand custom locale settings we should have a good technical understanding of the National Language Support (NLS) API functions and specifically the LCType parameter Locale Information Constants.
<tangent alert> When I and other people talk about encouraging testers to develop their technical skills or knowledge, some people simply assume that “technical” equates to coding. This is a really shallow interpretation of ‘technical.’ When I refer to technical knowledge and skills I am primarily referring to an understanding of how the system (or parts of the system) works, and how to use that knowledge and those skills to design more effective tests.</tangent alert>
In this case, our technical understanding of the variable arguments that can be passed to the lpLCData parameter of an NLS API function such as SetLocaleInfo() for the various LCType constants can help us more fully explore the functionality of features that use the operating system’s NLS settings. For example, the constant value for the decimal symbol for number formats is LOCALE_SDECIMAL. When this LCType is specified, the value we can pass to the lpLCData parameter is a string of up to 3 characters in length (the “maximum number of characters allowed for this string is four, including a terminating null character”).
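As a minimal sketch of what this looks like from code (the P/Invoke declaration and constant values are the standard Win32 ones; a real test would also broadcast WM_SETTINGCHANGE, as discussed in the posts further down this page):

using System;
using System.Runtime.InteropServices;

static class DecimalSymbolTest
{
    const uint LOCALE_USER_DEFAULT = 0x0400;
    const uint LOCALE_SDECIMAL = 0x000E; // decimal symbol for number formats

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool SetLocaleInfo(uint Locale, uint LCType, string lpLCData);

    static void Main()
    {
        // Up to 3 characters (plus the terminating null) are allowed for LOCALE_SDECIMAL.
        if (!SetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SDECIMAL, "dot"))
            Console.WriteLine("SetLocaleInfo failed: " + Marshal.GetLastWin32Error());
    }
}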
In class I often use a custom decimal symbol as an example of customizing national convention settings by manually changing the number format decimal symbol from a period character (.) to random Unicode characters. From previous examples I knew of a minor anomaly in the calculator (calc.exe) in which the decimal button changed to show “abc” (or some other random string of 1 to 3 characters), but the decimal symbol in the result window only displayed the first character (“a”).
But in this week’s class I gave an example of changing the decimal symbol from a period character to the literal string “dot.” Then Andreas Schiffler, one of the SDETs in the class, used the example “dot” string in the calculator and quickly discovered an anomaly.
It seems that the calculator does not like the lower case letter ‘d’ as a decimal point. If you customize the decimal symbol for the number format to the Unicode lower case letter d, then launch the calculator (calc.exe) and press the decimal symbol key (or perform any calculation in which the result includes a decimal value), the calculator result window will show “Overflow.”
Andreas initially thought this might be caused by the fact that the letter ‘d’ is a formatting character. However, we quickly discovered the upper case letter ‘D’ is also a formatting character, yet the upper case ‘D’ and other formatting characters do not cause a similar incorrect condition. (I went home and automated this test looking for clues using the GlobalTester library and pumped in over 10,000 random Unicode characters, and none of them caused an overflow.)
Now this particular problem might seem out of context, or have no ‘real-world’ scenario, because the commonly used characters for the decimal symbol are the period (.), the comma (,) and the space ( ) characters. And for the life of me I can’t think of any national convention in this world that uses the letter ‘d’ as a decimal symbol in its number formats. But this application is using the NLS settings (at least to some level of implementation), and since we allow the user to customize these settings, I shouldn’t expect that functionality to break. The bottom line is that something out of the ordinary is going on that probably needs investigating.
Sometimes simply changing the current user default locale settings might reveal basic internationalization issues. And sometimes customizing the national conventions and using randomized data for those conventions might reveal problems with how the developer implements national language support in the product. You don’t have to be an expert in globalization to discover these issues! You just have to know a little bit about the technicalities of NLS, and have a desire to potentially find some pretty cool, or at least weird anomalies earlier in your testing.
I started my career at Microsoft in 1994 working on the Windows 95 International Test team. Globalization testing is a unique specialty in software testing just like performance, security, and other specific areas of testing. Globalization testing doesn’t necessarily require a tester to be bi-lingual, or be from a country other than the United States. A good globalization tester has an in-depth understanding of such things as character encoding types and issues associated with the different types, character mapping and conversion issues, data manipulation by the application, operating system, and network protocols.
Many people might also say that globalization testers also need to know that different locales (places) around the world use different formats for date and time (national conventions). For example, in the United States the default long date format is Thursday, June 03, 2010 but in Germany it is Donnerstag, 3. Juni 2010. A tester doesn’t have to ‘read’ German to see the abstract date format has changed from dddd, MMMM dd, yyyy to dddd, d. MMMM yyyy.
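A quick way to see those abstract format pictures in action is to render the same date under both cultures; a small sketch (the comments show example output, which may vary slightly by .NET version):

using System;
using System.Globalization;

class LongDateDemo
{
    static void Main()
    {
        DateTime d = new DateTime(2010, 6, 3);
        // "D" applies each culture's long date format picture.
        Console.WriteLine(d.ToString("D", new CultureInfo("en-US"))); // e.g., Thursday, June 03, 2010
        Console.WriteLine(d.ToString("D", new CultureInfo("de-DE"))); // e.g., Donnerstag, 3. Juni 2010
    }
}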
Testing for support of these different national conventions used around the world is referred to as basic international sufficiency testing. I suspect some people assume that testing these different national conventions is the domain of the globalization tester because the national conventions are set by default on the different localized versions of a software product, so that’s when they get tested. But this reasoning is absurd!
First, not all products are “localized” into all languages or “locales.” So, who tests the Canadian long date format of MMMM-dd-yy, or the Georgian (Georgia) long date format of yyyy ‘წლის’ dd MM, dddd? Also, Vista and later versions of Windows allow the user to “customize” the date and time format pictures to use different separator symbols and orderings.
Secondly, way too many bugs such as hard-coded date formats are found way too late in the testing cycle (because localized versions tend to lag the US English language version). And of course, we all know that bugs found later in the lifecycle are more costly to correct.
So, we must ask if there is a way for basic international sufficiency testing to be ‘pushed upstream?’ And of course the answer is yes. The easiest way is to host a “globalization bug bash” early in the cycle. (A “bug bash” is a day where testers are given some basic training on attack patterns, fault models, etc., in a general focus area and then spend a day exploring different areas of the product trying to flush out bugs in a competition style format.) Another way is to assign each tester a different locale (preferably one that is not associated with a localized language version) and have them set their test and self-host environments to that locale during their testing.
This is easily accomplished on Windows test environments by having testers launch the Regional and Language control panel applet (the shortcut is Start –> Run, then type “intl.cpl” without the quotes, and press the OK button).
This just tests for a basic level of international sufficiency, and any good tester would want to explore their project’s capability to support the more than 150 different locale national conventions at a deeper level. This is especially true if your product is going to be used by customers around the world (including Canada). But, of course, we don’t want to run the same tests on all 150+ locales supported by the operating system.
The national convention settings for a particular locale are identified by a value called the LCID (locale identifier), and when we change our locale (Format on the latest Regional and Language control panel applet) through the user interface we are actually calling various National Language Support (NLS) APIs. A “world-wide” application should use the universal NLS APIs and data available via the operating system.
One way to test our application’s ability to correctly use the national convention data supplied by the operating system is to set customized conventions. For example, did you know the Windows 7 operating system allows a digit grouping symbol to be a string of up to 3 characters? Or that the negative sign symbol can be a string of up to 4 characters?
Although having testers change their default locale (Format) on their test environment and self-host machines is a good first step in basic international sufficiency testing, we also want to see if our application can process a negative value of “!NEG7” instead of just “–7,” and whether any textboxes correctly display the customized negative sign symbol (especially at the upper extreme boundary of the textbox size property).
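To put a test machine into that state programmatically, the negative sign symbol can be set through the same NLS API; a minimal sketch (standard Win32 constant values, P/Invoke as in the earlier snippet):

using System.Runtime.InteropServices;

static class NegativeSignTest
{
    const uint LOCALE_USER_DEFAULT = 0x0400;
    const uint LOCALE_SNEGATIVESIGN = 0x0051; // negative sign: up to 4 chars plus the null

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool SetLocaleInfo(uint Locale, uint LCType, string lpLCData);

    static void Main()
    {
        // "!NEG" makes a negative seven render as "!NEG7" in NLS-aware UI.
        SetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SNEGATIVESIGN, "!NEG");
    }
}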
To customize the national convention settings through the user interface, we simply click the Advanced settings… button on the Formats property sheet of the Region and Language control panel applet, which instantiates a new dialog with 4 property sheets for Numbers, Currency, Time, and Date.
Solution for Test Automation
That’s all well and fine for basic testing, or testing a “few” customized values, but if we wanted to test the permutations for each convention, or the combination of different conventions on numbers, currency, time, or date formats, the number of tests is astronomical. Typically, testers writing an automated test would try to navigate the user interface of the Regional and Language control panel applet and the Customize Format property sheets in order to set custom conventions.
In the past I provided some code snippets for changing the convention settings on the Customize Format property sheets on versions of Windows pre-Vista. Earlier this year I also provided code snippets for customizing the date format picture and the time format picture.
That’s all well and good, but I recently released a new test automation library called GlobalTester for test developers to use in their automated test scripts. The GlobalTester library provides testers methods to set custom national conventions for the current user without having to navigate the user interface of the Region and Language options control panel applet. These national conventions include number formats, currency formats, date formats, time formats, and also current location.
The following example illustrates how we might design a test script to customize the date format for a test and reset the date format to its original setting (restoring the test environment to pre-test conditions). (Usage documentation for the GlobalTester library is on the Testing Mentor website.)
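The example itself was an inline snippet that did not survive this archive, so here is a sketch of what such a test script might look like, using the wrapper names described in the GlobalTester posts further down this page; the exact type and enumeration names are my assumptions.

// Sketch only: SetDateFormat, its properties, and ChangeDateFormat() are described
// in the later GlobalTester post below; DateFormatType is an assumed enum name.
using TestingMentor.TestTool.GlobalTester;

SetDateFormat dateFormat = new SetDateFormat();
dateFormat.SetDateType = DateFormatType.LongDate;        // assumed enum name
dateFormat.SetDateFormatPicture = "dddd, d. MMMM yyyy";  // e.g., the German long date picture
dateFormat.ChangeDateFormat();                           // apply and broadcast the change

// ... run the date-sensitive test steps ...

dateFormat.SetDateFormatPicture = "dddd, MMMM dd, yyyy"; // restore the pre-test picture
dateFormat.ChangeDateFormat();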
Time is a commodity in short supply. I have been juggling a lot lately and there never seems to be enough time to do everything I need to do, and even less time to do the things I want to do. (Blogging falls under the want to do category.) I wish sometimes I could slow down the hands of time, but that is beyond my control. What is within my control is changing the time format displayed on the computer. And if I need to do that in an automated test to increase the robustness of my test to include globalization, then I can programmatically change the time format without having to manipulate the Region and Language settings control panel applet.
Time and date information is commonly pulled from the operating system by many developers for use in headers or footers on documents, default file names, printing, and other places time/date stamps are useful or important. To ensure our products are “world-ready” we should modify the formats to validate whether our product supports various national conventions used in different regions (locales) around the world. In the previous post I illustrated how to programmatically customize the date formats on a Windows environment for including some basic globalization tests in your test automation. This week let’s look at how we can programmatically change both the short time and long time formats.
We will again need the 2 Win32 API functions SetLocaleInfo() and PostMessage() that we marshaled over into the NativeMethods class. Since that code doesn’t change I won’t repeat it here; you can simply refer to the code snippet in the previous post. In this situation we need to set the lcType in SetLocaleInfo() to the LOCALE_STIMEFORMAT constant. Then we can pass a null-terminated string to the lcData variable in the SetLocaleInfo() function. MSDN explains, “The maximum number of characters allowed for this string is 80, including a terminating null character. The string can consist of a combination of hour, minute, and second format pictures.”
Once again, to simplify that a bit I wrote some more wrapper methods to change the time format. Also, since we will be calling SetLocaleInfo() and PostMessage() a lot for customizing date, time, and other national conventions I created a wrapper method called UpdateLocaleInformation() to remove redundancy.
Once again, we simply have to set the SetTimeFormatType property to either the Short time or Long time format, provide the format picture by setting the SetTimeFormatPicture property, and then call ChangeTimeFormat(). The sample below illustrates how to change the short time format with different time separators and a reverse order.
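The original sample did not survive the archive; here is a sketch under the same assumptions (class and enumeration names inferred from the description above):

// Sketch only: SetTimeFormat and TimeFormatType are assumed names.
SetTimeFormat timeFormat = new SetTimeFormat();
timeFormat.SetTimeFormatType = TimeFormatType.ShortTime;
timeFormat.SetTimeFormatPicture = "mm.HH"; // reversed order with '.' as the time separator
timeFormat.ChangeTimeFormat();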
We can customize the AM/PM designator as well. To change the AM/PM designator we need to add a few more properties and another wrapper method. In this case, I’ve added the SetAmPmDesignator property, the SetAmPmString property, and the ChangeAmPmDesignator() method.
The code snippet below illustrates how to change the AM designator from “AM” to “In the morning.”
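That snippet was also lost from the archive; a sketch under the same assumed names:

// Sketch only: class and enumeration names are assumptions.
SetTimeFormat designator = new SetTimeFormat();
designator.SetAmPmDesignator = AmPmDesignator.AM; // assumed enum name
designator.SetAmPmString = "In the morning";
designator.ChangeAmPmDesignator();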
Modifying national conventions is one way to test for globalization support upstream and should be done early in the testing cycle rather than relying on a separate globalization testing cycle. Time and date are perhaps the most visible national conventions used in many different ways in our applications. We should test the common (equivalent) conventions used in various regions around the world, and customizing these settings helps ensure the developer is properly calling NLS APIs and not using custom functions.
Also, check out the beta release of the GlobalTester automation library that has this functionality and more, and let me know what you think.
The ability of our software products to function correctly in a global environment is becoming more and more important. Our software should support the national conventions used by the various locales around the globe. For example, in some regions of the world the period character is used as the number group separator and the comma is used as the decimal symbol (radix). European calendars generally start on Monday rather than Sunday, which is customary in the United States. Era-based calendars are still in common use in Japan and Korea, and date formats, date element order, and time formats also vary by region or locale. As testers we need to test our software to ensure our customers around the world can use the national conventions they are accustomed to, and not force them down a US-centric, one-size-fits-all format or standard.
There are several settings that we can modify and customize for more robust globalization testing, such as number, currency, time and date formats. Modifying these settings can help us test that our application is globalized to use the National Language Support (NLS) APIs provided by the system. Although a user would change these settings using the Regional Options user interface property sheets, if the purpose of our test is not to emulate user interaction, then modifying the custom regional settings for globalization testing programmatically is more efficient.
Last year I talked about how to programmatically make changes to the settings in the Region and Language control panel applet when doing globalization testing. Unfortunately, the code sample provided in the previous post was appropriate for versions of Windows XP and earlier. For versions of Windows Vista and later, things have changed a bit. Also, the previous sample tried to be one-size-fits-all and relied on the test developer to set the appropriate lcType constants and lcData argument variables required by the Win32 function SetLocaleInfo().
This time, I decided to simplify things a bit and wrapped some methods to call the appropriate Win32 API functions and properties to set lcType and lcData values to make it easier to incorporate into automated tests. I also separated the various advanced custom formats for Region and Language options into separate classes. Of course, I have a beta version of an automation library (DLL) called GlobalTest.DLL on my website that testers can use in their automated test cases, but this week let’s look at the class for setting custom date formats.
Making these changes programmatically still requires the Win32 SetLocaleInfo() function. MSDN also states this function modifies the specified values for all applications, so to prevent potential issues in other applications running on the system we should also broadcast the WM_SETTINGCHANGE message. To broadcast the WM_SETTINGCHANGE message we will also need the Win32 PostMessage() function. Since we are using Platform Invocation Services (P/Invoke) to call these unmanaged functions, we should put them in a separate class that I’ve called NativeMethods. I also included all the necessary constant values required by these methods in the NativeMethods class, as illustrated below.
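The illustration did not survive the archive; below is a reconstruction of such a NativeMethods class. The P/Invoke signatures and constant values are the standard Win32 ones, though the class as originally posted may have differed in detail.

using System;
using System.Runtime.InteropServices;

internal static class NativeMethods
{
    public const uint LOCALE_USER_DEFAULT = 0x0400;
    public const uint LOCALE_SSHORTDATE = 0x001F;  // short date format picture
    public const uint LOCALE_SLONGDATE = 0x0020;   // long date format picture
    public const uint WM_SETTINGCHANGE = 0x001A;
    public static readonly IntPtr HWND_BROADCAST = new IntPtr(0xFFFF);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern bool SetLocaleInfo(uint Locale, uint LCType, string lpLCData);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern bool PostMessage(IntPtr hWnd, uint Msg, IntPtr wParam, string lParam);
}

(In practice SendMessageTimeout is often preferred for broadcasting WM_SETTINGCHANGE, but the sketch follows the PostMessage approach the post describes.)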
The class for the custom wrapper method is TestingMentor.TestTool.GlobalTester.SetDateFormat. There is a public enumeration for the short date and long date constants. One of these values must be assigned to the SetDateType property. The other property that must be set is the SetDateFormatPicture. The big change in the SetLocaleInfo() function is that the lcData type is a null-terminated string that MSDN refers to as a format picture. Current versions of Windows allow users to customize the order of the month, day and year, the format for each, and even allow different separators between the date elements. The format picture enables the user to select various format types in different orders for either the short date or the long date. See MSDN’s Month, Day, Year and Era Format Pictures for the various supported format types.
Once the SetDateType and SetDateFormatPicture properties are assigned we simply have to call the ChangeDateFormat() method to change the settings and broadcast the message to the system. The code snippet below illustrates how a tester would change the default long date format in an automated test to determine globalization support in the application under test. Customizing the date format is useful if the application under test uses a date string in any way. For example, if the application includes a function to insert a date string in an edit control, or if the date is printed as a header or footer in a document, or if a date string is appended to a record.
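The snippet referred to here was likewise lost; a sketch of the described usage (the enumeration name is assumed):

// Sketch of the described usage; DateFormatType is an assumed enum name.
SetDateFormat dateFormat = new SetDateFormat();
dateFormat.SetDateType = DateFormatType.LongDate;
// Different separators between day/month ('-') and month/year ('/'):
dateFormat.SetDateFormatPicture = "dddd dd-MMMM/yyyy";
dateFormat.ChangeDateFormat(); // changes the setting and broadcasts WM_SETTINGCHANGE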
Programmatically changing the date format is an easy way for testers to customize date formats in their automated tests without having to manipulate the controls on the Region and Language property sheet. Also note that since the format picture is a string, the order of the supported date format types is now controlled by the arrangement in the string, and the separator characters can be different between the day and month and the month and year, as illustrated in the example above.
Modifying national conventions is one way to test for globalization support upstream and should be done early in the testing cycle rather than relying on a separate globalization testing cycle.
Next week I will discuss customizing the time format. Also, check out the beta release of the GlobalTester automation library that has this functionality and more and let me know what you think.
Originally Published Thursday, November 12, 2009
The past series of posts has focused on the largest category of localization-class issues reported by testers performing localization testing: what we categorize as usability/behavioral issues, because they adversely impact the usability of the software or how end users interact with the product. This is the last post in this series, but I do intend to publish a more complete paper covering localization testing in the near future….stay tuned. This final post will discuss issues that affect the layout of controls on a dialog or window, generally referred to as clipping or truncation.
Clipping occurs when the top or bottom portion of a control (including label controls that contain static text) is cut off and the control or the control’s contents do not display completely, as illustrated below. Clipping and truncation are quite common on East Asian language versions because the default font size used in Japanese, Korean, and Chinese language versions is a 9-point font instead of the 8-point font used in English and other language versions. Clipping often occurs because developers fail to size controls adequately for larger fonts (especially common in East Asian language versions), or for display resolutions set to custom font sizes. Clipping also occurs because many localization tools are incapable of displaying a true WYSIWYG or runtime view of dialogs, requiring localizers to ‘guess’ when resizing controls on dialog layouts.
It is possible to test for potential clipping and truncation problem areas without a localized application. The English language version should function and display properly on all localized language versions of the Windows operating system. So, one way to check for potential clipping or truncation issues is to install the English language version of the application under test on an East Asian language version of the Windows operating system. Another method is to change the Windows display appearance or the custom font size via the Display Properties control panel applet.
However, due to most current localization tools’ inability to dynamically resize controls and dialogs, and their inability to display dialogs at runtime or present a true WYSIWYG view during the localization process, the localized language versions must also be tested for clipping and truncation caused by improper sizing and layout of controls.
Truncation is similar to clipping, but typically occurs when the right side of a control is cut off (or the left side in the bi-directional displays used for Hebrew and Arabic languages) and the entire control or the control’s contents do not display completely.
Other Layout Issues
Because some localization tools may not provide a true ‘WYSIWYG’ display of what a dialog or property sheet will look like at runtime, occasionally resizing may cause several controls to overlap. This is especially true when dialogs contain dynamic controls that are dependent on certain configurations or machine states.
In East Asian cultures it is common for an individual’s surname to precede the given (first) name. (It is also uncommon to have a middle name, so this field should never be required.) Therefore, the controls for name-type data may need to be repositioned on dialogs in East Asian language versions. The localization team will reposition the last name label and textbox controls and the given name controls. This means that the logical tab order must be reset. Also, the surname textbox control should have focus when the dialog is displayed, instead of the given name field.
The tab order of controls should allow for easy, intuitive navigation of a dialog. Design guidelines suggest a tab order that changes the focus of controls from left to right and top to bottom. Focus should change between each control in a logical order, and dialogs should never have a ‘loss of tab focus’ where no control on the dialog appears to have focus.
Tab order is typically problematic even in English language versions in the early lifecycle of many projects when the user interface is in flux. There is also a high probability of introducing tab order problems any time the controls on a dialog change.
All localization testing doesn’t have to be manual
In the past much of localization testing has been repetitive manual testing. Testers would manually step through every menu item and other link to instantiate every dialog and property sheet in the program, inspect it visually, and test the behavior of such things as tab order, access keys, etc. This painstaking process would be repeated multiple times during the project lifecycle on every localized language version. Unfortunately, not only was this boringly repetitive, but because the manual testers were looking at so many dialogs during the workday their eyes simply tired out, leading to missed bugs. So, there must be a better way.
We know that each dialog has a 2-dimensional size usually measured in pixels. Once we know the height and width of the dialog or property sheet we can measure the distance from the left most edge of the dialog to the leading edge of the first control. Using control properties such as size and location that are stored in the form’s resource file we can measure the size and position of each control on a dialog or property sheet. Once all controls are identified the distance and position of the controls can then be measured in relation to the dialog or property sheet and other controls.
Using a simple example, let’s consider one dimension of a dialog: 250 pixels wide. The dialog contains a label control that is 15 pixels to the right of the leftmost edge of the dialog, and that label is 45 pixels in length. The textbox control next to the label starts at position 70, so there are 10 pixels between the right edge of the label control and the left edge of the textbox control. Now, let’s say that textbox control is 150 pixels wide. By calculating the widths of the two controls plus the distances between them we can determine whether truncation will occur on this dialog (here the textbox’s right edge lands at 70 + 150 = 220 pixels, inside the 250-pixel dialog). Similarly, we can also evaluate the relative position of controls on a dialog and detect alignment both horizontally and vertically more accurately than the human eye.
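A toy sketch of that calculation using System.Drawing rectangles (the names and numbers are just the example’s):

using System;
using System.Drawing;

static class LayoutCheck
{
    // A control is clipped or truncated if any part of it falls outside the dialog area.
    static bool IsCut(Rectangle dialog, Rectangle control)
    {
        return !dialog.Contains(control);
    }

    static void Main()
    {
        Rectangle dialog = new Rectangle(0, 0, 250, 100);
        Rectangle label = new Rectangle(15, 10, 45, 20);    // right edge at 60
        Rectangle textbox = new Rectangle(70, 10, 150, 20); // right edge at 220
        Console.WriteLine(IsCut(dialog, label));   // False: fits within the 250-pixel width
        Console.WriteLine(IsCut(dialog, textbox)); // False: 220 < 250, no truncation
    }
}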
Of course this is not a simple solution, but if you have thousands of dialogs and property sheets, and multiple language versions, investing in an automated solution may be invaluable. In one internal case study, testing efficiency increased, manual testing and overall direct costs were significantly reduced, and the effectiveness/accuracy of reported issues also increased. Perhaps not for everyone, but it is possible!