Archive for January, 2010
When I was growing up I would sometimes go down into my grandfather’s basement. He had amassed a variety of tools during his lifetime and he was an excellent wood craftsman. I wasn’t allowed to touch any of the power tools, because his rule was, “if you don’t know how to use a tool properly then you shouldn’t play with it.”
Of course, I am a bit of a hard head (even back then), and one day I started playing with the wood lathe while my grandfather was upstairs. Everything seemed to be going pretty well until I pushed the chisel in too far too fast, and the wood split and went flying. One piece shattered the overhead light and the other piece ricocheted off the back of my hand, leaving a nice gash. I shut off the machine and ran upstairs. After my grandmother cleaned and wrapped my hand, my grandfather made me go back downstairs to clean up the mess, standing over me with a stern look of disapproval to make sure I wiped up my blood trail. After that incident, I heeded my grandfather’s advice, at least in his basement shop.
Anyway, with the recent discussions of code coverage around the testing blogosphere, I started thinking about what was really being discussed. The discussions (as is the case with most discussions about code coverage) were not actually about code coverage as a tool, but about the code coverage metric. More specifically, the discussions were about how not to assume that a high measure of code coverage implies something is well tested. Interestingly enough, two years ago I wrote a post illustrating how the metric can be gamed and how the code coverage measure tells us nothing about quality or test effectiveness, but that post also alluded to how the measure might be used more effectively.
I thought that how the metric is sometimes misused was mostly self-evident, but then I realized that almost every time testers start talking about code coverage, the discussion tends to focus on the metric. This may seem a bit harsh, but if a person’s only contribution to a conversation about code coverage is that the metric doesn’t relate to quality or testing effectiveness, then that person should not be allowed to play with hammers, and more complex tools such as wheelbarrows are well beyond that person’s comprehension.
Only thinking of code coverage as a means to get some magic number is akin to thinking, “How many nails can I pound with this hammer?” The metric itself is mostly irrelevant, and it is completely irrelevant if you don’t know how to interpret it in a way that helps you as a tester. Think about it this way: if we told our managers “our tests achieved 80% code coverage,” some of them would be elated. (Of course, IMHO, these types of managers are metric morons.) But what do you think these same pointy-headed number zombies would say if we told them “we ran our tests and we only missed testing 20% of the code”? I suspect they would start pacing back and forth in the room mumbling, “We must run more tests, we must run more tests.”
When we stop thinking of code coverage as simply a measure, where our only use of the tool is to try to achieve some magical number, then perhaps we can start thinking about how to actually use code coverage as an effective tool to help us design tests (in under-tested or untested areas of the code), reduce potential risk, and possibly even drive quality upstream.
For example, one of my mentees is currently working on a project that uses just-in-time code coverage as a tool to evaluate how tests exercise changed code and downstream dependencies prior to checking code changes (e.g. bug fixes) back into the main tree. The initial pushback by some members of the team (including some pointy-headed managers) was “code coverage doesn’t tell us about product quality” or “it’s too hard to achieve 80% code coverage” (although no such goal had been mentioned), and my personal favorite, “it’s too difficult to get everyone to measure coverage.” I reminded my mentee that the project is not about achieving some magic number; in fact, it’s really not even about measuring at all. It’s about using the tool to discover information and to help us design additional functional tests at the API or component level that we might otherwise overlook, to help prevent downstream regressions. In a nutshell, it’s about using code coverage as a defect prevention tool in this case.
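To make the idea concrete, here is a minimal sketch of that kind of check (the class and the data are hypothetical illustrations of mine, not my mentee’s actual tool): given the set of lines touched by a change and the set of lines exercised by the test run, report the changed lines that no test executed.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class ChangedCodeCoverage
    {
        // Given the lines touched by a change (e.g. from a diff) and the lines
        // exercised by the test run (e.g. from a coverage report), return the
        // changed lines that no test executed.
        static IEnumerable<int> UncoveredChangedLines(
            ISet<int> changedLines, ISet<int> coveredLines)
        {
            return changedLines.Except(coveredLines).OrderBy(line => line);
        }

        static void Main()
        {
            // Hypothetical data: a bug fix touched lines 10 through 14 and 42,
            // but the tests run before check-in only exercised lines 10-12.
            var changed = new HashSet<int> { 10, 11, 12, 13, 14, 42 };
            var covered = new HashSet<int> { 10, 11, 12 };

            foreach (int line in UncoveredChangedLines(changed, covered))
                Console.WriteLine("Changed but never executed: line " + line);
            // Lines 13, 14, and 42 need new or better tests before the fix
            // goes back into the main tree.
        }
    }

The point is not the number that falls out of the comparison; it is the list of specific changed-but-untested lines that tells us exactly where to design the next test.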
Bottom line, code coverage is a tool! If you don’t know how to use it to improve your testing, well…
This past weekend I was working on a new test tool library for generating random email addresses; specifically, the local address segment of an email address. I know, there are already a lot of email address generators available and this could be construed as reinventing the wheel. But I wanted to give the students in my test automation course at the University of Washington something to test at the API level. So why not have them test a test tool and learn a bit more about API-level testing and how to use combinatorial analysis of the input property values to drive a data-driven automated test case? Also, having them test it means that I don’t have to!
Anyway, one of the tool’s properties is a character array of invalid characters for the specific email address system under test. Although the guidelines for email addresses are outlined in RFC 5322 and RFC 2821, many companies place greater restrictions on the characters that are allowed in the local address component of an email address (the local address is the part before the ‘@’ character).
For example, Yahoo only allows a local address of between 4 and 32 characters, the first character must be a letter, and only letters, numbers, underscores, and at most one period character are allowed. The Google mail local address is between 6 and 30 characters, and only allows letters, numbers, and (multiple) period characters. Hotmail and Live mail allow local address lengths between 6 and 64 characters (64 is the maximum allowable size according to RFC 5322), and the address can only contain letters, numbers, periods, hyphens, and underscores.
Even from these few examples we can see a couple of things. First, although we are testing email addresses, there is no universal set of equivalence partitions that works in all contexts. We need to partition the test data into equivalence class subsets based on the specific domain we are testing. For example, the invalid class subset of characters for a Google local address includes the underscore character, but both Yahoo and Hotmail allow the underscore as a valid character in an email local address. (But I will talk next week about the equivalence partitioning of this data…for now let’s get back to boundary testing!)
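For illustration, here is a minimal sketch of how these provider-specific rules might be captured as data (the EmailRule type and its fields are just shorthand for this post, and deliberately simplified; for instance, they ignore Yahoo’s first-character-must-be-a-letter and single-period restrictions):

    using System;

    class EmailRule
    {
        public string Provider;
        public int MinLength;
        public int MaxLength;
        public string ValidSpecialChars; // characters allowed besides letters and digits

        // The rules described above; the live products may differ in details.
        static readonly EmailRule[] Rules =
        {
            new EmailRule { Provider = "Yahoo",   MinLength = 4, MaxLength = 32, ValidSpecialChars = "._" },
            new EmailRule { Provider = "Google",  MinLength = 6, MaxLength = 30, ValidSpecialChars = "." },
            new EmailRule { Provider = "Hotmail", MinLength = 6, MaxLength = 64, ValidSpecialChars = ".-_" },
        };

        static void Main()
        {
            // The underscore falls in the valid class for Yahoo and Hotmail,
            // but in the invalid class for Google.
            foreach (EmailRule rule in Rules)
                Console.WriteLine("{0}: '_' is {1}valid",
                    rule.Provider,
                    rule.ValidSpecialChars.Contains("_") ? "" : "in");
        }
    }

Once the rules are data, the equivalence class subsets (and the boundary values) for each provider fall out of the same table instead of being hard-coded into every test.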
Back to my story – as I was exploring each email provider’s requirements in order to determine how to partition the data, I discovered an interesting problem with Yahoo. Remember, the maximum length of the local address for a Yahoo account is 32 characters.
And the textbox control on the web page is set to allow a maximum input of 32 characters, preventing the user from typing in more than that. Copying a string longer than 32 characters into that textbox simply truncates the string after the 32nd character.
But when I bumped up against the maximum allowable length with some test strings, the underlying program that generates suggested alternative local address names actually produced a local address 35 characters in length!
Now, if the software tells me I can’t do something (like have a local address name of more than 32 characters), and then the software generates a local address name of 35 characters for me…well, I am the sort of fellow who will push that button!
And sure enough it looks like I can use it. But wait. Only one more button to push and…
What do you mean, “Sorry, this appears to be an invalid Yahoo ID”? You generated an invalid local address for me! Why would Yahoo mail torment me so?
I am thinking that in the developer’s mind the user story went sort of like this:
User: “I would like this.”
System: “No, you can’t have that, but you can have this.”
User: “OK, then I’ll take the one you suggested.”
System: “No, you can’t have that either.”
It’s funny this came up this week because I was talking with a group of senior SDETs about defect prevention versus defect detection and how 99.999% of boundary issues can be found at the unit level or API level of testing well before the UI is slapped onto the functional layer.
Testing the functional layer more thoroughly, or a code review, would most likely have revealed that this ‘magic’ number was inconsistent. Or forcing the algorithm that generates suggested local addresses through its boundary conditions would have exposed this problem much sooner.
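For instance, a check along these lines written against the suggestion routine would have flagged the bug long before anyone clicked through the sign-up page (GetSuggestedLocalAddresses is a hypothetical stand-in for whatever Yahoo’s actual routine is called, and its body here is only a placeholder that mimics the buggy behavior):

    using System;

    class SuggestionBoundaryTest
    {
        const int MaxLocalAddressLength = 32;

        // Hypothetical stand-in for the production routine that proposes
        // alternative local addresses when the requested one is taken. The
        // placeholder body appends digits, mimicking the kind of logic that
        // can push a suggestion past the documented limit.
        static string[] GetSuggestedLocalAddresses(string requested)
        {
            return new[] { requested + "2010" };
        }

        static void Main()
        {
            // Drive the generator with requests at and just below the boundary.
            foreach (int length in new[] { 30, 31, 32 })
            {
                string requested = new string('a', length);
                foreach (string suggestion in GetSuggestedLocalAddresses(requested))
                {
                    // Every suggestion must obey the same rule the UI enforces.
                    if (suggestion.Length > MaxLocalAddressLength)
                        Console.WriteLine("FAIL: suggested name is {0} characters: {1}",
                            suggestion.Length, suggestion);
                }
            }
        }
    }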
Now, I don’t know Yahoo’s development and testing practices, and unfortunately it’s not uncommon to overlook bugs similar to this. But I suspect that if developers rely on testers to find all their bugs, and testers primarily rely on testing through the user interface to find bugs, then we are always going to find boundary bugs post-release (and that’s a good thing because it gives me something to blog about).
Last year the University of Washington Extension Program started running a new Software Test Automation using C# program that I designed and developed for experienced testers with little or no programming background. The program is very popular and has more than 60 people waiting for the next offering. Unfortunately, the pay is not that great so I have no intention of quitting my day job. It helps with the moorage costs for my sailboat, but the stipend I receive is not my motivation for teaching this course.
A few years ago I realized the industry would once again require software testers to have a richer understanding of the complete ‘systems’ they are testing, and a wider range of ‘testing’ skills beyond emulating user behavior in an attempt to expose as many bugs as possible before the software is released. I also realized there are many testers in the Seattle area who are good testers but simply lack the coding skills necessary to design and develop the automated test cases that more and more companies are expecting from their testing staff.
So, this program is one way I can help testers in the community gain additional skills and share some ideas with local colleagues. Don’t tell the program coordinator at UW, but my real reward comes when a student tells me how he or she was able to solve a test problem using something learned in class. Frankly, I don’t think I am a really great teacher, but it is nice to think that in some small way I can sometimes help testers unleash their own potential to overcome challenges and succeed.
Anyway, the final project after the first 10 weeks of the course is to design automated tests of 3 simple API methods from a ‘black box’ perspective (i.e. the students had to design tests that called the API methods in a DLL). Each method required one or more argument variables to be passed to its parameters when it was called in the automated test case, and each method returned a value (of type bool, int, or string) that had to be checked against the expected result based on the variables used in the test. The final project also introduces data-driven automation concepts. The focus of the project was to reinforce the programming concepts and skills the students learned over the previous 9 weeks and put that knowledge and skill to use in a reasonably realistic testing project.
I am a big fan of API testing. At Microsoft we do a lot of it, and I would venture to say that a significant portion of our test automation runs below the UI layer, banging away at various APIs. If the API is broken…well, it’s that whole “lipstick on a pig” thing; you might mask it for a while, but it is still a pig and eventually the lipstick wears off.
Prior to the project I try to set the stage by telling everyone that the key to data-driven testing is the test data crafted by the tester. If the test data is insufficient, you potentially miss a critical error. If the data is wrong, then you are likely to throw a false positive: an error or exception thrown by the test and not by the system under test (or API method in this case). If a C# method parameter takes an intrinsic data type of int (Int32), then trying to pass a string variable from a test data file to that parameter will throw an exception in the test code well before it makes the call to the API method being tested.
For example, the simplified sample test case below is testing a simple API static method ConvertValueToUnicodeChar(int value) that takes an integer value and converts it to a UTF-16 Unicode character. If the integer value is outside the UTF-16 range (0 through 65535), the method ConvertValueToUnicodeChar(int value) will throw an ArgumentOutOfRangeException.
Instead of reading in test data from a file, I simply created a string array called csvTestData to simulate a partial list of test data that might be contained in our csv-formatted test data file. Notice that two of the entries in the array are invalid integer types. When those test data variables are converted from strings to int values, the int.Parse method will throw a FormatException, which is caught by the outer catch block; the entry is marked as bad data and the oracle is skipped. Of course, we want to test the integer values that represent the physical boundaries for a UTF-16 char in C# (which are 0 and 65535) and the values immediately above and below them (i.e. –1, 0, 1, 65534, 65535, and 65536). Then we need to determine how many samples from the population of possible input variables (integer values between 0 and 65535) we need to test to attain a reasonable degree of confidence that the API method returns the correct UTF-16 Unicode character for a given integer value. (In this case the population of test data is relatively small, and we could simply run through all 65,536 values because it would only take a minute or two.)
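A stripped-down version of that sample looks something like this (the body of ConvertValueToUnicodeChar is just a stub standing in for the real API method in the DLL):

    using System;

    class ApiUnderTest
    {
        // Stub of the API method described above: converts an integer value to
        // the corresponding UTF-16 character, or throws if it is out of range.
        public static char ConvertValueToUnicodeChar(int value)
        {
            if (value < 0 || value > 65535)
                throw new ArgumentOutOfRangeException("value");
            return (char)value;
        }
    }

    class DataDrivenTest
    {
        static void Main()
        {
            // Simulates a partial list of rows from a csv-formatted test data
            // file. The boundary values are included, along with two entries
            // ("1.5" and "xyz") that are invalid integer types.
            string[] csvTestData = { "-1", "0", "1", "42", "65534", "65535", "65536", "1.5", "xyz" };

            foreach (string datum in csvTestData)
            {
                try
                {
                    // Invalid integer types ("1.5", "xyz") throw a FormatException
                    // here, which is caught by the outer catch block below.
                    int value = int.Parse(datum);
                    try
                    {
                        char result = ApiUnderTest.ConvertValueToUnicodeChar(value);
                        // Oracle: the returned char must round-trip to the input value.
                        Console.WriteLine("{0}: {1}", datum,
                            (int)result == value ? "pass" : "FAIL");
                    }
                    catch (ArgumentOutOfRangeException)
                    {
                        // Expected for the out-of-range boundary values -1 and 65536.
                        Console.WriteLine("{0}: ArgumentOutOfRangeException", datum);
                    }
                }
                catch (FormatException)
                {
                    // Bad test data: mark it and skip the oracle.
                    Console.WriteLine("{0}: bad test data, oracle skipped", datum);
                }
            }
        }
    }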
Unfortunately, some of the test data files submitted in the final project contained invalid test data for the API method being called. In some test cases the parameter required a type int, but the test data read in from the file for that parameter was a real number such as 1.5, or a string such as “xyz”, similar to the example above. I asked myself why someone would include variables like these in a test when they are being passed to a parameter of type int. The only thing I can think of is that when these testers designed their test data files, they were thinking about the problem as if they were testing the API method through a user interface. (And, in fact, my suspicion was confirmed later when I asked them.)
The bottom line here is that we oftentimes throw a lot of ‘tests’ or a lot of data at something in an attempt to trigger an unexpected error. Sometimes we are successful, and hopefully we document that information and share it with others so we can all learn. But a lot of times it seems we can’t see the trees because of the forest, and we execute tests or include test data just for the sake of physical activity. I sometimes wonder: does it matter to think critically about the problem, analyze the situation, and design well-thought-out tests, or is simply throwing stuff against the wall to see what sticks good enough testing?