<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>I.M. Testy &#187; Testing Practices</title>
	<atom:link href="http://www.testingmentor.com/imtesty/category/testing-practices/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.testingmentor.com/imtesty</link>
	<description>Treatises on the practice of software testing</description>
	<lastBuildDate>Thu, 01 Jul 2010 17:10:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Should we use boundary values in our combinatorial tests?</title>
		<link>http://www.testingmentor.com/imtesty/2010/07/01/should-we-use-boundary-values-in-our-combinatorial-tests/</link>
		<comments>http://www.testingmentor.com/imtesty/2010/07/01/should-we-use-boundary-values-in-our-combinatorial-tests/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 17:10:28 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Boundary Testing]]></category>
		<category><![CDATA[Combinatorial Testing]]></category>

		<guid isPermaLink="false">http://www.testingmentor.com/imtesty/2010/07/01/should-we-use-boundary-values-in-our-combinatorial-tests/</guid>
		<description><![CDATA[I have been a little busy at work lately designing 2 new advanced software testing courses. One of the courses is on combinatorial testing. The course focuses primarily on feature decomposition to identify input parameter interactions, modeling input variables, using the more advanced features of our PICT tool to customize the model file, how to [...]]]></description>
			<content:encoded><![CDATA[<p>I have been a little busy at work lately designing 2 new advanced software testing courses. One of the courses is on combinatorial testing. The course focuses primarily on feature decomposition to identify input parameter interactions, modeling input variables, using the more advanced features of our PICT tool to customize the model file, how to generate a variety of subsets of combinatorial tests from a single model to increase test coverage using PICT, and how to design oracles for data-driven automated combinatorial tests.</p>
<p>In this particular course I used the Page Setup dialog in Paint as a feature to model in one of the exercises. As it turns out, this was a good choice because it has made me rethink how to model input variables for use in combinatorial testing.</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image_thumb.png" width="421" height="295" /></a>&#160; <br />Paint’s Page Setup Dialog</p>
<p>I generally don’t advocate hard-coding specific values for input parameters that have a linear range of values. The reason should be reasonably obvious; if we have a range of values from 1 to 100, and I hard-code the values of 1, 10, 50, 75, and 100 (for a positive test) then I have absolutely 0 probability of ever including the value of 42 in combination with other input parameters. To avoid hard-coding values I usually recommend creating equivalence partitions of appropriate input parameters (e.g. xsmall (1-10), small (11 – 25), medium (26-50), etc). Modeling a range of input values using equivalence partitions allows me to randomly select a value in each set, increases my probability of testing with values that I might not otherwise include in a hard-coded set, and adds some degree of variability of inputs for improved test coverage of all possible input values.</p>
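<p>The idea of drawing a random value from each partition can be sketched in a few lines of Python. This is only an illustration: the first three partitions follow the example above, while the <code>large</code> and <code>xlarge</code> ranges and the function names are assumptions made up for the sketch.</p>

```python
import random

# Hypothetical partitions for an input range of 1 to 100. The first three
# follow the example in the text; the last two are assumed for illustration.
PARTITIONS = {
    "xsmall": (1, 10),
    "small": (11, 25),
    "medium": (26, 50),
    "large": (51, 75),
    "xlarge": (76, 100),
}

def pick_value(partition, rng=random):
    """Return a random integer from the named partition (bounds inclusive)."""
    low, high = PARTITIONS[partition]
    return rng.randint(low, high)

def pick_all(rng=random):
    """Return one randomly chosen value per partition."""
    return {name: pick_value(name, rng) for name in PARTITIONS}

print(pick_all())
```

<p>Each run produces a different concrete value per partition, so a value such as 42 has a chance of appearing in the medium partition across repeated runs, which a fixed list of 1, 10, 50, 75, and 100 can never offer.</p>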
<p>However, sometimes we might want to include specific values in the model file we use to generate combinatorial tests. These specific values might include boundary conditions or other values based on historical failure indicators for that feature. In the past I suggested that we don’t necessarily have to specify boundary values in our combinatorial tests. The reason for this suggestion is based on the idea that:</p>
<ul>
<li>many boundary issues are single mode faults (meaning the error occurs when 1 parameter is set at or immediately above or below its boundary condition)</li>
<li>testing for single mode errors is often easier and less costly as compared to combinatorial testing</li>
<li>combinatorial testing might obfuscate the cause of a boundary bug</li>
</ul>
<p>However, I am now convinced that </p>
<ul>
<li>some developers are so inept at unit testing that they completely overlook boundary conditions (If you are a developer and only write “happy path” unit tests, please read <em>Pragmatic Unit Testing</em> by Hunt and Thomas, and <em>Clean Code</em> by Robert Martin)</li>
<li>we find boundary bugs so late in the test cycle that someone determines they are too obscure to fix</li>
<li>we have “trained” customers to avoid boundaries (due to the number of issues and resultant failures that often occur around boundaries) so we don’t care about them anymore either</li>
<li>we don’t understand the fault model and therefore don’t know how to adequately identify boundary conditions and test for them</li>
</ul>
<p>But boundary issues are still fun to find, and they always make for good examples in training or conference demos. </p>
<p>Anyway, on to the bug. While ‘checking’ the ranges of the margins on Paint’s Page Setup dialog for the exercise in this course I came across an interesting anomaly. When the margins were set to values that were grossly outside the allowable margin and I pressed the OK button I got an appropriate error message. But, when I changed the Scaling variable state from Fit to: to Adjust to: the Fit to: value changed to 0 although the textbox control was grayed out. I now realized the margin values are being used to auto calculate the Fit to: output values.</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image1.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image_thumb1.png" width="432" height="302" /></a>     <br />Margin values used to calculate Fit to: value</p>
<p>Since the boundary value for letter size paper with a portrait orientation is 8.5 inches, I decided to set the left margin to 8.501 and the right margin to 0, and then change from Fit to: to Adjust to: to see what happens. Interestingly enough, the Fit to: value now changed to 4,294,965,329. OK…now, I just overflowed a variable (the developer only allows the user to input a maximum of 2 characters (99) in the Fit to textboxes).</p>
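<p>A value this close to 4,294,967,296 (2 to the 32nd power) is what a small negative number looks like when it is stored in an unsigned 32-bit variable. The sketch below only checks that arithmetic; how Paint actually computes the Fit to: value is not known here, so the 32-bit unsigned type is an assumption.</p>

```python
UINT32_MODULUS = 2 ** 32  # number of distinct values in an unsigned 32-bit integer

def as_uint32(value):
    """Reinterpret a Python integer as an unsigned 32-bit value (wraps around)."""
    return value % UINT32_MODULUS

observed = 4_294_965_329               # value shown in the Fit to: textbox
shortfall = UINT32_MODULUS - observed  # how far below zero the result would be

print(shortfall)              # 1967
print(as_uint32(-shortfall))  # 4294965329
```

<p>In other words, the displayed number is consistent with a calculation that produced -1967 and was then treated as unsigned.</p>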
<p>&#160;<a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image2.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image_thumb2.png" width="426" height="298" /></a>     <br />Overflow in Fit to: value</p>
<p>Surely, I am thinking that a page size boundary is a standard value, and surely someone tested this. But, I decided to check the specific boundary value just to see what happens anyway. So, I set the left margin to 8.5 and the right margin to 0, change the Scaling from Fit to: to Adjust to: and…</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image3.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/07/image_thumb3.png" width="429" height="214" /></a>     <br />Uh-oh!</p>
<p><strong>Game over!</strong></p>
<p>There are many ways to expose this failure. Another fun way is to set the Scaling parameter to Adjust to: first. Next set the left margin to 8.5 and the right margin to 0 (assuming letter size paper with a portrait orientation), and click the OK button. Then, open the Page Setup dialog again and…game over!</p>
<p>Now, I really didn’t find this bug doing combinatorial testing. In fact, although combinatorial testing might ultimately reveal this problem (depending on the model of inputs provided to the tool that generates variable combinations), this bug was discovered during the data modeling process while discovering where calculations were occurring on certain variables. Once I saw an output boundary anomaly caused by other input variables I forced those input values to target the boundary conditions of the output variable I wanted to further investigate. So, while we should use failure indicators and experience to specify important values in our combinatorial tests (in conjunction with random values within the total population of possible values), I am still not thoroughly convinced that we should always include specific boundary values in our combinatorial test models because I suspect that even the process of modeling this feature for combinatorial testing would likely have exposed this issue.</p>
<p>But, in the end this is really just another example of a simple boundary bug that could have easily been found during unit testing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2010/07/01/should-we-use-boundary-values-in-our-combinatorial-tests/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Boundary Bugs&#8230;like shooting fish in a barrel</title>
		<link>http://www.testingmentor.com/imtesty/2010/03/24/boundary-bugslike-shooting-fish-in-a-barrel/</link>
		<comments>http://www.testingmentor.com/imtesty/2010/03/24/boundary-bugslike-shooting-fish-in-a-barrel/#comments</comments>
		<pubDate>Thu, 25 Mar 2010 07:58:25 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Boundary Testing]]></category>
		<category><![CDATA[Testing Techniques]]></category>

		<guid isPermaLink="false">http://www.testingmentor.com/imtesty/2010/03/24/boundary-bugslike-shooting-fish-in-a-barrel/</guid>
		<description><![CDATA[If there is a bug at a boundary that doesn’t lead to an unhandled exception or security exploit should we care?
Perhaps an even more important question is why do we find so many boundary type bugs via exploratory testing when they can and should be caught earlier? Why don’t we find these types of bugs [...]]]></description>
			<content:encoded><![CDATA[<p>If there is a bug at a boundary that doesn’t lead to an unhandled exception or security exploit should we care?</p>
<p>Perhaps an even more important question is why do we find so many boundary type bugs via exploratory testing when they can and should be caught earlier? Why don’t we find these types of bugs in our unit testing? Why don’t we find these types of bugs by more systematically testing the software? Maybe we do find them, and those who make the decisions to fix these types of bugs just don’t care if they are fixed because there is no severe negative impact to the user. Maybe someone just wants to give me fodder for my blog!</p>
<p>This week I wanted to compare the range of allowable font sizes for a simulation program I developed as an example for a magazine article that I am working on. I knew that Office applications allow a font size within the range of 1 – 1638. I thought that range might be too large for my purposes, and since I knew that Windows Notepad included a font dialog I decided to check the allowable range of font sizes in Notepad.</p>
<p>The first thing I discovered was that the combobox control allows up to 5 characters! Really? Someone decided it is a good idea to allow users to enter 5 characters? <a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/03/notepad1.jpg"><img style="border-bottom: 0px; border-left: 0px; margin: 10px auto 0px; display: block; float: none; border-top: 0px; border-right: 0px" title="notepad 1" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/03/notepad1_thumb.jpg" border="0" alt="notepad 1" width="441" height="307" /></a></p>
<p>OK, I’ll play along. Maybe if I put in a size of 99999 and press the OK button on the dialog I will get an error message, or at least Notepad defaults to the last ‘valid’ selected font size. That might seem reasonable. But is that what happens? NO! Instead of doing something reasonable (e.g. error message, default font size) the font changes to a size of 1 (yes that is a font size 1 in the upper left corner in the image below).</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/03/Notepad2.jpg"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Notepad 2" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/03/Notepad2_thumb.jpg" border="0" alt="Notepad 2" width="458" height="197" /></a></p>
<p>I am sure that defaulting to a font size of 1 makes sense when the allowable size value overflows! Really…someone thought that was a good idea? Now I wanted to see what magical boundary value the developer decided was an acceptable font size. Since the combobox size property allowed 5 characters I immediately tried 65535. No, that also resulted in the overflow and displayed the text in a font size of 1. Next I tried 32767. Wait…32767 didn’t display the string in Notepad’s edit control at a font size of 1. Now, I am thinking the developer is using a data type of signed short for the font size variable. So, I enter 32768 expecting the value to overflow and display my string as a size 1 font again. But, no…that doesn’t happen.</p>
<p>Now, when I design boundary tests I generally rely on 2 heuristics for identifying boundary values for input or output parameters.</p>
<ol>
<li>Values at the extreme edges of a physical range of values</li>
<li>Values at the edges of equivalence partitions of physical values</li>
</ol>
<p>So, in these situations I ask myself “What sort of demented developer debauchery have I now found myself in?” I can’t think of any other obvious edge values that might apply, so out of curiosity I quickly narrow down the magical value to 39321. I then ask myself, “OK…even if there were a display capable of rendering or a printer capable of printing a font of this size, what is so unique about 39321?” In hexadecimal it is 0x9999, and in binary it is 1001100110011001. OK…nothing obviously special here, but I am certain the implementation details are much more complex than a simple range of values and at this point I really don’t care because this bug just doesn’t make sense.</p>
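<p>Narrowing down an unknown boundary like this is essentially a binary search between a value known to work and a value known to fail. A rough sketch follows, with a simulated stand-in for trying a size in the font dialog. Whether 39321 is the last size that renders normally or the first that fails is not stated above, so the threshold in the simulation is an assumption, as are the function names.</p>

```python
def find_last_accepted(is_accepted, low, high):
    """Binary-search the largest value in [low, high] for which is_accepted is True.

    Assumes is_accepted(low) is True, is_accepted(high) is False, and that the
    behavior flips exactly once between them.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if is_accepted(mid):
            low = mid
        else:
            high = mid
    return low

# Stand-in for manually trying a font size in the dialog. The threshold used
# here is an assumption for the simulation, not a documented Notepad limit.
def simulated_font_dialog(size, last_good=39321):
    return size <= last_good

print(find_last_accepted(simulated_font_dialog, 32768, 65535))  # 39321
print(hex(39321), bin(39321))  # 0x9999 0b1001100110011001
```

<p>Starting from 32768 (which worked) and 65535 (which did not), the search needs about 15 probes to isolate the boundary.</p>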
<p>Maybe it’s not supposed to make sense! Maybe nobody really cares about these types of bugs!</p>
<p><em>(BTW…somebody please take the Thesaurus away from the developer…’Oblique?’ Are you serious…why not just be consistent and use the word ‘Italic?’)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2010/03/24/boundary-bugslike-shooting-fish-in-a-barrel/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Code Coverage: More Than Just a Number</title>
		<link>http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/</link>
		<comments>http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 02:09:22 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Code Coverage]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/</guid>
		<description><![CDATA[When I was growing up I would sometimes go down into my grandfather’s basement. He had amassed a variety of tools during his lifetime and he was an excellent wood craftsman. I wasn’t allowed to touch any of the power tools, because his rule was, “if you don’t know how to use a tool properly [...]]]></description>
			<content:encoded><![CDATA[<p>When I was growing up I would sometimes go down into my grandfather’s basement. He had amassed a variety of tools during his lifetime and he was an excellent wood craftsman. I wasn’t allowed to touch any of the power tools, because his rule was, “<strong><em>if you don’t know how to use a tool properly then you shouldn’t play with it</em></strong>.”</p>
<p>Of course, I am a bit of a hard head (even back then) and one day I started playing with the wood lathe while my grandfather was upstairs. Everything seemed to be going pretty well until I pushed the chisel in too far too fast and the wood split and went flying. One piece shattered the overhead light and the other piece ricocheted off the back of my hand leaving a nice gash. I shut off the machine and ran upstairs. After my grandmother cleaned and wrapped my hand, my grandfather made me go back downstairs and clean up the mess and stood over me with a stern look of disapproval making sure I wiped up my blood trail. After that incident, I heeded my grandfather’s advice, at least in his basement shop.</p>
<p>Anyway, with the recent discussions of code coverage around the testing blogosphere I started thinking about what was really being discussed. The discussions (as is the case with most discussions about code coverage) were not actually about the application of code coverage as a tool, but more about the code coverage metric. And more specifically the discussions were about how not to assume a high measure of code coverage implies something is well tested. Interestingly enough, 2 years ago I wrote a <a href="http://www.testingmentor.com/imtesty/2009/11/13/the-code-coverage-metric-is-inversely-proportional-to-the-criticality-of-the-information-it-provides/" target="_blank">post</a> illustrating how the metric can be gamed and how the code coverage measure tells us nothing about quality or test effectiveness, but also alluded to how it might be used more effectively.</p>
<p>I thought that how the metric is sometimes misused is mostly self-evident, but then I realized that almost every time testers start talking about code coverage the discussion tends to focus on the metric. This may seem a bit harsh, but if a person&#8217;s only contribution to a conversation about code coverage is about how the metric doesn’t relate to quality or testing effectiveness then that person should not be allowed to play with hammers, and employing more complex tools such as wheelbarrows is well beyond that person&#8217;s comprehension.</p>
<p>Only thinking of code coverage as a means to get some magic number is akin to thinking “how many nails can I pound with this hammer?” The metric itself is mostly irrelevant; and it is completely irrelevant if you don’t know how to interpret it in a way that helps you as a tester. Think about it this way; if we told our managers “our tests achieved 80% code coverage” some of our managers would be elated. (Of course IMHO, these types of managers are metric morons.) But, what do you think these same pointy headed number zombies would say if we told them “we ran our tests and we only missed testing 20% of the code.” I suspect they would start pacing back and forth in the room mumbling “We must run more tests, we must run more tests.”</p>
<p>When we stop thinking of code coverage as simply a measure where our only use of the tool is to try and achieve some magical number then perhaps we can start thinking about how to actually use code coverage as an effective tool to help us design tests (in under-tested or untested areas of the code), reduce potential risk, and possibly even drive quality upstream.</p>
<p>For example, one of my mentees is currently working on a project that uses just in time code coverage as a tool to evaluate how tests exercise changed code and downstream dependencies prior to checking code changes (e.g. bug fixes) back into the main tree. The initial pushback by some members of the team (including some pointy headed managers) was “code coverage doesn’t tell us about product quality” or “it’s too hard to achieve 80% code coverage” (although no such goal had been mentioned), and my personal favorite, “it’s too difficult to get everyone to measure coverage.” I reminded my mentee that the project is not about achieving some magic number, and in fact, it’s really not even about measuring at all. It’s about using the tool to discover information and to help us design additional functional tests at the API or component level that we might otherwise overlook to help prevent downstream regressions. In a nutshell, it’s about using code coverage as a defect prevention tool in this case.</p>
<p>Bottom line, code coverage is a tool! If you don’t know how to use it to improve your testing, well…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Boundary bug hunting; sometimes it’s almost too easy!</title>
		<link>http://www.testingmentor.com/imtesty/2010/01/14/boundary-bug-hunting-sometimes-its-almost-too-easy/</link>
		<comments>http://www.testingmentor.com/imtesty/2010/01/14/boundary-bug-hunting-sometimes-its-almost-too-easy/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 23:14:14 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Boundary Testing]]></category>
		<category><![CDATA[Testing Techniques]]></category>

		<guid isPermaLink="false">http://www.testingmentor.com/imtesty/2010/01/14/boundary-bug-hunting-sometimes-its-almost-too-easy/</guid>
		<description><![CDATA[This past weekend I was working on a new test tool library for generating random email addresses; specifically the local address segment of an email address. I know, there are already a lot of email address generators available and this could be construed as reinventing the wheel. But I wanted to give my students in [...]]]></description>
			<content:encoded><![CDATA[<p>This past weekend I was working on a new test tool library for generating random email addresses; specifically the local address segment of an email address. I know, there are already a lot of email address generators available and this could be construed as reinventing the wheel. But I wanted to give my students in my test automation course at the University of Washington something to test at the API level. So why not have them test a test tool and learn a bit more about API level testing and how to use combinatorial analysis of the input property values to drive a data-driven automated test case. Also, having them test it means that I don’t have too!</p>
<p>Anyway, one of the tool’s properties is a character array of invalid characters for the specific email address system under test. Although the guidelines for email addresses are outlined in RFC 5322 and RFC 2821 many companies can place greater restrictions on the characters that are allowed for the local address component of an email address (the local address is the part before the ‘@’ character).</p>
<p>For example, Yahoo only allows a local address to be between 4 and 32 characters, the first character must be a letter, and it may contain only letters, numbers, underscores, and a single period character. The Google mail local address is between 6 and 30 characters, and only allows letters, numbers, and (multiple) period characters. Hotmail and Live mail allow local address name lengths between 6 and 64 characters (64 is the maximum allowable size according to RFC 5322), and can only contain letters, numbers, periods, hyphens, and underscores.</p>
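<p>Those provider rules translate fairly directly into a small validation table. The following is a rough sketch based only on the rules as described in this post, which may not match what the providers enforce today. The function and dictionary names are made up for illustration, and reading the Yahoo period rule as “no more than one” is an assumption.</p>

```python
import re

# Rules as described in the text above (they may not match current provider
# policies). Each entry: (min length, max length, pattern of allowed characters).
PROVIDER_RULES = {
    "yahoo": (4, 32, re.compile(r"[A-Za-z][A-Za-z0-9_.]*")),
    "gmail": (6, 30, re.compile(r"[A-Za-z0-9.]+")),
    "hotmail": (6, 64, re.compile(r"[A-Za-z0-9._-]+")),
}

def is_valid_local_address(provider, local):
    """Check a local address against the provider's length and character rules."""
    min_len, max_len, pattern = PROVIDER_RULES[provider]
    if not min_len <= len(local) <= max_len:
        return False
    if pattern.fullmatch(local) is None:
        return False
    # Assumption: Yahoo allows no more than one period in the local address.
    if provider == "yahoo" and local.count(".") > 1:
        return False
    return True

print(is_valid_local_address("gmail", "test_user1"))    # False
print(is_valid_local_address("yahoo", "test_user1"))    # True
print(is_valid_local_address("hotmail", "test_user1"))  # True
```

<p>The same string lands in the valid class for two providers and the invalid class for the third, so the test data has to be partitioned per provider rather than once for “email addresses” in general.</p>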
<p>Even from these few examples we can see a couple of things. First, although we are testing email addresses there is not a universal set of equivalence partitions that works in all contexts. We need to partition the test data into equivalence class subsets based on the specific domain we are testing. For example, the invalid class subset of characters for a Google local address includes the underscore character, but both Yahoo and Hotmail allow the underscore as a valid character in an email local address. (But, I will talk next week about the equivalence partitioning of this data…for now let’s get back to boundary testing!)</p>
<p>Back to my story – as I was exploring each email provider’s requirements in order to determine how to partition the data I discovered an interesting problem with Yahoo. Remember, the maximum length of the local address for a Yahoo account is 32 characters. <a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="yahoo msg" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg_thumb.jpg" border="0" alt="yahoo msg" width="422" height="114" /></a></p>
<p>And, the textbox control property on the web page is set to only allow a maximum input of 32 characters to prevent the user from inputting more than 32 characters. Copying a string longer than 32 characters into that textbox simply truncates the string after the 32nd character.</p>
<p>But, when I bump up against the maximum allowable length with some test strings the underlying program that generates suggested alternative local address names will actually produce a local address of 35 characters in length!</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg2.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="yahoo msg 2" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg2_thumb.jpg" border="0" alt="yahoo msg 2" width="426" height="110" /></a></p>
<p>Now, if the software message tells me I can’t do something (like have a local address name of more than 32 characters) and then the software generates a local address name of 35 characters for me…well, I am the sort of fellow who will push that button!</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg3.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="yahoo msg 3" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg3_thumb.jpg" border="0" alt="yahoo msg 3" width="431" height="155" /></a></p>
<p>And sure enough it looks like I can use it. But wait. Only one more button to push and…</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg4.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="yahoo msg 4" src="http://testingmentor.com/imtesty/wp-content/uploads/2010/01/yahoomsg4_thumb.jpg" border="0" alt="yahoo msg 4" width="437" height="108" /></a></p>
<p>What do you mean “Sorry, this appears to be an invalid Yahoo ID?” You generated an invalid local address for me! Why would Yahoo mail torment me so?</p>
<p>I am thinking in the developer’s mind the user story went sort of like:</p>
<blockquote><p><strong>User</strong>: “I would like this.”</p>
<p><strong>System</strong>: “No you can’t have that, but you can have this.”</p>
<p><strong>User</strong>: “OK”</p>
<p><strong>System</strong>: “No, you can’t have that either.”</p></blockquote>
<p>It’s funny this came up this week because I was talking with a group of senior SDETs about defect prevention versus defect detection and how 99.999% of boundary issues can be found at the unit level or API level of testing well before the UI is slapped onto the functional layer.</p>
<p>Testing the functional layer more thoroughly or a code review would most likely have revealed this ‘magic’ number was inconsistent. Forcing the algorithm that generates suggested local addresses to test boundary conditions would also have exposed this problem much sooner.</p>
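<p>A boundary check of that kind at the API level might look something like the sketch below. Yahoo’s actual suggestion algorithm is unknown, so <code>suggest_alternative</code> is a purely hypothetical stand-in that appends a suffix and trims the requested name to stay within the limit. The point is only the shape of the test: feed the generator inputs at and just below the maximum length and assert on the length of what comes out.</p>

```python
MAX_LOCAL_LENGTH = 32  # Yahoo limit cited in the text

def suggest_alternative(requested, suffix):
    """Hypothetical suggestion generator: append a suffix, trimming the
    requested name first so the result never exceeds MAX_LOCAL_LENGTH."""
    room = MAX_LOCAL_LENGTH - len(suffix)
    return requested[:room] + suffix

def check_suggestion_boundaries():
    """Exercise the generator at and just below the maximum input length."""
    suffix = "123"
    for length in (MAX_LOCAL_LENGTH - 1, MAX_LOCAL_LENGTH):
        suggestion = suggest_alternative("a" * length, suffix)
        assert len(suggestion) <= MAX_LOCAL_LENGTH, (length, len(suggestion))
    return True

print(check_suggestion_boundaries())  # True
```

<p>A generator that simply appended a 3 character suffix to a 32 character name without trimming would return 35 characters and fail this assertion immediately. That happens to match the length observed above, though whether it is the real cause is a guess.</p>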
<p>Now I don’t know Yahoo’s development and testing practices, and unfortunately it&#8217;s not uncommon to overlook bugs similar to this. But, I suspect that if developers rely on testers to find all their bugs, and testers primarily rely on testing through the user interface to find bugs then we are always going to find boundary bugs post release (and that’s a good thing because it gives me something to blog about).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2010/01/14/boundary-bug-hunting-sometimes-its-almost-too-easy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Evaluating Exploratory Testing</title>
		<link>http://www.testingmentor.com/imtesty/2009/12/10/evaluating-exploratory-testing/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/12/10/evaluating-exploratory-testing/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 21:03:54 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Exploratory Testing]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/12/10/evaluating-exploratory-testing/</guid>
		<description><![CDATA[This month’s issue of Testing Experience published my article that summarizes the findings of several case studies of exploratory testing both inside and outside of Microsoft. Although some people consider me to be a harsh critic of exploratory testing nothing could be further from the truth. When I started my career as a professional tester [...]]]></description>
			<content:encoded><![CDATA[<p>This month’s issue of <a href="http://testingexperience.com/testingexperience04_09.pdf" target="_blank">Testing Experience</a> published my article that summarizes the findings of several case studies of exploratory testing both inside and outside of Microsoft. Although some people consider me to be a harsh critic of exploratory testing nothing could be further from the truth. When I started my career as a professional tester my approach to software testing was primarily exploratory in nature. I was focused on executing as many negative tests as I could possibly conceive of in search of the most heinous bugs I could find; and I was good at it. My criticism is not of exploratory testing as an approach; however, I do ‘question’ the claim that exploratory testing is “orders of magnitude more productive.” And, I am also critical of the argument that we don’t understand exploratory testing if we don’t conform to one notion of the concept (or buy into an ideological doctrine) because I don’t believe that there is only one ‘right’ way to perform or think about exploratory testing.</p>
<p>Of course, I know it is unpopular to question the claims of exploratory testing ‘experts,’ but I just happen to be one of those people who question things that are founded on anecdotal observations without any hard data to substantiate those claims. I certainly don’t have all the information, but I personally like to be able to back up my position with facts (known at the time) and several verifiable/repeatable data points so I can answer questions from a defendable position rather than trying to convince or cajole someone with my subjective opinion. (<em>I know a lot of studies show that many Americans base their decisions on their emotional state at the time. But I learned a long time ago that you should never buy the boat you fall in love with because you will spend more time maintaining her than sailing her</em>.) Also, it’s easier to persuade me that I might be wrong with solid, verifiable information and repeatable data versus emotional rhetoric or personal insults.</p>
<p>I think most people who promote exploratory testing are well intentioned and realize in conjunction with other testing approaches that exploratory testing adds value to any testing effort. I also think that many practitioners realize that while we must not only hone our intellectual capabilities of critical thinking and logical reasoning, we must also constantly build our knowledge and skills of the other approaches, methods, and techniques used in our professional trade.</p>
<p>At Microsoft, I can’t think of any testing group that does not use exploratory testing as part of its overall strategy. We have learned not to rely on exploratory testing as our primary approach because it simply doesn&#8217;t scale as project size and complexity increase, and it is easy for testers to focus too much on out of context issues in hopes of finding another bug. As one Principal Test Manager summarized, exploratory testing helps</p>
<ul>
<li>flush out “low hanging fruit” (identify obvious issues very quickly)</li>
<li>provide welcomed context switching by getting folks to look at other areas of the product</li>
<li>seed new testing ideas or identify holes (<em>which is great as long as we have a way to preserve those ideas and they are learnable by other testers</em>)</li>
</ul>
<p>But, of course, it was also noted that greater ‘system knowledge’ and an understanding of other various testing techniques and approaches enriched the overall effectiveness of the testers on the teams. My job as a teacher and mentor of software testing is to take really smart people who already know how to think critically about problems and provide them with the foundational knowledge of alternative techniques, methods, approaches, and the skills that are specific to the profession of software testing that will enable them to decide what approach to use depending on the context.</p>
<p>Like other testing approaches, exploratory testing has benefits and limitations: it is more effective at exposing certain categories of issues and less effective at exposing other types of problems. (See post on <a href="http://testingmentor.com/imtesty/2009/11/19/the-pesticide-paradox/" target="_blank">Pesticide Paradox</a>.) And now we have researched case studies that begin to help us understand how to utilize exploratory testing as part of our overall testing strategy. Of course, further research could be done in this area, but it is very interesting that the independent studies used in the article reached similar findings and conclusions.</p>
<p>Anyway, I look forward to comments or feedback on the article.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/12/10/evaluating-exploratory-testing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Refactoring for Testability</title>
		<link>http://www.testingmentor.com/imtesty/2009/12/02/refactoring-for-testability/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/12/02/refactoring-for-testability/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 20:35:24 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Code Coverage]]></category>
		<category><![CDATA[Testability]]></category>
		<category><![CDATA[White Box Testing]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/12/02/refactoring-for-testability/</guid>
		<description><![CDATA[ 

One of my hobbies is shooting CMP matches and long range precision shooting. Besides lots of practice perfecting the techniques a big part of precision shooting depends on the ammunition and studying the ballistic patterns of various loads. All precision shooters custom load their ammunition and it is not as simple as simply reading a [...]]]></description>
			<content:encoded><![CDATA[<p> </p>
<div class="mceTemp">
<div id="attachment_255" class="wp-caption alignleft" style="width: 209px"><img class="size-medium wp-image-255 " style="margin-left: 5px; margin-right: 5px;" title="DSC_1276" src="http://testingmentor.com/imtesty/wp-content/uploads/2009/12/DSC_1276-199x300.jpg" alt="Teaching my daughter about bullet seating depth." width="199" height="300" /><p class="wp-caption-text">Teaching my daughter about bullet seating depth.</p></div>
<p>One of my hobbies is shooting <a href="http://odcmp.com/">CMP matches</a> and long range precision shooting. Besides lots of practice perfecting the techniques, a big part of precision shooting depends on the ammunition and studying the ballistic patterns of various loads. All precision shooters custom load their ammunition, and it is not as simple as reading a reloading manual. Slight variations of .001” in seating depth of a bullet or .1 grain of powder may determine whether the group of shots at a target 600 yards away is 1” MOA or 6” MOA. So, getting the ammunition to match the rifle requires continually analyzing your shots, making slight adjustments to the load, and repeating; in computer jargon we might call that refactoring. Reloading for precision is a continually optimizing process until we find the optimal load. Similarly, one of the things we do in the Engineering Excellence group at Microsoft is to continually analyze our internal processes and practices to see how we can help our business groups constantly improve and optimize towards their target. One of the big things on our plate these days is testability.</p></div>
<p>In <a href="http://www.amazon.com/Testing-Object-Oriented-Systems-Models-Patterns/dp/0201809389%3FSubscriptionId%3D0JTCV5ZMHMF7ZYTXGFR2%26tag%3Dbrdicr-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0201809389">Testing Object-Oriented Systems: Models, Patterns, and Tools</a>, a book I consider one of the most important books on software testing practices, the author Robert Binder defines testability as “The relative ease or difficulty of producing and executing an economically feasible test suite to determine whether the [system under test ] SUT (i) conforms to stated requirements and specifications, and (ii) exhibits an acceptably low probability of failure.” This and several other definitions of testability floating around on the web generally agree that testability involves</p>
<p>1.) The ease with which the SUT can be tested<br />
2.) The cost of testing is reasonable</p>
<p>So, as testability increases, so does the ease with which our tests can determine whether the SUT satisfies implicit and explicit requirements and has a low chance of failure, and testing costs go down. This all sounds nice, but unfortunately testability cannot be directly measured; testability is a qualitative measure. Although we can’t accurately measure testability we can sometimes do small things to improve the characteristics of testability and help reduce testing costs by reducing the number of tests required to determine whether the SUT satisfies the stated requirements and also has a low chance of failure, or finding ways to test more efficiently through better designs.</p>
<p>In last week’s post I referred to a pseudo code example that was written to illustrate how bugs could linger in code despite a high measure of code coverage. Of course we should realize that pseudo code is generally a far cry from the real implementation of the code. Pseudo code is simply a model, and there are many ways to implement that model. The advantage of a model is that we can often test a model earlier to identify potential issues before a single line of code is written. In this particular pseudo code sample, there were a couple of things that stood out that could likely impact the testability of an implementation of the pseudo code model. So, the neurons in my brain started firing with lots of testing related questions.</p>
<p>So, let’s use that example to discuss potential testability issues. The sample was based on a requirement that stated “Student ID’ are seven digit numbers between one million and 6 million inclusive.” The function is relatively simple in that it takes a string type passed to the <em>sid</em> parameter, and returns a Boolean true or false to the calling function depending on whether the string satisfies the internal Boolean conditions it is being compared against. But this function also calls 2 other functions: the <em>length</em> () function and the <em>number</em> () function. From the function names I would think the <em>length</em> () function provides a numeric value that represents the number of characters in the string passed to the <em>sid</em> parameter. I am also betting the <em>number</em> () function returns a numeric value (it converts the string variable to a numeric type such as an integer). The pseudo code example was</p>
<blockquote><p>function validate_studentid(string sid) return<br />
TRUEFALSE<br />
BEGIN<br />
  STATIC TRUEFALSE isOk;<br />
  isOk = true;</p>
<p>  if ((length(sid) is not 7) then<br />
    isOk = False;</p>
<p>  if (number(sid) &lt;= 1000000 or number(sid) &gt; 6000000 then<br />
     isOk = False;</p>
<p>  return isOk;</p>
<p>END</p></blockquote>
<p>One of the reasons that we hire testers with a programming background at Microsoft is that they can help the developer identify potential issues, reduce the probability of failure, and improve testability by stepping through the code during peer reviews, or while designing additional tests to cover un-tested or under-tested areas of the code that are exposed by code coverage analysis. So, when I come across a code sample, I generally step through it to</p>
<ul>
<li>See if it will work as intended (basic unit test)</li>
<li>See if there are any potential obvious errors in logic</li>
<li>Identify tests necessary for branch or conditional coverage (because developers are usually only concerned with block coverage)</li>
<li>Identify argument values for negative testing that might expose undesirable results (bugs)</li>
</ul>
<p>So, in this pseudo code example, once I got to the second conditional clause (if (number (sid)) &lt;= 1000000 or number (sid) &gt; 6000000 then) the little cranks in my brain began to turn. I thought to myself, why are we checking the length of the string? I mean, if the number can only be between 1,000,000 and 6,000,000 then it seems to me that checking the length of the string is simply redundant.</p>
<p>If we remove the first conditional clause (if ((length(sid) is not 7) then) then we actually reduce the number of tests to 3 instead of 4 assuming <a href="http://www.student.cs.uwaterloo.ca/~cs132/Weekly/W02/SCBooleans.html">short-circuiting</a> since short-circuiting compound Boolean expressions is one of several code optimization techniques. (By the way, the first caveat example in Wikipedia on short-circuiting where a function used as a Boolean conditional also “performs some required operation regardless of whether the first conditional evaluates true or false” is simply poor architectural design and is very, very likely to be problematic.) The 3 tests for condition (and basis path) coverage to exercise the true and false outcome of every single Boolean conditional expression are listed in the table below.</p>
<table border="1" cellspacing="0" cellpadding="2" width="681">
<tbody>
<tr>
<td width="173" valign="top"> </td>
<td width="196" valign="top">Conditional 1</td>
<td width="186" valign="top">Conditional 2</td>
<td width="124" valign="top"> </td>
</tr>
<tr>
<td width="172" valign="top">Test</td>
<td width="195" valign="top">number (sid) &lt;= 1000000</td>
<td width="186" valign="top">number (sid) &gt; 6000000</td>
<td width="126" valign="top">Expected Result</td>
</tr>
<tr>
<td width="172" valign="top">Any value between 1000000 and 6000000</td>
<td width="194" valign="top">false</td>
<td width="185" valign="top">false</td>
<td width="128" valign="top">true</td>
</tr>
<tr>
<td width="171" valign="top">Any value &gt; 6000000</td>
<td width="194" valign="top">false</td>
<td width="185" valign="top">true</td>
<td width="129" valign="top">false</td>
</tr>
<tr>
<td width="171" valign="top">Any value &lt; 1000000</td>
<td width="194" valign="top">true</td>
<td width="185" valign="top">(short-circuited)</td>
<td width="130" valign="top">false</td>
</tr>
</tbody>
</table>
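<p>To see the short-circuiting in the table for yourself, here is a minimal Python sketch. It is only an illustration, not the original article's code: the function name <em>in_range_trace</em> and its helpers are hypothetical, and the relational operators are kept exactly as written in the pseudo code. It records which sub-expressions were actually evaluated for each test.</p>

```python
def in_range_trace(value):
    """Evaluate the compound range check and record which sub-expressions ran.

    Hypothetical helper for illustration; operators are copied from the
    pseudo code as written.
    """
    evaluated = []

    def too_low():
        evaluated.append("number <= 1000000")
        return value <= 1000000

    def too_high():
        evaluated.append("number > 6000000")
        return value > 6000000

    # Python's "or" short-circuits: too_high() runs only when too_low() is False.
    rejected = too_low() or too_high()
    return (not rejected), evaluated


# One sample per row of the table above.
for sample in (3500000, 7000000, 500000):
    print(sample, in_range_trace(sample))
```

<p>For the value below one million only the first sub-expression is recorded, which is the “(short-circuited)” cell in the table.</p>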
<p>Of course, even testing several samples from the equivalence partitions may not expose the bug in this code because the bug in this code is a typical boundary error. (In a <a href="http://testingmentor.com/imtesty/2009/11/18/boundary-testing-isnt-guessing-at-numbers/">previous post</a> I explained the basic fault model that causes many boundary issues. In a nutshell, boundary bugs are generally caused by incorrect relational operators or <a href="http://en.wikipedia.org/wiki/Magic_number_(programming)">magic numbers</a> in code.) Without recognizing that we also need to test the boundaries (999999, 1000000, 1000001, and 5999999, 6000000, 6000001) we could easily overlook the error in the pseudo code.</p>
<p>Another thing that caught my attention was the lack of exception handling. Some people may not consider including exception handling in pseudo code and take it as a given. But, as a tester, when I don’t see exception handling in pseudo code in a review then I need to start asking questions so I can better design tests to exercise the exception handling control flow paths that directly impact code coverage measures. Another reason this is an important consideration is that results of code coverage analysis indicate that exception handlers are generally under-tested. It seems we are really good at finding unhandled exceptions with our negative tests (which is really good), but we do not seem to be as thorough in testing the logical code paths of exception handlers. This is especially true for predicate statements in which multiple Boolean sub-expressions might trigger an exception. We tend to test one of the conditionals, and the other conditional expressions in that statement are often under-tested.</p>
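<p>As a rough illustration of the boundary tests, here is a small Python sketch. It assumes <em>length</em> () and <em>number</em> () behave like Python's len() and int(), which is my guess and not something the pseudo code states; the <em>expected</em> oracle is a hypothetical helper derived from the requirement.</p>

```python
def validate_studentid(sid: str) -> bool:
    """Literal transcription of the pseudo code, bug included."""
    is_ok = True
    if len(sid) != 7:
        is_ok = False
    # Bug under discussion: "<=" rejects 1000000, which the requirement allows.
    if int(sid) <= 1000000 or int(sid) > 6000000:
        is_ok = False
    return is_ok


def expected(value: int) -> bool:
    """Oracle taken from the requirement: 1,000,000 to 6,000,000 inclusive."""
    return 1000000 <= value <= 6000000


# Boundary values on either side of both limits.
boundaries = [999999, 1000000, 1000001, 5999999, 6000000, 6000001]

# Collect every boundary value where the function disagrees with the oracle.
mismatches = [v for v in boundaries if validate_studentid(str(v)) != expected(v)]
print(mismatches)
```

<p>Only the value 1000000 disagrees with the requirement. A sample from the middle of each partition, such as 500000, 3500000, or 7000000, would pass without complaint.</p>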
<p>So, we can surmise the <em>number</em> () function must be converting the string parameter (the <em>sid</em> variable) to a numeric type and returning a type of number because the conditional clause is comparing it to magic numbers (1000000 and 6000000). But if we entered a string that contained non-numeric characters my initial thought was that the <em>number</em> () function would throw an exception that is unhandled by the <em>validate</em>_<em>studentID</em> () function.</p>
<p>Then I thought a bit more, and considered that the <em>number</em> () function might swallow the exception and return a 0 or even a -1. Now, there are some <a href="http://haacked.com/archive/2005/08/10/9293.aspx">arguments in favor of swallowing exceptions</a>, but in general it is not a good idea. In this case, it is probably a bad idea because one of the primary purposes of a separate function is reusability. If the <em>number</em> () function is reused in some other code, or other part of the code where we need to convert a string to a numeric type regardless of the range (within the range of the data type being converted to), I would suspect we would want to throw an exception, and then rethrow the exception in the calling function. Of course, this is where the rubber hits the road, and a professional tester needs to dig in and start asking some hard questions as to how the developer is going to handle this situation. If the <em>number</em> () function is not going to be reused, then most modern programming languages include a function call that will easily convert the string to a numeric type and do it more efficiently as compared to calling a separate function. And maybe in that case we could swallow the exception in the <em>validate</em>_<em>studentID</em> () function and simply return false as illustrated in the C# code below.</p>
<div id="codeSnippetWrapper" style="text-align: left; line-height: 12pt; background-color: #f4f4f4; margin: 20px 0px 10px; width: 97.5%; font-family: 'Courier New', courier, monospace; direction: ltr; max-height: 200px; font-size: 8pt; overflow: auto; cursor: text; border: silver 1px solid; padding: 4px;">
<div id="codeSnippet" style="text-align: left; line-height: 12pt; background-color: #f4f4f4; width: 100%; font-family: 'Courier New', courier, monospace; direction: ltr; color: black; font-size: 8pt; overflow: visible; border-style: none; padding: 0px;">
<pre style="text-align: left; line-height: 12pt; background-color: #f4f4f4; margin: 0em; width: 100%; font-family: 'Courier New', courier, monospace; direction: ltr; color: black; font-size: 8pt; overflow: visible; border-style: none; padding: 0px;"><span style="color: #0000ff">try</span>
{
    <span style="color: #0000ff">if</span> (<span style="color: #0000ff">int</span>.Parse(sid) &lt; minValue || <span style="color: #0000ff">int</span>.Parse(sid) &gt; maxValue)
    {
        isOk = <span style="color: #0000ff">false</span>;
    }
}
<span style="color: #0000ff">catch</span> (FormatException)
{
    isOk = <span style="color: #0000ff">false</span>;
}
<span style="color: #0000ff">catch</span> (OverflowException)
{
    isOk = <span style="color: #0000ff">false</span>;
}</pre>
</div>
</div>
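<p>For readers who want to experiment with the exception handling paths, here is a comparable sketch in Python. It is an assumption-laden analogue, not a translation: the constants <em>MIN_ID</em> and <em>MAX_ID</em> are hypothetical names, and because Python integers do not overflow there is no counterpart to the OverflowException handler.</p>

```python
MIN_ID = 1000000  # hypothetical named constants replacing the magic numbers
MAX_ID = 6000000


def validate_studentid(sid) -> bool:
    """Return True only for input that parses to an integer in the valid range."""
    try:
        number = int(sid)  # parse once, then compare
    except (TypeError, ValueError):
        # Non-numeric text raises ValueError; None raises TypeError.
        # Both are swallowed here and reported as an invalid ID.
        return False
    return MIN_ID <= number <= MAX_ID


# Negative tests that drive control flow through the exception handler.
for bad_input in ("foo", "", "12a4567", None):
    print(repr(bad_input), validate_studentid(bad_input))
```

<p>Python's int() also accepts surrounding whitespace and underscore digit separators, so strings such as " 1000000 " or "1_000_000" would be accepted by this sketch. That is exactly the kind of question a tester should raise in a review.</p>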
<p>With the push to drive quality upstream, reduce costs (especially testing costs), and improve testability I envision that many testers will be working alongside our development counterparts to help them prevent defects from getting into the product code base, and improve the maintainability of the code. This doesn’t mean that testers will become developers or vice versa; it simply means that testers are (generally) experts in designing tests, and developers are experts in designing solutions that adhere to requirements. Rather than an adversarial relationship, I suspect in the future developers and testers will have a more symbiotic relationship to improve the intrinsic quality of our code bases.</p>
<p>The bottom line of all this is that in teams where testers are designing white box tests for improved code coverage (control flow testing), or where testers are engaged in design reviews or peer reviews of code prior to check in, I hope this gives you some things to think about.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/12/02/refactoring-for-testability/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Reconsidering Code Coverage</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/25/reconsidering-code-coverage/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/25/reconsidering-code-coverage/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 10:44:13 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Code Coverage]]></category>
		<category><![CDATA[Structural Testing]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/25/reconsidering-code-coverage/</guid>
		<description><![CDATA[Tonight on my way to teach a test automation course at the University of Washington I had some free time to catch up on my reading. My manager asked me if I had read this month’s copy of one of the several testing magazines we get and I replied that I had downloaded it but [...]]]></description>
			<content:encoded><![CDATA[<p>Tonight on my way to teach a test automation course at the University of Washington I had some free time to catch up on my reading. My manager asked me if I had read this month’s copy of one of the several testing magazines we get and I replied that I had downloaded it but hadn’t had a chance to read it yet. So, he tossed me the hardcopy of the magazine and said, “Enjoy.” Now this should have been a clue because although Alan is a great manager and mentor, I think he secretly likes to see the veins in my neck swell and blood shoot out of my eyes from time to time.</p>
<p>I read a lot of articles, white papers, and books. I like most of what I read, even if I disagree with some of the points being made. I can’t remember ever reading an article on software testing that made me angry, until this one. I was not angry because of the message of the article. In fact, I think the point the authors are trying to make is valid and I agree with them on their fundamental point. Unfortunately, the article is filled with so many technical inaccuracies that the end message was almost lost.</p>
<p>I spent the last 10 years studying various techniques, methods, and approaches in software testing. I teach more than 500 testers a year on structural testing techniques, and am now working with a team in the Windows division to implement a new tool for just in time code coverage analysis at the component level that allows us to see how our tests exercise code paths in changed code and the dependent modules. I also discuss structural testing in chapter 5 of our book <em><a href="http://www.hwtams.com" target="_blank">How We Test Software At Microsoft</a></em>. I don’t really consider myself to be an expert in the subject, but I might know a thing or two about it. So, let’s Reconsider Code Coverage!</p>
<p>In August 2007 I wrote an <a href="http://blogs.msdn.com/imtesty/archive/2007/08/14/code-coverage-is-inversely-proportional-to-the-critical-information-it-provides.aspx">informative blog post</a> on the potential misuse of the code coverage measure. But code coverage measures are used by some companies as one of many ways to help them reduce risk. And, let me be very clear here, <strong><em>there is no correlation between code coverage and quality, and code coverage measures don’t tell us “how well” the code was tested</em></strong>. The code coverage measure simply measures what code has been executed, and more importantly what code has not been executed. The value of measuring code coverage is not in producing some “magic number,” but that it helps testers investigate untested or under-tested areas of the product and design additional tests (generally using structural testing techniques) to improve coverage and reduce overall risk.</p>
<blockquote><p><em>Just because you execute a line of code doesn’t mean a bug doesn’t still exist, but if you don’t execute a line of code you have 0 probability of finding a bug if one exists!</em></p></blockquote>
<p>Also it is important to note there are several ways to measure code coverage. Different tools employ different measures and sometimes different tools measure the same type of coverage differently. Also, I discovered that even the same tool can measure the same code differently depending on how it is compiled (debug, retail, etc.) and previously <a href="http://testingmentor.com/imtesty/2009/11/18/basic-blocks-arent-so-basic/">wrote</a> about my study. Some of the basic ways to measure code coverage (not test coverage) include:</p>
<ul>
<li><strong>Function coverage</strong> measures the percentage of functions or methods in a class or application that are called at runtime.</li>
<li><strong>Statement coverage</strong> measures the percentage of executable statements exercised at runtime.</li>
<li><strong>Block coverage</strong> measures the percentage of each sequence of non-branching statements that are executed at runtime. Block coverage subsumes statement coverage.</li>
<li><strong>Decision or branch coverage</strong> measures the percentage of both Boolean (not binary) outcomes (true and false) of simple conditional expressions at runtime. If a predicate statement has more than one conditional sub-expression decision (or branch) coverage treats that predicate statement as one conditional clause. Decision coverage subsumes block coverage.</li>
<li><strong>Condition coverage</strong> measures the percentage of both Boolean outcomes of each conditional sub-expression separated by a logical AND or logical OR in compound predicate statements. Condition coverage subsumes decision coverage when compound expressions are short-circuited.</li>
<li><a href="http://www.mccabe.com/pdf/nist235r.pdf"><strong>Basis path coverage</strong></a> measures the number of linearly independent paths through a program. Basis path coverage is based on <a href="http://www.literateprogramming.com/mccabe.pdf">McCabe’s cyclomatic complexity</a> research.</li>
<li><strong>Path coverage</strong> measures every possible path from the entry to the return statement (or exception) or exit of every method. Unfortunately path testing is usually impossible due to the sheer number of path combinations, and the inability to execute constrained path combinations.</li>
</ul>
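<p>The difference between decision coverage and condition coverage is easier to see with a compound predicate in hand. The Python sketch below is only an illustration under my own assumptions: the helper names <em>outcomes</em> and <em>fully_covered</em> are hypothetical, the predicate is a range check of the form <em>number &lt;= 1000000 or number &gt; 6000000</em>, and short-circuit evaluation is assumed.</p>

```python
def outcomes(values):
    """Record the outcomes seen for the decision and for each condition."""
    seen = {
        "decision": set(),
        "number <= 1000000": set(),
        "number > 6000000": set(),
    }
    for number in values:
        low = number <= 1000000
        seen["number <= 1000000"].add(low)
        if low:
            decision = True  # short-circuit: second condition is not evaluated
        else:
            high = number > 6000000
            seen["number > 6000000"].add(high)
            decision = high
        seen["decision"].add(decision)
    return seen


def fully_covered(seen):
    """An item is covered once both True and False outcomes were observed."""
    return {name: results == {True, False} for name, results in seen.items()}


two_tests = fully_covered(outcomes([500000, 3500000]))
three_tests = fully_covered(outcomes([500000, 3500000, 7000000]))
print(two_tests)
print(three_tests)
```

<p>Two tests are enough to see both outcomes of the decision as a whole, yet the second sub-expression has only ever evaluated as false. A third test above six million is needed before condition coverage is complete.</p>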
<p>Clearly there are different measures of code coverage, and certain types of measures subsume other measures. So, now that we have a handle on the different types of code coverage measures, let’s look at testing some code. We will use the same pseudo code used in the aforementioned article which is based upon the following requirement.</p>
<blockquote><p>“Student ID’ are seven digit numbers between one million and 6 million inclusive.”</p></blockquote>
<p>The authors provided the following pseudo code example for a function to meet this requirement.</p>
<blockquote><p>function validate_studentid(string sid) return<br />
TRUEFALSE<br />
BEGIN<br />
  STATIC TRUEFALSE isOk;<br />
  isOk = true;</p>
<p>  if ((length(sid) is not 7) then<br />
    isOk = False;</p>
<p>  if (number(sid) &lt;= 1000000 or number(sid) &gt; 6000000 then<br />
     isOk = False;</p>
<p>  return isOk;</p>
<p>END</p></blockquote>
<p>So, set aside two problems for a moment. First, there is no reason to ‘test’ the length of the sid variable before evaluating it to see if it is within the allowable range (removing this first conditional improves performance and also improves testability of the code). Second, if the call to the number() function fails to convert the string to a number for a valid Boolean comparison it will throw an unhandled exception. With those noted, let’s look at path testing of this simple example by starting with control flow diagrams of each possible path (assuming the call to the number() function does not throw an unhandled exception because we passed this function a string of characters such as “foo” rather than a string of digits).</p>
<div id="attachment_244" class="wp-caption aligncenter" style="width: 594px"><img class="size-full wp-image-244" title="path" src="http://testingmentor.com/imtesty/wp-content/uploads/2009/11/path3.jpg" alt="Control flow diagram for validate_studentID() function pseudo-code" width="584" height="653" /><p class="wp-caption-text">Control flow diagram for validate_studentID() function pseudo-code</p></div>
<p>(Edited 11/25: After thinking about this a bit more, if the number() function returned a 0 (zero) if the input was incorrectly formatted, then the number() function would not throw an exception, and the control flow path would be identical to the first test in the table below).</p>
<p>Because we are doing path coverage testing and not decision testing, we actually have to separate each Boolean conditional sub-expression in the second compound predicate statement if (number(sid) &lt;= 1000000 or number(sid) &gt; 6000000. The example in the article treated both sub-expressions in the compound predicate statement as a single Boolean expression, which would be synonymous with decision coverage. Path coverage actually treats each sub-expression as if there were 2 single Boolean conditions such as</p>
<blockquote><p>if (number(sid) &lt;= 1000000) then<br />
  isOk = False;</p>
<p>if (number(sid) &gt; 6000000) then<br />
  isOk = False;</p></blockquote>
<p>The table below illustrates the tests required for testing control flow through this function for path coverage (again assuming we are going to ignore the unhandled exception in the code that would occur by passing in a string such as “foo.”)</p>
<table border="1" cellspacing="0" cellpadding="2" width="553">
<tbody>
<tr>
<td width="71" valign="top">Input (sid)</td>
<td width="101" valign="top">Conditional<br />
length(sid) != 7</td>
<td width="103" valign="top">Conditional<br />
number &lt;= 1,000,000</td>
<td width="93" valign="top">Conditional<br />
number &gt; 6,000,000</td>
<td width="86" valign="top">Expected<br />
Result</td>
<td width="97" valign="top">Actual<br />
Result</td>
</tr>
<tr>
<td width="71" valign="top">999999</td>
<td width="101" valign="top">true</td>
<td width="103" valign="top">true</td>
<td width="93" valign="top">false</td>
<td width="86" valign="top">False</td>
<td width="97" valign="top">False</td>
</tr>
<tr>
<td width="71" valign="top">6500000</td>
<td width="101" valign="top">false</td>
<td width="103" valign="top">false</td>
<td width="93" valign="top">true</td>
<td width="86" valign="top">False</td>
<td width="97" valign="top">False</td>
</tr>
<tr>
<td width="71" valign="top">1000000</td>
<td width="101" valign="top">false</td>
<td width="103" valign="top">true</td>
<td width="93" valign="top">false</td>
<td width="86" valign="top"><strong>True</strong></td>
<td width="97" valign="top"><strong>False</strong></td>
</tr>
<tr>
<td width="71" valign="top">6000000</td>
<td width="101" valign="top">false</td>
<td width="103" valign="top">false</td>
<td width="93" valign="top">false</td>
<td width="86" valign="top">True</td>
<td width="97" valign="top">True</td>
</tr>
</tbody>
</table>
<p>The first test is a value with fewer than 7 digits. It causes the first two Boolean conditional expressions to evaluate as true, which sets the isOk variable to false (twice), and we correctly return the expected result of false (or invalid ID). The second test is a number greater than 6,000,000 (but less than the maximum value that would result in an unhandled overflow exception hopefully being thrown by the number() function). In this case the 3rd conditional expression (<em>if (number(sid) &gt; 6000000)</em>) evaluates as true and the function returns false. The 3rd path is buggy. In this pseudo code example, the only way to exercise the true outcome of the Boolean condition <em>if (number(sid) &lt;= 1000000)</em> on a path where the length check passes is to use the value of 1,000,000 (assuming no leading zeros); any larger value causes this Boolean condition to evaluate as false, and any smaller value has fewer than 7 digits. In this case we expect the function to return true, but it in fact will return false. Finally, any number greater than 1,000,000 and less than or equal to 6,000,000 will return a true result indicating a valid student ID.</p>
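<p>The four rows of the table can be replayed with a short Python sketch. This is only a reconstruction from the description in this post, not the original pseudo-code: the function name, the variable names, and the use of int() in place of number() are my assumptions, and the faulty boundary comparison is kept on purpose so the third row fails as shown.</p>

```python
def validate_student_id(sid):
    """Trace the three separated conditions for a string of digits.

    Returns (length_is_wrong, at_or_below_min, above_max, is_ok).
    Hypothetical reconstruction; int() stands in for number().
    """
    is_ok = True

    length_is_wrong = len(sid) != 7
    if length_is_wrong:
        is_ok = False

    number = int(sid)  # raises ValueError for input such as "foo"

    # Buggy comparison kept on purpose: it also rejects 1,000,000 itself.
    at_or_below_min = 1_000_000 >= number
    if at_or_below_min:
        is_ok = False

    above_max = number > 6_000_000
    if above_max:
        is_ok = False

    return length_is_wrong, at_or_below_min, above_max, is_ok


if __name__ == "__main__":
    # One input per row of the path coverage table.
    for sid in ("999999", "6500000", "1000000", "6000000"):
        print(sid, validate_student_id(sid))
```

<p>Each returned tuple lines up with one table row: the three conditional outcomes followed by the actual result.</p>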
<p>The article also suggests that structural testing misses other problems. But, when we look at these issues, they actually have nothing to do with structural testing of the function; in other words they are completely out of context of the problem being discussed.</p>
<p>For example, the assertion is that the requirement is incorrect and should have read 6,999,999 (<strong><em>which I believe is a typo and should be 5,999,999</em></strong>) because of confusion over the word “inclusive.” Inclusive means “<em>including the stated limit or extremes in consideration or account,</em>” but in computing inclusive means “<em>the predicate holds for all elements of an increasing sequence then it holds for their least upper bound.</em>” I disagree with this assumption because I suspect the analyst writing the spec is basing the inclusive range on the common definition, and not a definition based on <a href="http://www.cs.bham.ac.uk/~axj/pub/papers/handy1.pdf">domain theory</a>.</p>
<p>The article questions what would occur with incorrectly formatted numbers such as 123 456 789 or 123,456,789. So, besides the point that these values are not within the valid range of student id numbers, the answer to the question would actually lie in how the <em>number()</em> function being called handles improperly formatted numbers (e.g., throwing a format exception, which again is unhandled in our <em>validate_studentid()</em> function), or how an event handler that sits between the UI and the function might deal with invalid or incorrectly formatted inputs.</p>
<p>The next question concerned resizing of the input window or the screen (assuming desktop resolution) and repainting the window or form and its effect on code coverage of the <em>validate_studentid()</em> function. Well, I am going out on a limb here and I am going to say…“what are you talking about?” I am not quite sure how to phrase this, but let me try…resizing or repainting a window has 0 effect on the structural control flow of the <em>validate_studentid()</em> function. (Of course, I could be wrong, and the length() function or the number() function might have some code that mysteriously interacts with the repainting libraries and how it determines the length of a string or whether a string is a valid number.)</p>
<p>Bugs in external libraries are part of the business. Hopefully those external libraries are well tested or at least documented especially if our development team wrote them. Personally, I have not encountered any public functions or APIs which use wild ass random numbers such as 5.8 million as boundary values, but that’s not to say it couldn’t happen. And of course, if these external functions throw exceptions (as they should based on what they are probably doing), we should have exception handler code in our function to deal with any exceptions thrown from external libraries or function calls.</p>
<p>Based on incorrect path analysis, and out-of-context questions that have nothing to do with control flow through the <em>validate_studentid()</em> function, the article suggests that path testing is not a magic potion, but I am not too sure that anyone actually believes it is. And so, the article suggests that “input combinatorics coverage” might work better. Hmm…now I have been teaching combinatorial testing for over 10 years and have read some interesting papers on the effectiveness of combinatorics on statistical testing and code coverage, and I must say I am pretty sure you need more than one input parameter in combinatorial testing!</p>
<p>Finally, I don’t agree that code coverage measures tell us “how well the developers have tested their code.” The code coverage measure only tells us what percentage of the code has been executed in a particular way, and more importantly it tells us what percentage of the code has not been tested. We must determine whether we need to investigate that area to reduce risk. Of course, many code coverage tools provide a “heat map” that helps us and developers identify untested code, and that is where we shift from the simple act of measuring coverage to the testing method of code coverage analysis in order to design new tests that effectively exercise previously untested code if that level of coverage is important to reduce overall risk.</p>
<p><a href="http://testingmentor.com/imtesty/wp-content/uploads/2009/11/heatmap.jpg"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="heat map" src="http://testingmentor.com/imtesty/wp-content/uploads/2009/11/heatmap_thumb.jpg" border="0" alt="heat map" width="545" height="514" /></a></p>
<p>My intent here is not to ridicule the authors of the article. In fact, I agree with their summation that testers should not believe high code coverage numbers mean “well tested.” (Again see my <a href="http://testingmentor.com/imtesty/2009/11/13/the-code-coverage-metric-is-inversely-proportional-to-the-criticality-of-the-information-it-provides/">blog post from Aug 2007</a>.) Unfortunately, the path to the point was so fraught with inaccuracies and tangents that I almost never made it to the end.</p>
<p>There are many books and white papers on this subject in the ACM and IEEE libraries. Books by Boris Beizer, Robert Binder, and others go into great detail on structural testing. McCabe’s papers linked to in this post are excellent resources.</p>
<p>OK…I feel better now. I need to clean up the blood, take a sedative, and go to sleep.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/25/reconsidering-code-coverage/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Adding Variability in Test Case Design</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/18/adding-variability-in-test-case-design/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/18/adding-variability-in-test-case-design/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 06:37:42 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Test Case]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/18/adding-variability-in-test-case-design/</guid>
		<description><![CDATA[ Published Tuesday, October 20, 2009 
I love autumn! Yes, I am definitely a boy of summer and very much prefer warmer weather; however, there is something special about autumn. This past weekend my daughter, and my 2 friends Dongyi and her husband Yuning and I participated in the Rum Run sailboat fun race with [...]]]></description>
			<content:encoded><![CDATA[<p> Published Tuesday, October 20, 2009 <a href="http://testingmentor.com/imtesty/wp-content/uploads/2009/11/IMG_5549.jpg"><img style="border-bottom: 0px; border-left: 0px; margin: 0px 0px 0px 10px; display: inline; border-top: 0px; border-right: 0px" title="IMG_5549" border="0" alt="IMG_5549" align="right" src="http://testingmentor.com/imtesty/wp-content/uploads/2009/11/IMG_5549_thumb.jpg" width="305" height="230" /></a></p>
<p>I love autumn! Yes, I am definitely a boy of summer and very much prefer warmer weather; however, there is something special about autumn. This past weekend my daughter, and my 2 friends Dongyi and her husband Yuning and I participated in the Rum Run sailboat fun race with an overnight raft up at Bainbridge Island’s Port Madison. Saturday morning was quite rainy, but the wind was blowing 15 knots with gusts to 25 knots and NOAA weather radio announcing gale force warnings in Puget Sound. Wow…what a ride! But, it was actually the rather relaxing sail back to my marina on Sunday morning that rekindled the beauty of autumn in my mind. The bright reds, golden yellows, and pastel browns of the foliage seemed to blend into a collage framed by the darkness of the waters of Puget Sound and the snow covered peaks of the Olympic mountains. The beauty of autumn reminds me about change. A sloughing of the old, the cleansing brought about by the pure white snows, eventually followed by the new and fresh growth that blossoms in spring. </p>
<p>Just as the earth goes through variable cycles of rejuvenation, we must also continually update our tests, and (more importantly) the test data we use in our test cases to prevent them from becoming stale. Trees shed their leaves in the autumn and new leaves emerge in the spring, but the tree is fundamentally still the same tree. Similarly, a well-designed test case has a unique fundamental purpose and by changing the variables we can grow the value of that test case. Of course, the cycle of change in test data should be dramatically shorter in duration as compared to the seasonal changes of mother earth. </p>
<p>Here is a simple example of how a well-designed test case using variable test data can increase the value of the information each test iteration provides through increased confidence and also potentially reduce overall risk. In my role at Microsoft I am in a unique position to not only conduct controlled studies, but I can also implement ideas into practice on enterprise level software projects. One experiment I started about 2 years ago involved multiple groups of testers (sessions) located around the world divided into 3 separate control groups. Each control group tested the identical web page that would display the stock price if the user input a valid stock ticker symbol into a single textbox on the page and pressed the OK button. The only difference between the control groups was the instructions to perform a single positive test case with the specific purpose of “ensure any valid stock ticker symbol displays the current stock price for the publicly traded stock specified by its symbol.” The purpose of the study was to determine if different cultural and experiential backgrounds impacted the test data used in a test based on the instructions for a test case. The study collected demographic information on the participants as well as specific inputs applied to the web page. Information on the oracle used by the students was collected anecdotally. Step one in each test was identical because we were not interested in how the tester launched the browser. (Of course this assumes there are other tests that test the multitude of ways to launch a browser and navigate to a URL. Also, if the browser failed to launch the test case is blocked.) </p>
<p>Group 1 was given the most vague instructions for the test case. The instruction was simply: </p>
<ol>
<li>Launch browser and navigate to [url address] </li>
<li>Enter a valid stock ticker symbol and press the OK button and verify the accuracy of the returned stock price. </li>
</ol>
<p>The instructions in the test case given to Group 2 were also somewhat vague, but provided a little guidance both on input and oracle. </p>
<ol>
<li>Launch browser and navigate to [url address] </li>
<li>Enter a valid stock ticker symbol (e.g. “MSFT”) </li>
<li>Press the OK button </li>
<li>Verify the returned stock price is identical to the current stock price listed on the appropriate exchange      </li>
</ol>
<p>Group 3 had similar instructions to Group 2, but the group was given additional guidance as indicated below. </p>
<ol>
<li>Launch browser and navigate to [url address] </li>
<li>Enter a valid stock ticker symbol from a publicly traded stock listed on any public stock exchange      <br />Listings of valid stock ticker symbols are on stock exchange web sites such as:       <br /><a href="http://www.nyse.com">http://www.nyse.com</a>      <br /><a href="http://www.eoddata.com/Symbols.aspx">http://www.eoddata.com/Symbols.aspx</a>      <br /><a href="http://www.nasdaq.com">http://www.nasdaq.com</a>      <br /><a href="http://www.londonstockexchange.com">http://www.londonstockexchange.com</a></li>
<li>Press the OK Button </li>
<li>Verify the returned stock price is identical to the current stock price listed on the appropriate exchange </li>
</ol>
<h2><strong>Results </strong></h2>
<p>The results were mostly not surprising, but rather reinforcing. For example, we expected Group 1 to be rather random, but mostly aligned with ticker symbols they were familiar with. Of course, the majority (90%) of stock ticker symbols entered were MSFT and there was no significant difference across cultural background, locale, experience or educational background. (As this study was conducted at Microsoft I am sure there was some bias as to the symbol entered.) What was most interesting was that testers with no formal training (no previous courses in testing, no CS degree, and read less than one discipline specific book) and with more than 2 years of test experience were approximately 25% more likely to violate the purpose of the test and enter random or completely invalid data as their first action. In other words, instead of executing the required test their initial reaction was to immediately go on a bug hunt. </p>
<p>In group 2, 99% of the participants simply entered the stock ticker symbol “MSFT.” But, what was even more surprising was the fact that on the next day, the same people in that group were given the same exact test, and 95% of them simply reentered MSFT. Perhaps this is laziness, perhaps this is related to the superficial nature of the study, or perhaps this is due to individuals taking the path of least resistance. The percentage of people who entered identical stock ticker symbols on consecutive days was not significantly different between group 1 and group 2. </p>
<p>It should be no surprise that group 3 had the greatest distribution of variable test data applied to the web page. Demographics had no impact on any of the people who were in group 3. The majority of people in group 3 (78%) would select the first stock exchange listed (regardless of what link it was) but there was no significant overlap in the selected stock ticker symbols. When asked to repeat the test on the next day 83% of the participants selected a different link and a different symbol. Of those who selected the same link 97% selected a different stock ticker symbol. On the down side, approximately 4% of the people simply took the path of least resistance and input MSFT as the test data on both days of the experiment. </p>
<h2><strong>Conclusion</strong> </h2>
<p>One of the most common problems I hear about ‘scripted,’ or pre-defined test cases is that they are too prescriptive and not flexible enough to allow the tester to try things. Of course, a well-designed test case is not simply a prescriptive set of steps inputting the same hard coded test data they run over and over. So, in this study we made the assumption that a scripted test case that specified “Enter MSFT in the textbox” would simply result in the tester entering “MSFT” without any thinking on the part of the tester. Hard-coding variable test data is often the worst possible way to design a test case. </p>
<p>Vaguely written test cases added some level of variability, but also seemed to increase the probability of the tester executing context free tests outside the scope of the purpose of the test. In fact, what we found was some testers (approx 2%) simply went on a bug hunt and never actually input a valid stock ticker symbol at all during the session. </p>
<p>A test case that provided only one example that is representative of the type of test data required for the test case produced the least desirable results in this study. I am not sure this would be the case in practice. However, based on this study if I were to outsource execution of a test case similar to that used by group 2 the only thing I could guarantee is that MSFT would definitely be tested numerous times, and the variability of other test data would be extremely limited regardless of the number of testers executing that test or the number of iterations. </p>
<p>When faced with a virtually infinite number of possibilities for input variables as test data used in either positive or negative tests we need to test as many possibilities as possible given the available resources in order to increase test coverage and reduce overall risk. So, one way to increase the coverage of test data while still achieving the specific purpose of the test case is to provide useful resources that help guide the tester while relying on the tester’s creative thinking skills and curiosity to expand the test coverage. </p>
<p>Of course, we can also increase variability of test data and capture the essence of the tester’s creativity using a similar approach in a well-designed automated test case as well. In fact, a similarly designed automated test case enables us to significantly increase the amount of variable test data that is exercised in order to expand test coverage and increase overall confidence. </p>
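<p>As a rough illustration of that idea, here is a minimal Python sketch of an automated test that draws its stock ticker symbol at random instead of hard-coding it. Everything in it is an assumption made for the example: the symbol pool is a small illustrative list (a real test would load symbols from exchange listings such as those given to group 3), and the two price lookup functions are hypothetical stand-ins for the page under test and the oracle.</p>

```python
import random

# Illustrative pool only; a real test would load symbols from exchange listings.
SYMBOL_POOL = ["MSFT", "IBM", "GE", "KO", "XOM"]


def pick_symbol(seed, pool=SYMBOL_POOL):
    """Choose one symbol; the same seed always reproduces the same choice."""
    return random.Random(seed).choice(pool)


def run_test(seed, get_displayed_price, get_exchange_price):
    """Compare the page's price with the oracle's price for a random symbol.

    Both lookup callables are hypothetical stand-ins supplied by the caller.
    """
    symbol = pick_symbol(seed)
    passed = get_displayed_price(symbol) == get_exchange_price(symbol)
    # Return the symbol so a failure can be reported and replayed with the seed.
    return symbol, passed
```

<p>Logging the seed with each run keeps the test data variable from one iteration to the next while still letting anyone replay the exact input that exposed a failure.</p>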
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/18/adding-variability-in-test-case-design/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Testing is Sampling</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/18/testing-is-sampling/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/18/testing-is-sampling/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 05:27:56 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Sampling]]></category>
		<category><![CDATA[Test Tools]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/18/testing-is-sampling/</guid>
		<description><![CDATA[Originally Published Thursday, July 16, 2009
It seems it is about this time of year that I need to detach a bit from the world to reflect back on the past year and reevaluate my personal and professional goals moving forward. Perhaps I am just getting older or perhaps just a bit wiser (that is synonymous [...]]]></description>
			<content:encoded><![CDATA[<p>Originally Published Thursday, July 16, 2009</p>
<p>It seems it is about this time of year that I need to detach a bit from the world to reflect back on the past year and reevaluate my personal and professional goals moving forward. Perhaps I am just getting older or perhaps just a bit wiser (that is synonymous with &#8216;sapient&#8217; for the C-D crowd), but I find it refreshing to break away this time of year to tend to my gardens, work on my boat, read some novels, and contemplate life&#8217;s joys. Now, the major work projects are (almost) finished on my boat, the garden is planted and we are harvesting the early produce, and I reset both personal and professional development objectives for the next year and beyond. So, let me get back to sharing some of my ideas about testing.</p>
<p>Many of you who read this blog also know of my website <a href="http://www.TestingMentor.com">Testing Mentor</a> where I post a few job aids and random test data generation tools I&#8217;ve created. I am a big proponent of random test data using an approach I refer to as <em><strong>probabilistic stochastic test data</strong></em>.&#160; In May I was in Dusseldorf, Germany at the Software &amp; Systems Quality Conference to present a talk on my approach. I especially enjoy these <a href="http://www.sqs-conferences.com/index.htm">SQS conferences</a> (now igniteQ) because the attendees are a mix of industry experts and academia, and I was looking for feedback on my approach. I call my approach probabilistic stochastic test generation because the process is a bit more complex than simple random data generation. Similar to random data generation we cannot absolutely predict a <em>probabilistic</em> system, but we can control the feasibility of specified behaviors. And the adjective <em>stochastic</em> simply means &quot;pertaining to a process involving a randomly determined sequence of observations each of which is considered as a sample of one element from a probability distribution.&quot; In a nutshell, my approach segregates the population into equivalence partitions, then randomly selects elements from specified parameterized equivalence partitions (which is how we know the probability of specific behaviors); finally, the data may be mutated until the test data satisfies the defined fitness criteria. By combining equivalence partitioning and basic evolutionary computation (EA) concepts it is possible to generate large amounts of random test data that is representative of a virtually infinite population of possible data.</p>
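<p>To make the three steps concrete (partition, weighted random selection, mutation until fit), here is a minimal Python sketch. It is not the code behind my tools; the partitions, the weights, and the fitness criterion are all hypothetical values chosen only to show the shape of the approach.</p>

```python
import random
import string

# Hypothetical equivalence partitions and selection weights for a text input.
PARTITIONS = {
    "upper": string.ascii_uppercase,
    "lower": string.ascii_lowercase,
    "digit": string.digits,
}
WEIGHTS = {"upper": 0.2, "lower": 0.7, "digit": 0.1}


def is_fit(chars):
    """Fitness criterion for this sketch: at least one digit is present."""
    return any(c in PARTITIONS["digit"] for c in chars)


def generate(rng, length):
    """Build a test string of the given length (length must be at least 1).

    A partition is picked for each position according to WEIGHTS, then a
    random element is drawn from that partition.
    """
    names = list(PARTITIONS)
    chosen = rng.choices(names, weights=[WEIGHTS[n] for n in names], k=length)
    chars = [rng.choice(PARTITIONS[name]) for name in chosen]
    # Mutate one random position at a time until the fitness criterion holds.
    while not is_fit(chars):
        chars[rng.randrange(length)] = rng.choice(PARTITIONS["digit"])
    return "".join(chars)
```

<p>Passing in a seeded generator, for example generate(random.Random(42), 8), makes a run repeatable, and the weights are what let us state the probability of each partition appearing in the data.</p>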
<p>One of the questions that came up during the presentation was how many random samples are required for confidence in any given test case; in other words, how do we determine the number of tests using randomly generated test data? This is not an easy question to answer because the sample size of any given population depends on several factors such as:</p>
<ul>
<li>variability of data </li>
<li>precision of measurement </li>
<li>population size </li>
<li>risk factors </li>
<li>allowable sampling error </li>
<li>purpose of experiment or test </li>
<li>probability of selecting &quot;bad&quot; or uninteresting data </li>
</ul>
<h6><strong>Using sampling for equivalence class partition testing</strong></h6>
<p>But, the question also brought to mind a parallel discussion regarding how we go about selecting elements from equivalence class partition subsets. I am adamantly opposed to hard-coding test data in a test case (automated or manual), but a colleague challenged me and said that since any element in an equivalence partition is representative of all elements in that partition then why can&#8217;t we simply choose a few values from that equivalence subset. I realize this approach is done all the time by many testers, which is perhaps why we sometimes miss problems. But, hard-coding some small subset of values from a relatively large population of possible values is rarely a good idea, and is generally not the most effective approach for robust test design. One problem with hard-coding a variable is that the hard-coded value becomes static, and we know that static test data loses its effectiveness over time in subsequent tests using the same exact test data. Also, hard-coding specific values in a range of values means that we have absolutely 0% probability of including any other values in that range that are not specified. Another problem with hard-coded values stems from the selection criteria used to choose the values from a set of possible values. Typically we select values from a set based on historical failure indicators, customer data, and our own biased judgment or intuition of ‘interesting’ values. </p>
<p>However, the problem is that any equivalence class partition is a hypothesis that all elements are equal. Of course, the only way to validate or affirm that hypothesis is to test the entire population of the given equivalence class partition. Using customer-like values, or values based on failure indicators, and especially values we select based on our intuition are biased samples of the population, and may only represent a small portion of the entire population. Also, the number of values selected from any given equivalence partition set is usually fewer than the number required for some reasonable level of statistical confidence. So, while we definitely want to include values representative of our customers, values derived from historical failure indicators, and even our own intuition, we should also apply scientific sampling methods and include unbiased, randomly sampled values or elements from our set of values or population to help reduce uncertainty and increase confidence.</p>
<p>For example, let&#8217;s say that we are testing font size in Microsoft Word. Most font sizes range from 1pt through 1638pt and include half-sized fonts as well within that range. That is a population size of 3273 possible values. If we suspected that any value in the population had an equal probability of causing an error the standard deviation would be 50%. In this example, we would need a sample size of 343 statistically unbiased randomly selected values from the population to assert a 95% confidence level with a sampling error or precision of ±5%. Even in this situation, the number of values may appear to be quite large if the tests are manually executed, which is perhaps one reason why extremely small subsets of hard-coded values fail to find problems that are exposed by other values within that equivalence partition (all too often after the software is released). Fortunately, statistical sampling is much easier and less costly with automated test cases and probabilistic random test data generation.</p>
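<p>The arithmetic behind a number like that can be sketched in a few lines of Python. I am assuming Cochran&#8217;s sample size formula with a finite population correction here, which is one common choice and not necessarily the exact formula any particular calculator uses; z = 1.96 corresponds to a 95% confidence level, and p = 0.5 plays the role of the 50% figure above.</p>

```python
import math


def sample_size(population, z=1.96, p=0.5, margin=0.05):
    """Cochran's sample size with a finite population correction.

    population: number of elements in the equivalence partition
    z: z-score for the confidence level (1.96 for 95%)
    p: assumed proportion (0.5 is the most conservative choice)
    margin: allowable sampling error (0.05 for plus or minus 5%)
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)  # infinite-population estimate
    return n0 / (1 + (n0 - 1) / population)


if __name__ == "__main__":
    n = sample_size(3273)
    print(round(n, 1), math.floor(n), math.ceil(n))
```

<p>For a population of 3273 this works out to roughly 343.9, which gives the 343 quoted above when truncated, or 344 if you prefer to round up to be safe.</p>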
<h6><strong>Testing is Sampling</strong></h6>
<p>Statistical sampling is commonly used for experimentation in natural sciences as well as studies in social sciences (where I first learned it while studying sociology and anthropology). And, if we really stop to think about it, any testing effort is simply a sample of tests from the virtually infinite population of possible tests. Of course, there is always the probability that sampling misses or overlooks something interesting. But, this is true of any approach to testing and explained by B. Beizer&#8217;s Pesticide Paradox. The question we must ask ourselves is will statistical sampling of values in equivalence partitions or other test data help improve my confidence when used in conjunction with customer representative data, historical data, and data we intuit based on experience and knowledge?&#160; Will scientifically quantified empirical evidence help increase the confidence of the decision makers?</p>
<p>In my opinion anything that helps improve confidence and provides empirical evidence is valuable, and statistical sampling is a tool we should understand and put into our professional testing toolbox. There are several well established formulas for calculating sample size that can help us establish a baseline for a desired confidence level. But, rather than belabor you with formulas, I decided to whip together a Statistical Sample Size Calculator that I posted to <a href="http://ssscalculator.codeplex.com/">CodePlex</a> and also on my <a href="http://www.TestingMentor.com">Testing Mentor</a> site to help testers determine the minimum number of samples of statistically unbiased randomly generated test data from a given equivalence partition to use in a test case to help establish a statistically reliable level of confidence. </p>
<p><em><strong>Cockamamie chaos causes confusion; controlled chaos cultivates confidence!</strong></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/18/testing-is-sampling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Better Bug Reports</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/18/better-bug-reports/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/18/better-bug-reports/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 05:25:46 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Bug Reports]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/18/better-bug-reports/</guid>
		<description><![CDATA[Originally Published Wednesday, May 20, 2009 
When we report a bug our hope is that bug is fixed. But, of course we know that isn’t always the case which is why there are usually several alternative resolutions developers, project managers, or managers may choose for resolving a bug such as postponed, won’t fix, and by [...]]]></description>
			<content:encoded><![CDATA[<p>Originally Published Wednesday, May 20, 2009 </p>
<p>When we report a bug our hope is that bug is fixed. But, of course we know that isn’t always the case which is why there are usually several alternative resolutions developers, project managers, or managers may choose for resolving a bug such as postponed, won’t fix, and by design. It is unfortunately quite common to see a tester metaphorically explode into passionate fits of outrage when one of their bugs is resolved as postponed, won’t fix, or by design. It is unfortunate because these tantrums often involve the tester hurling personal insults (e.g. “How can the developer be so stupid not to fix this bug?”), decrying product quality (e.g. “If we don’t fix this bug this product will totally suck!”), and playing the whiny customer card (e.g. “We will lose customers if we don’t fix this bug.”). Yes, in my early years I was also guilty of these sorts of irrational outbursts of hyperbole when a bug that I thought was important was resolved not fixed. But, of course, I quickly learned that such sophistical speculations rarely resulted in the bug being fixed, and mostly lessened my credibility with developers and managers.</p>
<p>The other day I was speaking with a tester who was a bit miffed because the developer had resolved a few of her bugs as by design and won’t fix and she asked how she could ‘fight’ these resolutions. “Well,” I began, “Getting people to change their minds usually involves negotiation and the logical presentation of facts in a non-judgmental approach. Sometimes you will succeed, and sometimes you will not succeed. As testers surely we want all our bugs to be fixed; however, from a practical standpoint that may not always be the case especially if the bug is subjective.” I previously wrote about <a href="http://blogs.msdn.com/imtesty/archive/2006/06/28/649862.aspx">10 common problems with bug reporting</a>, but, in this case I proceeded to discuss a few strategies I use to advocate bugs.</p>
<p><strong>Make it easy for the developer to fix the bug</strong></p>
<blockquote><p>At a minimum, a tester must provide a description of the problem, the environmental conditions in which the problem occurred (if localized to a specific environment), the fewest exact steps needed to reproduce the bug, and the actual results versus the expected results. Occasionally a screenshot may be beneficial, but mostly if there is a contrasting example. I will also point the developer to my test, especially if it is automated. Providing the developer with an automated mechanism to reproduce a problem removes a lot of overhead. Of course, in this case I am talking about an automated test case that runs in a few seconds, or an automated script that helps the developer reproduce the problem quickly.</p>
</blockquote>
<p><strong>Provide specific contradictions to specified and/or implied requirements or standards</strong></p>
<blockquote><p>Of course, if the product design or functionality deviates from stated requirements, pointing this out in a non-confrontational way is a no-brainer. The key here is that our argument must be non-confrontational, because sometimes we may misinterpret the requirements, and sometimes the requirements may change without us being aware of those changes. There are also occasionally deviations from implied requirements, such as UI design guidelines, as a result of the introduction of new technologies or of changes in how customers use the product based on usability studies. Other implied standards include competing products or previous versions of the product. In any case, when arguing for a bug fix based on specified or implied requirements, I recommend using a compare-and-contrast approach to better illustrate the problem as I perceive it.</p>
</blockquote>
<p><strong>Provide concrete examples of customer impact</strong></p>
<blockquote><p>This is really important! Providing a real-world scenario that clearly illustrates how the bug will manifest itself to the customer, along with corroborating evidence from customers, presents a strong case in favor of a bug fix. There are several useful repositories of customer feedback testers can use to bolster their point of view, such as newsgroups, popular blogs, trade journal reviews of past or similar products, and product support reports; at Microsoft we also have Watson and SQM data. Using ‘real-world’ constructive feedback is often more meaningful than an internal mutiny by a portion of the test team.</p>
</blockquote>
<p><strong>Know your primary target customer profile</strong></p>
<blockquote><p>Testers often like to think we are representative of our customers. However, this may not always be the case. (It has always puzzled me why testers seem to think they have some greater affinity with the end-user customer than others on the product team.) Yes, it is important that testers understand who the primary target customer is for the current project or release, and that is why many teams have detailed personas of primary, secondary, and sometimes even tertiary customer audiences. Of course, if we are in the commercial software business we want our customer base to be as large as possible. But, as the number of customers increases so does the diversity of what they value, and as they say…you can never please everyone! So, when defending your position to fix a particular bug, it is always better to frame the discussion from the point of view of the primary customer persona rather than your own personal bias.</p>
</blockquote>
<p><strong>Use your brain, not your emotions</strong></p>
<blockquote><p>Passion has long been an admired trait in software testers. However, unbridled passion fraught with antagonistic accusations can be detrimental to a successful bug resolution (and sometimes even a career). Some bugs obviously need to be fixed, while others may depend on several mitigating (and competing) factors such as where you are in the software lifecycle, business impact, primary customer impact, risk, etc. I think it is largely agreed that the primary role of testers is to provide information, but that means we must also gather the pertinent information and present it logically, within the relevant context, to the management team (or decision makers). Remember…reckless rants rarely render reasonable results!</p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/18/better-bug-reports/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
