<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>I.M. Testy &#187; Metrics &amp; Measures</title>
	<atom:link href="http://www.testingmentor.com/imtesty/tag/metrics-measures/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.testingmentor.com/imtesty</link>
	<description>Treatises on the practice of software testing</description>
	<lastBuildDate>Thu, 01 Jul 2010 17:10:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Meaningful Measures</title>
		<link>http://www.testingmentor.com/imtesty/2010/03/21/meaningful-measures/</link>
		<comments>http://www.testingmentor.com/imtesty/2010/03/21/meaningful-measures/#comments</comments>
		<pubDate>Sun, 21 Mar 2010 10:22:26 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Test Management]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://www.testingmentor.com/imtesty/2010/03/21/meaningful-measures/</guid>
		<description><![CDATA[I arrived in Switzerland on Monday morning and met with our team here in Zurich who work on the communication server. Tuesday I presented a tutorial on advanced combinatorial testing and delivered a keynote address at Swiss Testing Days on Wednesday. Unfortunately, I really didn’t get to spend a lot of time exploring the city, [...]]]></description>
			<content:encoded><![CDATA[<p>I arrived in Switzerland on Monday morning and met with our team here in Zurich that works on the communication server. On Tuesday I presented a tutorial on advanced combinatorial testing, and on Wednesday I delivered a keynote address at Swiss Testing Days. Unfortunately, I didn’t get to spend much time exploring the city, but it was great to catch up with my longtime friend James Whittaker. James and I also gave brief presentations at an executive dinner the night before the conference. It was also really nice to meet new friends from SwissQ, who put together Swiss Testing Days. This was my first time presenting at this conference, and I was greatly impressed. More than 750 people attended! It was quite an event and I hope to return next year.</p>
<p>At the executive dinner and during my keynote I discussed various challenges in software engineering that directly impact testers. One of those challenges we need to get our heads wrapped around is software measures. By software measures I mean mapping attributes of objects in software engineering onto various mathematical scales. Although we sometimes also use biased qualitative measures, such as “too slow,” if we are to be taken with any degree of credibility we have to define what “too slow” means and set a reasonable goal for ‘acceptable’ based on customer values.</p>
<p>As testers we expend a lot of cycles collecting buckets full of metrics. We spend time producing fancy charts, and spend countless hours ‘looking’ at the data as if it were some type of oracle that would speak to us and tell us what we wanted to know. In the best case we convince ourselves that the numbers are telling us what we want them to tell us. In the worst case the decision makers do not even consider the measures, or we never analyze the data to identify ways to improve our engineering processes and practices. In the end, all the fancy charts come off the walls only to be shredded, and we start over.</p>
<p>We often get caught up in tracking mostly useless data such as bug count and code coverage. What in the world does bug count or code coverage tell us (or the decision makers) about quality? Nothing; absolutely nothing! Some people want to believe that finding a lot of bugs or having high levels of code coverage means better quality, but that is sort of like believing that you’ll find a pot of gold and a leprechaun at the end of every rainbow. So, why do we measure bug counts and code coverage? Simple…because they are easy to measure!</p>
<p>Good metrics are hard to define, mostly because we don’t always have clear goals, or we use a scatter-gun approach to setting a bunch of disparate wishful goals (goals that we hope we can achieve, but nobody is accountable if we don’t). I personally advocate the <a href="http://www.cs.umd.edu/~basili/publications/technical/T78.pdf" target="_blank">Goal/Question/Metric paradigm</a> by Victor Basili. But the biggest problem I have in using this approach is establishing meaningful goals! People are generally good at coming up with superfluous objectives such as 100% automation or 80% code coverage. But when you ask those people why they want 100% automation or 80% code coverage, they respond only with a bunch of hand-waving and philosophical arguments. It seems we sometimes have difficulty expressing the ‘why’ of setting certain goals. Of course, the answer in most cases is to ‘get better’ or ‘improve’ something! But why? What is the business value?</p>
<p>Once we establish clear goals, the next step is to understand the variables we can manipulate to help us achieve those goals. Then we must decide which of those variables to change to get the biggest bang for the buck. Finally, we figure out which measures will tell us whether we are progressing toward our goal. (This usually isn’t a single point of measurement.)</p>
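<p>To make the Goal/Question/Metric structure concrete, here is a minimal Python sketch under stated assumptions. The class names, the <code>business_value</code> field, the <code>find_gaps</code> helper, and both example goals are hypothetical illustrations, not part of Basili’s paper or any tool. The sketch simply flags goals that cannot say why they matter and questions that no metric answers.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    unit: str


@dataclass
class Question:
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    purpose: str
    # Hypothetical field: the "why" behind the goal, stated as business value.
    business_value: str = ""
    questions: list[Question] = field(default_factory=list)


def find_gaps(goal: Goal) -> list[str]:
    """Return problems that make a goal hard to act on (hypothetical checks)."""
    problems = []
    if not goal.business_value.strip():
        problems.append(f"goal '{goal.purpose}' states no business value")
    if not goal.questions:
        problems.append(f"goal '{goal.purpose}' has no questions")
    for question in goal.questions:
        if not question.metrics:
            problems.append(f"question '{question.text}' has no metric")
    return problems


# Hypothetical goal set without a "why", with one question nothing measures.
coverage_goal = Goal(
    purpose="Reach 80% code coverage",
    questions=[Question("Which changed areas are untested?")],
)

# Hypothetical goal tied to customer value, with a question a metric answers.
latency_goal = Goal(
    purpose="Make search feel fast",
    business_value="Users abandon searches that take too long",
    questions=[
        Question(
            "How long does a typical search take?",
            [Metric("95th percentile search latency", "ms")],
        )
    ],
)

for goal in (coverage_goal, latency_goal):
    print(goal.purpose, "->", find_gaps(goal) or "no obvious gaps")
```

<p>The point of the structure is that every metric hangs off a question, and every question hangs off a goal with a stated business value; a metric with nothing above it is a candidate for the wall decorations described below.</p>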
<p>At one time I naively believed that there was a core set of metrics that all teams should be collecting all the time that we could put into a ‘dashboard’ and compare across teams. In retrospect that was really a bone-headed notion. Identifying these measures is not easy, and there is no cookie-cutter approach. Each project team needs to decide on their specific goals that may increase customer value or impact business costs. Testers should ask themselves, “why are we measuring this?” “What actions will be taken as a result of these measures?” And, “if there is no actionable objective associated with this measure, then why am I spending time measuring this?”</p>
<p>At times it seems we are locked in a vicious cycle of relearning things via tribal knowledge, and we make decisions based mostly on ‘gut-feel’ and emotion. We collect a bunch of measures and display them much as the ancient Chinese used the mystical ‘<a href="http://www.columbia.edu/cu/lweb/indiv/eastasian/starrnews/oracle_bones.html" target="_blank">dragon bones</a>’ as oracles. But if we are interested in being able to articulate business impact (either positive or negative) in a professional manner, then we must be able to find ways to measure the things that are really important and actionable, and spend less time collecting numbers for wall decorations. At the end of the day someone is going to ask, “How do we know?” And trust me on this…really great managers will eat you alive if you answer with “well, we think…” or “we feel…” or try to evaluate success on some other subjective measure.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2010/03/21/meaningful-measures/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Code Coverage: More Than Just a Number</title>
		<link>http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/</link>
		<comments>http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 02:09:22 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Testing Practices]]></category>
		<category><![CDATA[Code Coverage]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/</guid>
		<description><![CDATA[When I was growing up I would sometimes go down into my grandfather’s basement. He had amassed a variety of tools during his lifetime and he was an excellent wood craftsman. I wasn’t allowed to touch any of the power tools, because his rule was, “if you don’t know how to use a tool properly [...]]]></description>
			<content:encoded><![CDATA[<p>When I was growing up I would sometimes go down into my grandfather’s basement. He had amassed a variety of tools during his lifetime and he was an excellent wood craftsman. I wasn’t allowed to touch any of the power tools, because his rule was, “<strong><em>if you don’t know how to use a tool properly then you shouldn’t play with it</em></strong>.”</p>
<p>Of course, I am a bit of a hard head (and was even back then), and one day I started playing with the wood lathe while my grandfather was upstairs. Everything seemed to be going pretty well until I pushed the chisel in too far too fast and the wood split and went flying. One piece shattered the overhead light and the other piece ricocheted off the back of my hand, leaving a nice gash. I shut off the machine and ran upstairs. After my grandmother cleaned and wrapped my hand, my grandfather made me go back downstairs and clean up the mess, and stood over me with a stern look of disapproval, making sure I wiped up my blood trail. After that incident, I heeded my grandfather’s advice, at least in his basement shop.</p>
<p>Anyway, with the recent discussions of code coverage around the testing blogosphere, I started thinking about what was really being discussed. The discussions (as is the case with most discussions about code coverage) were not actually about applying code coverage as a tool, but about the code coverage metric. More specifically, they were about why we should not assume that a high code coverage measure implies something is well tested. Interestingly enough, 2 years ago I wrote a <a href="http://www.testingmentor.com/imtesty/2009/11/13/the-code-coverage-metric-is-inversely-proportional-to-the-criticality-of-the-information-it-provides/" target="_blank">post</a> illustrating how the metric can be gamed and how the code coverage measure tells us nothing about quality or test effectiveness, while also alluding to how it might be used more effectively.</p>
<p>I thought that the ways the metric is sometimes misused were mostly self-evident, but then I realized that almost every time testers start talking about code coverage the discussion tends to focus on the metric. This may seem a bit harsh, but if a person&#8217;s only contribution to a conversation about code coverage is about how the metric doesn’t relate to quality or testing effectiveness, then that person should not be allowed to play with hammers, and employing more complex tools such as wheelbarrows is well beyond that person&#8217;s comprehension.</p>
<p>Thinking of code coverage only as a means to get some magic number is akin to thinking “how many nails can I pound with this hammer?” The metric itself is mostly irrelevant, and it is completely irrelevant if you don’t know how to interpret it in a way that helps you as a tester. Think about it this way: if we told our managers “our tests achieved 80% code coverage,” some of our managers would be elated. (Of course, IMHO, these types of managers are metric morons.) But what do you think these same pointy-headed number zombies would say if we told them “we ran our tests and we only missed testing 20% of the code”? I suspect they would start pacing back and forth in the room mumbling “We must run more tests, we must run more tests.”</p>
<p>When we stop thinking of code coverage as a simple measure, where our only use of the tool is to try to achieve some magical number, then perhaps we can start thinking about how to actually use code coverage as an effective tool to help us design tests (in under-tested or untested areas of the code), reduce potential risk, and possibly even drive quality upstream.</p>
<p>For example, one of my mentees is currently working on a project that uses just-in-time code coverage as a tool to evaluate how tests exercise changed code and downstream dependencies before code changes (e.g., bug fixes) are checked back into the main tree. The initial pushback by some members of the team (including some pointy-headed managers) was “code coverage doesn’t tell us about product quality” or “it’s too hard to achieve 80% code coverage” (although no such goal had been mentioned), and my personal favorite, “it’s too difficult to get everyone to measure coverage.” I reminded my mentee that the project is not about achieving some magic number; in fact, it’s really not even about measuring at all. It’s about using the tool to discover information and to help us design additional functional tests at the API or component level that we might otherwise overlook, to help prevent downstream regressions. In a nutshell, in this case it’s about using code coverage as a defect prevention tool.</p>
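<p>As a rough illustration of using coverage to discover information rather than to produce a number, the Python sketch below compares the lines touched by a change against the lines the tests executed, and reports the changed lines that no test reached. It assumes the change set and the coverage results have already been exported as a mapping from file path to line numbers; that data format, the file names, and the <code>uncovered_changes</code> function are hypothetical and are not the interface of any particular coverage tool. The sketch also does not trace downstream dependencies.</p>

```python
def uncovered_changes(
    changed: dict[str, set[int]], covered: dict[str, set[int]]
) -> dict[str, list[int]]:
    """Map each file to the changed lines that no test executed.

    Both arguments are hypothetical exports: file path -> line numbers.
    """
    gaps: dict[str, list[int]] = {}
    for path, lines in changed.items():
        # Files missing from the coverage data count as entirely unexecuted.
        missed = sorted(lines - covered.get(path, set()))
        if missed:
            gaps[path] = missed
    return gaps


# Hypothetical data for a bug fix touching two files.
changed_lines = {"parser.c": {10, 11, 12, 40}, "util.c": {5}}
covered_lines = {"parser.c": {10, 11, 12, 13}, "util.c": {5, 6}}

for path, lines in uncovered_changes(changed_lines, covered_lines).items():
    print(f"{path}: design tests that reach lines {lines}")
```

<p>The output is a list of specific lines to investigate rather than a percentage, which keeps attention on designing tests for the gaps instead of chasing a target.</p>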
<p>Bottom line, code coverage is a tool! If you don’t know how to use it to improve your testing, well…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2010/01/21/code-coverage-more-than-just-a-number/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Measuring Test Automation ROI</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/18/measuring-test-automation-roi/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/18/measuring-test-automation-roi/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 06:42:54 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Test Automation]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/18/measuring-test-automation-roi/</guid>
		<description><![CDATA[Originally Published Tuesday, August 25, 2009
I just finished reading Implementing Automated Software Testing by E. Dustin, T. Garrett, and B. Gauf and overall this is a good read providing some well thought out arguments for beginning an automation project, and provides strategic perspectives to manage a test automation project. The first chapter made several excellent points [...]]]></description>
			<content:encoded><![CDATA[<p>Originally Published Tuesday, August 25, 2009</p>
<p>I just finished reading <em>Implementing Automated Software Testing</em> by E. Dustin, T. Garrett, and B. Gauf. Overall it is a good read, providing some well-thought-out arguments for beginning an automation project and strategic perspectives on managing one. The first chapter made several excellent points, such as:</p>
<ul>
<li>Automated software testing “<strong>is software development</strong>.” </li>
<li>Automated software testing “and manual testing are intertwined and <strong>complement </strong>each other.” </li>
<li>And, “The overall objective of AST (automated software testing) is to design, develop, and deliver an automated test and retest capability that<strong> increases testing efficiencies</strong>.”</li>
</ul>
<p>Of course, I was also pleased to read the section on test data generation, since I design and develop <a href="http://www.testingmentor.com/tools/testdatagenerators.htm">test data generation tools</a> as a hobby. The authors correctly note that random test data increases flexibility, improves functional testing, and reduces reliance on manually produced test data, which tends to be limited in scope and error prone.</p>
<p>There is also a chapter on presenting the business case for an automation project by calculating a return on investment (ROI) measure via various worksheets. I have two essential problems with ROI calculations in the context of test automation. First, if a business manager doesn’t understand the value of automation within a complex software project (especially one that will have multiple iterations), they should read a book on managing software development projects. I really think most managers understand that test automation would benefit their business (in most cases). I suspect many managers have experienced less-than-successful automation projects but don’t understand how to establish a more successful automation effort. I also suspect really bright business managers are not overly impressed with magic beans.</p>
<p>Magic beans pimped by a zealous huckster are the second essential problem with automation ROI calculations. Let’s be honest: the numbers produced by these worksheets or other automation ROI calculators are simply magic beans. Why do I make this statement? Because the numbers that are plugged into the calculators or worksheets are <a href="http://www.jargondatabase.com/Jargon.aspx?id=9903">ROMA data</a>. I mean really, how many of us can realistically predict the number of atomic tests for any complex project? Also, do all tests take the same amount of time, and will all tests be executed for the same number of iterations? Does it take the same amount of time to develop every automated test, and how does one go about predicting a realistic time for all automated tests to run? And of course, how many of those tests will be automated? (Actually, that answer is easy…the number of automated tests should be 100% of the tests that should be automated.)</p>
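<p>To see how much the output depends on the guessed inputs, consider a deliberately simplified ROI formula. This is a hypothetical Python sketch, not the worksheet from the book: it assumes every test takes the same manual and automated execution time and the same development time, plus a one-time framework cost, and it ignores maintenance. Those uniform assumptions are exactly the ones questioned above. Every number in the example is invented, and only the guessed number of runs changes between the two cases.</p>

```python
def automation_roi(
    num_tests: int,
    runs: int,
    manual_minutes_per_run: float,
    automated_minutes_per_run: float,
    dev_minutes_per_test: float,
    framework_minutes: float,
) -> float:
    """Return (manual cost - automation cost) / automation cost, in minutes.

    Hypothetical model: every test costs the same, and maintenance is ignored.
    The *_per_run parameters are the minutes to execute one test once.
    """
    manual_cost = num_tests * manual_minutes_per_run * runs
    automation_cost = framework_minutes + num_tests * (
        dev_minutes_per_test + automated_minutes_per_run * runs
    )
    return (manual_cost - automation_cost) / automation_cost


# Identical invented guesses except for how many times the suite will run.
for runs in (20, 5):
    roi = automation_roi(
        num_tests=500,
        runs=runs,
        manual_minutes_per_run=10,
        automated_minutes_per_run=1,
        dev_minutes_per_test=120,
        framework_minutes=6000,
    )
    print(f"{runs} runs: ROI = {roi:.0%}")
```

<p>With 20 runs the model reports roughly a 32% return; with 5 runs it reports roughly a 64% loss. Nothing about the product or the tests changed, only one estimate.</p>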
<p>Personally, I think test managers should not waste their time trying to convince their business manager of the value of a test automation project, especially with magic beans produced from ROMA data. Instead, test managers should start helping their team members think about ROI at the level of the individual test. In other words, teach your team how to make smart decisions about which tests to automate and which tests should not be automated because they can be more effectively tested via other approaches.</p>
<p>In my next post I will outline some factors that testers and test managers can use to help decide which tests to consider automating. Basically, the bottom line here is that an automated test should provide significant value to the tester and the organization, and should help free up the tester’s time in order to increase the breadth and/or scope of testing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/18/measuring-test-automation-roi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Assessing Tester Performance</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/18/assessing-tester-performance/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/18/assessing-tester-performance/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 06:07:23 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Test Management]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/18/assessing-tester-performance/</guid>
		<description><![CDATA[Originally Published Tuesday, April 28, 2009 
Using context-free software product measures as individual key performance indicators (KPIs) is about as silly as pet rocks!
Periodically a discussion of assessing tester performance surfaces on various discussion groups. Some people offer advice such as counting bugs (or some derivation thereof), number of tests written in x amount of time, [...]]]></description>
			<content:encoded><![CDATA[<p>Originally Published Tuesday, April 28, 2009 </p>
<p>Using context-free software product measures as individual key performance indicators (KPIs) is about as silly as <a href="http://en.wikipedia.org/wiki/Pet_rock">pet rocks</a>!</p>
<p>Periodically a discussion of assessing tester performance surfaces on various discussion groups. Some people offer advice such as counting bugs (or some derivation thereof), number of tests written in x amount of time, number of tests executed, % of automated tests compared to manual tests, and (one of my least favorite measures of individual performance) % of code coverage.</p>
<p>The problem with all these measures is that they lack context and tend to ignore dependent variables. It is also highly likely that an astute tester can easily game the system and potentially cause detrimental problems. For example, if my manager measured my performance in part by the number of bugs I found per week, I would ask how many I had to find per week to satisfy the &#8216;expected&#8217; criteria. Then each week I would report 2 or 3 more bugs than the &#8216;expected&#8217; or &#8216;average&#8217; number (in order to &#8216;exceed&#8217; expectations), and I would sit on any additional bugs I found that week, holding them in case I was below my quota the following week. Of course, this means that bug reports are being artificially delayed, which may negatively impact the overall product schedule.</p>
<p>The issue at hand is the bizarre desire of some simple-minded people for an easy solution to a difficult problem. But there is no simple formula for measuring the performance of an individual. Individual performance assessments are often somewhat subjective, and are influenced by external factors identified through <a href="http://www.humanperformancetechnology.org/">Human Performance Technology (HPT)</a> research, such as motivation, tools, inherent ability, processes, and even the physical environment.</p>
<p>A common problem I often see is unrealistic goals such as &quot;Find the majority of bugs in my feature area.&quot; (How do we know what the majority is? What if the majority doesn&#8217;t include the most important issues? etc.) Another problem I commonly see is individuals over-promising and under-delivering relative to their capabilities. I also see managers who dictate an identical set of performance goals to all individuals. While there may be a few common goals, as a manager I would want to tap into the potential strengths of each individual on my team. I also have different expectations and levels of contribution from individuals depending on where they are in their career, and also based on their career aspirations.</p>
<p>So, as testers we must learn to establish <a href="http://www.topachievement.com/smart.html">SMART</a> goals with our managers that include:</p>
<ul>
<li>goals that align with my manager&#8217;s goals </li>
<li>goals that align with the immediate goals of the product team or company </li>
<li>and stretch goals that illustrate continued growth and personal improvement relative to the team, group, or company goals</li>
</ul>
<p>(This last one may be controversial; however, we shouldn&#8217;t be surprised that individual performance is never constant in relation to your peer group.)</p>
<p>Fair or not, for a variety of reasons most software companies do evaluate employee performance in some manner (at least periodically), so the key to success lies in applying HPT and agreeing on SMARTer goals upfront.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/18/assessing-tester-performance/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Are Software Metrics and Measurements Really Important?</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/12/are-software-metrics-and-measurements-really-important/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/12/are-software-metrics-and-measurements-really-important/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 18:32:45 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/12/are-software-metrics-and-measurements-really-important/</guid>
		<description><![CDATA[Originally Published Monday, November 20, 2006
On a recent flight back from Boston to Seattle I decided to read Measuring the Software Process: A practical guide to functional measurements by David Garmus and David Herron. The book does a really good job of explaining function points, what they are, how to identify them, and how to [...]]]></description>
			<content:encoded><![CDATA[<p>Originally Published Monday, November 20, 2006</p>
<p>On a recent flight back from Boston to Seattle I decided to read <i>Measuring the Software Process: A Practical Guide to Functional Measurements</i> by David Garmus and David Herron. The book does a really good job of explaining function points: what they are, how to identify them, and how to use them for scheduling estimates, benchmarking, and process improvements. The use of function points hasn’t exactly proliferated throughout the industry, so the book’s practical value must be judged on a case-by-case basis. Microsoft doesn’t use function points (at least not that I am aware of), so I read the book mainly for new insights and perspectives on measures in general, and to really understand the whole concept of function points. In all honesty, it was quite a dull read, but that is not to say I didn’t learn anything.</p>
<p>In fact, I would say that the first two chapters of the book are excellent and perhaps well ahead of their time. Although the book was published in 1996, the first chapter discusses software as a business, and the second discusses performance measurements. Below are some key points I took from the book (primarily the 1<sup>st</sup> and 2<sup>nd</sup> chapters).</p>
<p><b>The business case for metrics and measurements</b></p>
<p>“Developing software is more of a business or a process than an art form,… a business or process needs to be managed through the use of various control functions.” <b>“The key to successful risk management is in the ability to measure.”</b> “In order to be successful a rigorous and well-thought path to managing these [risk] issues must be continuously developed. At the heart of it all is the notion of having key business measures. Industry gurus have told us for years that we need to measure what we manage.” These excerpts from the book succinctly illustrate the importance of a measurement program from a business perspective. If our only measures are bug counts and “smiley face type” customer satisfaction surveys, how do we know whether we are improving in the areas that are critical to the success of the business or the customer? The metrics and measures that are critical for the success of a software project vary, so I won’t be so pretentious as to suggest one over another. However, I will say that unless we identify critical “pain-points,” develop a formula to establish a baseline, and continue with a long-term (at least 5 years) plan to assess strategic changes using (reasonably) consistent measures, we shall never know if our processes are improving.</p>
<p><b>Why is measuring software process so hard?</b></p>
<p>“As software professionals, we have not fully acknowledged the fact that developing software is a business unto itself that requires unique measures and monitors.” <b>“…the ability to consistently and accurately quantify return on investment for software technologies does not exist at the present time.”</b> So how do we really know whether we are becoming more effective or efficient? Some companies use CMMI as a measure. But let me say this about the CMM. I spoke to Watts Humphrey a few years ago when he introduced TSP/PSP and asked him about using the CMM to measure maturity level, and he stated that the CMM was not created as a tool to measure an organization’s capabilities or abilities. <i>Measuring the Software Process</i> reiterates this by stating, “<b>There are no quantitative measures directly associated with the SEI maturity model.</b> In other words, there is no opportunity, based on the results of the assessment, to determine the quantitative value of moving from a Level 1 organization to a Level 2 organization.” Watts did say that the primary driving force behind Level measures for an organization was the military (which is understandable because of the bureaucratic policies and political oversight which force falsification and artificial inflation of facts). We also need to recognize that narrowly focused short-term measures change behavior and produce biased results. The key to a successful measurement program is for an organization to identify the key performance indicators (KPIs) that are critical to the success of the business, and to find a means to measure them effectively over the long term with minimal impact on the organization (measures should not be a distraction from the day-to-day work). Most importantly, we all agree the role of testing is to provide information, but we need to start providing quantifiable information rather than “best-guess” or “feel-good” estimates.</p>
<p><b>Are measures really important?</b></p>
<p>The conclusion from a case study in the book summarized the importance of measures and metrics very clearly. “… project productivity is increased as quality increases. <strong>In order to increase quality and productivity, weaknesses must be identified in the methods currently used and steps taken to strengthen these areas of our software development process</strong>. To accomplish this, factors must first be measured – the ones that influence productivity.” I think this speaks a lot to some of the recent trends in the industry such as test driven development (TDD). We all know that thoughtful design and unit testing is generally a best practice (especially compared to simply writing a bunch of code and throwing it over the wall for a bunch of testers to bang on in hopes of finding all the defects). But I doubt we can actually quantify the value in terms of cost or resource allocation. So, if we really want to know whether or not we are improving our effectiveness and efficiency, then we should spend some time understanding why measures are important, and define critical metrics (from both a business and customer standpoint).</p>
<p>The true value of a testing effort is in its ability to accurately assess risk and product ‘quality’ (however you define quality). I wouldn’t pay a vendor to test a product if they couldn’t provide me with concrete evidence and empirical results on what was tested and how it was tested. Measuring software quality, measuring productivity, and measuring effectiveness are really hard problems. But I suspect that as long as we ignore these issues, remain unwilling to understand the value of metrics, or simply base decisions on short-term (biased) measures, the role of testing will continue to be viewed with skepticism, as little more than glorified bug hunting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/12/are-software-metrics-and-measurements-really-important/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Test Effectiveness</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/11/test-effectiveness/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/11/test-effectiveness/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 19:12:58 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/11/test-effectiveness/</guid>
		<description><![CDATA[Originally Published Friday, October 13, 2006
Boris Beizer stated black box testing was approximately 35 to 65% effective. I had also read that Gerald Weinberg conducted studies at IBM with similar results. I recently spoke at the SQS conference in London and in the opening presentation Bob Bartlett stated that SQS studies indicated that formal test [...]]]></description>
			<content:encoded><![CDATA[<p>Originally Published Friday, October 13, 2006</p>
<p>Boris Beizer stated that black box testing was approximately 35% to 65% effective. I had also read that Gerald Weinberg conducted studies at IBM with similar results. I recently spoke at the SQS conference in London, and in the opening presentation Bob Bartlett stated that SQS studies indicated formal test design was almost twice as effective in defect detection per test case as expert (exploratory) testing, and of course he put into perspective the infamous &#8220;death by checklist&#8221; syndrome.</p>
<p><img src="http://blogs.msdn.com/photos/imtesty/images/822569/original.aspx" alt="" width="500" height="358" /></p>
<p>About 4 years ago I began a 3-year study at Microsoft to verify assertions about the effectiveness of black box testing. I used Weinberg’s famous Triangle paradigm for the assessment. Given a brief functional requirement, participants in the case study were asked to define tests to validate a program written in C# against the stated requirements. The basic requirements are outlined in Glenford Myers’ book <em>The Art of Software Testing</em> as “A program reads three (3) integer values. The three values are interpreted as representing the lengths of the sides of a triangle. The program displays a message that states whether the triangle is scalene, isosceles, or equilateral.”</p>
<p>Based on the C# implementation (shown as pseudo code below), and assuming that all inputs are valid integer values, we determined that the minimum number of tests needed to validate the program against this functional requirement is 11, as outlined below.</p>
<pre>if (a + b &lt;= c) or (b + c &lt;= a) or (a + c &lt;= b)
    then invalid triangle
else if (a equals b) and (b equals c)
    then equilateral triangle
else if (a not equal b) and (b not equal c) and (a not equal c)
    then scalene triangle
else isosceles triangle</pre>
<p>The minimum tests for conditional control flow and data flow coverage (again assuming valid integer inputs) are:</p>
<ul>
<li>6 tests to validate the invalid triangle path<br />
a + b &lt; c<br />
a + b = c<br />
etc.</li>
<li>1 test for the equilateral path</li>
<li>1 test for scalene</li>
<li>3 tests for isosceles (which actually verify the false outcomes in the sub-expressions of the scalene predicate statement)</li>
</ul>
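<p>The 11 tests can also be written down as data. The inputs below are illustrative values we chose, not the study’s, and the four invalid-path cases beyond a + b &lt; c and a + b = c are our reading of the “etc.” (the same two outcomes for the other two side comparisons). Tallying the expected results reproduces the 6 + 1 + 1 + 3 breakdown.</p>

```python
from collections import Counter

# One (targeted condition, (a, b, c), expected result) entry per minimum test.
# Inputs are illustrative choices, not the values used in the study.
MINIMUM_TESTS = [
    # Invalid path: "less than" and "equal" outcomes for each side comparison
    # (the last four rows are our reading of the "etc." in the list above)
    ("a + b < c", (1, 2, 4), "invalid"),
    ("a + b = c", (1, 2, 3), "invalid"),
    ("b + c < a", (4, 1, 2), "invalid"),
    ("b + c = a", (3, 1, 2), "invalid"),
    ("a + c < b", (1, 4, 2), "invalid"),
    ("a + c = b", (1, 3, 2), "invalid"),
    ("all sides equal", (3, 3, 3), "equilateral"),
    ("no sides equal", (3, 4, 5), "scalene"),
    # Isosceles: each sub-expression of the scalene predicate made false
    ("a = b", (3, 3, 4), "isosceles"),
    ("b = c", (4, 3, 3), "isosceles"),
    ("a = c", (3, 4, 3), "isosceles"),
]

counts = Counter(expected for _, _, expected in MINIMUM_TESTS)
print(len(MINIMUM_TESTS), dict(counts))
# 11 {'invalid': 6, 'equilateral': 1, 'scalene': 1, 'isosceles': 3}
```

<p>Each isosceles case makes a different sub-expression of the scalene predicate false, which is why three tests are needed there rather than one.</p>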
<p>I collected data for 3 years from more than 500 participants, ranging from less than 6 months to more than 5 years of testing experience, none of whom had formal training in testing techniques or methodologies. Interestingly enough, the results changed very little from those of the first few groups. The empirical results of this case study demonstrate that the average effectiveness of tests in the most critical area of the program was only 36%. In other words, of the minimum 11 tests required for control and data flow coverage of this section of code, the average tester defined only 4 (1 test for invalid, 1 for equilateral, 1 for scalene, and 1 for isosceles). During this time period Microsoft was also making a transition to hiring testers with greater technical competence and coding skills. Perhaps not surprisingly, testers with a coding background increased the test effectiveness ratio by 50%.</p>
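<p>The 36% figure is the number of minimum tests the average participant defined divided by the minimum required: 4 / 11 ≈ 0.36. A small Python helper makes the ratio explicit; the name <code>effectiveness_ratio</code> is ours, and it assumes each defined test maps to a distinct one of the 11 minimum tests.</p>

```python
def effectiveness_ratio(defined: int, minimum: int = 11) -> float:
    """Share of the minimum control/data-flow tests that a tester defined.

    `minimum` defaults to the 11 tests derived for the triangle program;
    `defined` should count only tests that hit distinct minimum tests.
    """
    if minimum <= 0:
        raise ValueError("minimum must be positive")
    if not 0 <= defined <= minimum:
        raise ValueError("defined must be between 0 and minimum")
    return defined / minimum


# The average participant defined 4 of the 11 minimum tests
print(f"{effectiveness_ratio(4):.0%}")  # 36%
```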
<p>This is just a small snapshot of the case study, but its overall conclusions were that untrained testers using only an exploratory black box approach to testing are less effective, and that non-technical testers are 50% more likely to perform redundant or ineffective tests than testers with greater technical competence (not necessarily coding skills, but a greater understanding of the entire system under test).</p>
<p>Some managers at Microsoft scoffed at these results. One said that if he asked one of his non-technical testers to test the design of a new coffee cup, that person would probably do better than someone with a computer science background. OK…he probably has a point. But I would argue that Microsoft and many other software companies are in the business of producing technological solutions for customers, not in the business of making mugs (unless perhaps Microsoft’s ceramic team is in building 7 and that is a new LOB I don’t know about). The bottom line is that formal training in established, time-proven functional and structural techniques can increase testers’ effectiveness and reduce potential risk in a software project.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/11/test-effectiveness/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Software Metrics: Guidelines for Establishing Effective Measurement Programs</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/10/software-metrics-guidelines-for-establishing-effective-measurement-programs/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/10/software-metrics-guidelines-for-establishing-effective-measurement-programs/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 08:34:40 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[General Testing Topics]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/10/software-metrics-guidelines-for-establishing-effective-measurement-programs/</guid>
		<description><![CDATA[Among the most important artifacts we produce as testers are measures of the software product or processes. Many people scoff at metrics, but there is little doubt that management teams want to see numbers. But the true value of an effective measurement program is not simply the numbers by themselves; it is the ability of [...]]]></description>
			<content:encoded><![CDATA[<p>Among the most important artifacts we produce as testers are measures of the software product or processes. Many people scoff at metrics, but there is little doubt that management teams want to see numbers. But the true value of an effective measurement program is not simply the numbers by themselves; it is the ability of the test team to analyze the data, look for trends, and provide valuable information regarding risk.</p>
<p>To analyze data effectively, testers should collect metrics tied to clear goals that directly address the product areas or team processes the management team identifies as needing improvement. Once the test team knows the specific area it needs to assess, it should define the appropriate set of measures that will best indicate whether or not the team is tracking toward the goals. The ultimate purpose of any successful metrics program is to produce information that will help the management team make strategic decisions; it is not simply to collect numbers and build fancy charts and graphs to hang on a wall.</p>
<p>Below are some guidelines for effective metrics programs.</p>
<ul>
<li><strong>Define the issue –</strong> ask the management team what keeps them up at night and why it is a primary area of concern</li>
<li><strong>Establish clear goals –</strong> ask the management team how they would define success for each issue defined</li>
<li><strong>Identify the proper metrics –</strong> make sure the measurements directly relate to the specific problem or issue being addressed, and that the metrics can demonstrate immediate benefit to the team</li>
<li><strong>Use more than one measurement –</strong> it is not a good idea to make a decision based upon one metric by itself; proper analysis of data requires multiple perspectives on (or ways of measuring) the area being evaluated</li>
<li><strong>Select simple metrics –</strong> the measurements should be clearly understood by those collecting the data and the management team</li>
<li><strong>Make data collection easy –</strong> the best measurements are built into the production process so data collection does not distract from productivity</li>
<li><strong>Create a baseline –</strong> set a baseline to track movement toward (or away from) the targeted goals</li>
<li><strong>Communicate metric program goals –</strong> preferably, management should communicate the problem space and discuss what success looks like to get buy-in from the team (Remember…the testing team is collecting and analyzing the data; it’s the management team that will ultimately effect change!)</li>
<li><strong>Periodically review the measures – </strong>review the measurements periodically to validate that the metrics are evaluating the issue effectively</li>
<li><strong>Avoid changing metrics during a project –</strong> changing what is being measured during a project cycle can disrupt the team and undermine confidence in the overall use of metrics</li>
<li><strong>Prevent abuse of the metrics –</strong> avoid using metrics to analyze things that are tangential to the specific issue they were established to evaluate, and avoid changing or withholding data to “paint a prettier picture” for management or the team</li>
<li><strong>Publicize the data and analysis –</strong> make the metric data and an analysis of the data publicly available to the whole team; transparency of the information is critical to the success of change resulting from a metrics program</li>
</ul>
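<p>Several of these guidelines (clear goals, more than one measurement, a baseline) can be captured in a small data structure. The Python sketch below is a hypothetical illustration, not a prescribed format: the <code>Goal</code> and <code>Metric</code> classes, the progress formula, and the sample numbers are all made up for the example.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """One measure tied to a goal, with a baseline and a target (hypothetical)."""
    name: str
    baseline: float
    target: float
    current: float

    def progress(self) -> float:
        """Share of the baseline-to-target distance covered so far.

        Works whether the target is above or below the baseline; a negative
        value means the measure is moving away from the goal.
        """
        span = self.target - self.baseline
        if span == 0:
            raise ValueError(f"{self.name}: baseline and target must differ")
        return (self.current - self.baseline) / span


@dataclass
class Goal:
    """An issue raised by management, plus the measures used to track it."""
    issue: str
    metrics: list[Metric] = field(default_factory=list)

    def report(self) -> dict[str, float]:
        # Guideline: do not base a decision on one metric by itself
        if len(self.metrics) < 2:
            raise ValueError(f"{self.issue}: use more than one measurement")
        return {m.name: round(m.progress(), 2) for m in self.metrics}


# Illustrative numbers only
goal = Goal(
    issue="Too many defects reach customers",
    metrics=[
        Metric("customer-reported defects per release", baseline=40, target=20, current=30),
        Metric("% of defects found before code complete", baseline=50, target=80, current=62),
    ],
)
print(goal.report())
```

<p>Progress is expressed as the fraction of the distance from baseline to target, so a measure that should go down (defect counts) and one that should go up (a percentage) read the same way. The numbers are only input to the analysis and discussion the guidelines call for, not a verdict on their own.</p>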
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/10/software-metrics-guidelines-for-establishing-effective-measurement-programs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bug Counts as Key Performance Indicators (KPI) for Testers</title>
		<link>http://www.testingmentor.com/imtesty/2009/11/10/bug-counts-as-key-performance-indicators-kpi-for-testers/</link>
		<comments>http://www.testingmentor.com/imtesty/2009/11/10/bug-counts-as-key-performance-indicators-kpi-for-testers/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 08:28:48 +0000</pubDate>
		<dc:creator>Bj Rollison</dc:creator>
				<category><![CDATA[Test Management]]></category>
		<category><![CDATA[Metrics & Measures]]></category>

		<guid isPermaLink="false">http://testingmentor.com/imtesty/2009/11/10/bug-counts-as-key-performance-indicators-kpi-for-testers/</guid>
		<description><![CDATA[Originally Published Monday, June 26, 2006
Every once in a while I meet testers who say their manager rates individual performance based on bug metrics. It is no secret that management is constantly looking at bug metrics. But bug numbers are generally a poor direct measure of anything meaningful, especially individual human performance. Yet, some managers [...]]]></description>
			<content:encoded><![CDATA[<p>Originally Published Monday, June 26, 2006</p>
<p>Every once in a while I meet testers who say their manager rates individual performance based on bug metrics. It is no secret that management is constantly looking at bug metrics. But bug numbers are generally a poor direct measure of anything meaningful, especially individual human performance. Yet some managers continue this horrible practice and even create fancy spreadsheets with all sorts of formulas to analyze bug data in relation to individual performance. The number of bugs reported, fix rates, severity, and other data points are tracked in a juvenile attempt to come up with some comparative performance indicator among testers. Perhaps this is because bug numbers are an easy metric to collect, or perhaps it is because management maintains the antiquated view that the purpose of testing is simply to find bugs!</p>
<p>Regardless of the reasons, using bug numbers as a direct measure of individual performance is ridiculous. There are simply too many variables in bug metrics to use these measures in any form of comparative analysis of performance. Even for a team of testers with equal skills, experience, and domain knowledge, several factors affect the number of defects or defect resolutions, such as:</p>
<p>· <strong>Complexity</strong> – the complexity coefficient of a feature area under test affects risk. For example, a feature with a high code complexity measure has higher risk and may have a greater number of potential defects than a feature with a lower code complexity measure.</p>
<p>· <strong>Code maturity</strong> – a product or feature with a more mature code base may have fewer defects than a newer product or feature.</p>
<p>· <strong>Defect density</strong> – a new developer may inject more defects than an experienced developer. A developer who performs code reviews and unit tests will likely produce fewer defects in their area than a developer who simply throws his or her code over the wall. Are defect density ratios used to normalize bug counts?</p>
<p>· <strong>Initial design</strong> – if the customer needs are not well understood, or if the requirements are not thought out before the code is written, then there will likely be lots of changes. Changes in code are more likely to produce defects than ‘original’ code.</p>
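<p>To see how much context matters, consider a hypothetical comparison of two testers; the counts and code sizes below are invented for illustration. Ranked by raw bug count, tester A looks more productive; normalized by the size of the code each one tests (defects per thousand lines of code), the order flips. Size is only one of the factors listed above, so neither ranking says anything reliable about the testers themselves.</p>

```python
# Hypothetical data: bugs each tester reported and the size (KLOC) of the
# feature area each one owns. All numbers are invented for illustration.
areas = {
    "tester_a": {"bugs": 120, "kloc": 60.0},
    "tester_b": {"bugs": 45, "kloc": 15.0},
}


def density(bugs: int, kloc: float) -> float:
    """Defects per thousand lines of code."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return bugs / kloc


by_raw = sorted(areas, key=lambda t: areas[t]["bugs"], reverse=True)
by_density = sorted(areas, key=lambda t: density(**areas[t]), reverse=True)

print(by_raw)      # ['tester_a', 'tester_b']
print(by_density)  # ['tester_b', 'tester_a']
```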
<p>Attempting to use bug counts as performance indicators must also take into account the relative value of reported defects. For example, surely more severe issues such as data loss are given more weight than simple UI problems such as a misspelled word. And we all know that the sooner defects are detected, the cheaper they are to fix in the grand scheme of things. So defects reported earlier are certainly more valuable than defects reported later in the cycle. Also, we all know that not all defects will be fixed. Some defects reported by testers will be postponed, some simply will not be fixed, and others may be resolved as “by design.” A defect that the management team decides not to fix is still a defect! Just because the management team decides not to fix the problem doesn’t totally negate the value of the bug.</p>
<p>The bottom line is that using bug metrics to analyze trends is useful, but using them to assess individual performance or comparative performance among testers is absurd. Managers who continue to use bug counts as performance indicators are simply lazy or don’t understand testing well enough to evaluate the key performance indicators of professional testers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.testingmentor.com/imtesty/2009/11/10/bug-counts-as-key-performance-indicators-kpi-for-testers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
