I.M. Testy

Treatises on the practice of software testing

Meaningful Measures


I arrived in Switzerland on Monday morning and met with our team here in Zurich who work on the communication server. On Tuesday I presented a tutorial on advanced combinatorial testing, and on Wednesday I delivered a keynote address at Swiss Testing Days. Unfortunately, I didn’t get to spend much time exploring the city, but it was great to catch up with my longtime friend James Whittaker. James and I also gave brief presentations at an executive dinner the night before the conference. It was also really nice to meet new friends from SwissQ, who put together Swiss Testing Days. This was my first time presenting at this conference and I was greatly impressed. More than 750 people attended! It was quite an event and I hope to return next year.

At the executive dinner and during my keynote I discussed various challenges in software engineering that directly impact testers. One of the challenges we need to get our heads wrapped around is software measures. By software measures I mean objects in software engineering mapped to various scales in the mathematical world. We sometimes also use biased qualitative measures, such as “too slow,” but if we want to be taken with any degree of credibility we have to define what “too slow” is and set a reasonable goal for ‘acceptable’ based on customer values.
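To make that concrete, here is a minimal sketch of turning “too slow” into a measure with a stated goal. Everything specific in it is an assumption for illustration: the sample response times, the choice of the 95th percentile, and the 500 ms threshold are all hypothetical, and a real team would derive them from what its customers actually value.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_goal(samples_ms, pct=95, threshold_ms=500):
    """True if the chosen percentile is within the agreed threshold.

    pct and threshold_ms are hypothetical defaults, not recommended values.
    """
    return percentile(samples_ms, pct) <= threshold_ms


# Hypothetical response times, in milliseconds, from one test run.
response_times_ms = [120, 180, 210, 250, 300, 340, 420, 480, 650, 900]

print("p95 (ms):", percentile(response_times_ms, 95))
print("meets goal:", meets_goal(response_times_ms))
```

The point is not the arithmetic. Once the threshold is written down, “too slow” stops being an opinion and becomes something the team can agree or disagree with.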

As testers we expend a lot of cycles collecting buckets full of metrics. We spend time producing fancy charts, and spend countless hours ‘looking’ at the data as if it were some type of oracle that would speak to us and tell us what we wanted to know. In the best case we convince ourselves that the numbers are telling us what we want the numbers to tell us. In the worst case the decision makers do not even consider the measures, or we don’t analyze the data to identify ways to improve our engineering processes and practices. In the end, all the fancy charts are taken off the walls only to be shredded, and we start over.

We often get caught up in tracking mostly useless data such as bug count and code coverage. What in the world does bug count or code coverage tell us (or the decision makers) about quality? Nothing; absolutely nothing! Some people want to believe that finding a lot of bugs or having high levels of code coverage means better quality, but that is sort of like believing that you’ll find a pot of gold and a leprechaun at the end of every rainbow. So, why do we measure bug counts and code coverage? Simple…because they are easy to measure!

Good metrics are hard to define, mostly because we don’t always have clear goals, or we use a scatter-gun approach to setting a bunch of disparate wishful goals (goals that we hope we can achieve, but for which nobody is accountable if we don’t). I personally advocate the Goal/Question/Metric paradigm by Victor Basili. But the biggest problem I have in using this approach is establishing meaningful goals! People are generally good at coming up with superfluous objectives such as 100% automation or 80% code coverage. But when you ask those people why they want 100% automation or 80% code coverage, they respond only with a bunch of hand-waving and philosophical arguments. It seems we sometimes have difficulty expressing the ‘why’ of setting certain goals. Of course the answer in most cases is to ‘get better’ or ‘improve’ something! But why? What is the business value?

Once we establish clear goals, the next step is to understand the variables that we can manipulate to help us achieve those goals. Then we must decide which of those variables to change, based on which ones we think will give the biggest bang for the buck. Finally, we figure out which measures will let us know whether we are progressing towards our goal. (This usually isn’t a single point of measurement.)

At one time I naively believed that there was a core set of metrics that all teams should be collecting all the time, which we could put into a ‘dashboard’ and compare across teams. In retrospect that was really a bone-headed notion. Identifying these measures is not easy, and there is no cookie-cutter approach. Each project team needs to decide on its specific goals that may increase customer value or impact business costs. Testers should ask themselves, “Why are we measuring this?” “What actions will be taken as a result of these measures?” And, “If there is no actionable objective associated with this measure, then why am I spending time measuring this?”
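One way to keep those questions in front of a team is to write the goal, its questions, and its metrics down as data, and then flag any metric that has no action tied to it. The sketch below is only an illustration of that idea and not part of Basili’s paradigm as published: the class names, the `action` field, the `wall_decorations` helper, and the example goal are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str
    # Hypothetical field: what the team will do if this measure goes off track.
    action: str = ""

    def is_actionable(self) -> bool:
        return bool(self.action.strip())


@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    business_value: str  # the 'why' behind the goal
    questions: List[Question] = field(default_factory=list)

    def wall_decorations(self) -> List[str]:
        """Names of metrics that have no action tied to them."""
        return [
            m.name
            for q in self.questions
            for m in q.metrics
            if not m.is_actionable()
        ]


# Hypothetical example goal; a real one comes from the project team.
goal = Goal(
    statement="Reduce the number of defects customers find after release",
    business_value="Lower support and patch costs",
    questions=[
        Question(
            text="Are fewer defects escaping to customers each release?",
            metrics=[
                Metric(
                    "customer-reported defects per release",
                    action="Review escaped defects and add tests for the gaps",
                ),
                Metric("total bug count"),  # no action attached
            ],
        )
    ],
)

print(goal.wall_decorations())
```

Any metric the helper lists is a candidate for dropping, or for a conversation about what decision it is supposed to inform.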

At times it seems we are locked in a vicious cycle of relearning things via tribal knowledge, and we make decisions based mostly on ‘gut-feel’ and emotion. We collect a bunch of measures and display them much as the ancient Chinese used the mystical ‘dragon bones’ as oracles. But if we are interested in being able to articulate business impact (either positive or negative) in a professional manner, then we must find ways to measure the things that are really important and actionable, and spend less time collecting numbers for wall decorations. At the end of the day someone is going to ask, “How do we know?” And trust me on this…really great managers will eat you alive if you answer with “well, we think…” or “we feel…” or try to evaluate success on some other subjective measure.

Written by Bj Rollison

March 21st, 2010 at 2:22 am
