I.M. Testy

Treatises on the practice of software testing

Archive for the ‘General Testing Topics’ Category

Perceptions of Testing

with 4 comments

This week I was in Ho Chi Minh City, Vietnam. After 3 days of attending the VistaCon 2010 conference and 1 day visiting the MS office here I got to explore the city, visit some museums, and end my trip by zipping around the city on a rented scooter. What great fun! Michael Hackett and I rented scooters on Saturday, drove around for a bit, then around noon got separated. Three hours later we met back at the place we rented the bikes and Hung Nguyen was there as well. We once again set out for a coffee shop, then on to a Czech style beer hall well hidden in the middle of the city. After a beer (or two) and a great squid dinner Hung and I headed off to Vive for a few games of pool with his son Denny. Next we headed to a karaoke place I embarrassed myself and also got a treat listening to Hung, Denny, and our companions sing songs in Vietnamese. Today was a continuation of the day before and I spent the day riding out to Hiep Phuoc to explore the countryside a bit. Getting caught in 2 torrential down pours was not part of the plan and forced me to stop once in a small coffee shop for an hour, then a pho stand for another hour to wait out the rain. Away from the city communication is mostly smiles, head movements and hand gestures, but ice coffee is well understood and I knew how to say pho bo to order noodles with beef. All in all, I had way too much fun this week immersing myself in the culture and getting to meet so many wonderfully friendly Vietnamese!

Back to the conference, I was very honored when I was invited to present the opening keynote for VistaCon 2010, the first software testing conference in Vietnam. My dear friends Hung Nguyen and Michael Hackett and the rest of the staff at LogiGear organized a fantastic conference that hosted about 180 people from 6 different countries. I also presented a 2 hour tutorial on combinatorial testing practices, and another talk on random test data generation in automated tests. The talks went well although culturally the Vietnamese people are a bit shy about speaking out, and many opted to talk with me one on one or in smaller groups during breaks or the lunch hour. One attendee later told me, “I didn’t understand everything you said, but your talks really made me think.”

The software industry in Vietnam is growing rapidly, and Ho Chi Minh City has tremendous potential as both an outsourcing destination and for distributed development. So, it was not a surprise to repeatedly hear the recurring question, “What skills will we need as software testers in the future?”

Those of you who read this blog regularly or have listened to me speak at conferences know that I think that software testers should have a rich understanding of the “system.” In my opinion, the less you know about the “system” in which you are testing the greater the potential to miss important issues, or decrease your ability to troubleshoot issues and identify patterns of software testing that can then be applied in the appropriate context.

For example, many testers have heard of the problems with the antiquated double byte encoding systems (DBCS) and will continually cut and paste hard-coded strings containing “problematic” CJK characters into various input textboxes. Unfortunately, much of this ‘testing’ is simply wasted effort. Due to a lack of “system” knowledge they don’t know is that these characters were problematic in file I/O operations on ANSI based systems, or where an application is thunking between Unicode and ANSI. Today the native character encoding of the Windows operating system is Unicode. Different encoding; different problems.

I also think testers should be competent in at least one programming language for several reasons that I have discussed in previous posts. Over the past few years, my message has been pretty clear; to grow as a professional in the software testing discipline many of us must improve our overall technical skills and knowledge. I have leaned in this direction mostly because I saw this skill gap in many testers and understood the needs of companies that produce software will require a greater breadth and depth of skills from their testers. The key operative phrase in that last sentence is “companies that produce software” because that is the lens through which I view the maturing role of software testing.  My perspective of testing is biased towards testing practices at companies that produce software.

Cem Kaner also gave a keynote at VistaCon. At first I wasn’t sure if his keynote was on investment banking or testing, but he eventually drew a parallel between the job of “quants” in the financial sector and the role of a business domain expert tester working for an outsourcing company. But he did say that if you work for a company that produces software then you probably want to increase your technical skills, and if you work for a outsourcing company that tests software produced by software producer then you should become a specialist in the business domain of the software you are testing. You know…I agree. If you want to grow your career as a vendor for companies that outsources the testing of their products to specialists that test software from an end-user perspective then you should definitely become an expert in that business domain. On the other hand if you work for a company that produces software, or creates software solutions then I think you need to constantly improve your overall knowledge of the complete system; including some understanding of the domain in which you are testing.

Surely in the future there is demand for testers who are business domain specialist working for outsourcing vendors, and there will be demand for testers who want to work at a company that produces software and will need an increasingly richer set of technical skills, and I suspect that professional testers in the future will have a mix of skills and knowledge that will enable them to adapt to the changing job market and demands of the industry.

Written by Bj Rollison

September 26th, 2010 at 7:59 am

Posted in General Testing Topics

Tagged with

Code Coverage: Unreachable Code and Hard to Reach Code

with one comment

image Well, I am back from a sailing excursion to the San Juan Islands. I wanted to go to the Gulf Islands, but considering an unexpected ordeal with a kidney stone just before taking off on the trip I decided it might be better to be a bit closer…just in case. The weather was great, and we spent a lot of time exploring Stuart and James Islands, and dropped into Roche Harbor the first night and Friday Harbor the last night in the islands. We limited out on Dungeness crabs on all but 2 days where we only managed to get 3 legal size crabs on those 2 days. Basically this translates to a lot of crab cakes in the freezer…yum! This was the first time my daughter went crabbing with me. My daughter would ride out in the dinghy with me to check the pots, and would point out the male crabs for me, but she wouldn’t reach in an help me throw back the females or undersized males. Come to think about it, she didn’t help me cook and clean the crabs either…she just ate the crab cakes we made on the boat. I think the rules might have to change next year! All in all it was a great month decompressing and recharging, and contemplating my personal and professional future.

But enough about me. My last 2 posts have been discussing code coverage analysis. The primary purpose of using code coverage tools as either a developer or as a tester is not to try to obtain some magical ROMA number. The biggest value of measuring code coverage is to help us analyze untested areas of code and make informed decisions of whether or not we need to design additional tests to increase test coverage and help reduce exposure to risk.

The last post illustrated how we might use code coverage results to help us design additional tests we might have missed during the execution of any pre-defined tests (automated or manual) and additional exploratory testing efforts. But remember, the goal is simply not to design tests in order to get the tool to report 100% code coverage. In fact, in just about any complex system executing 100% of the statements in code may not be feasible or provide any practical value. This is generally referred to as unreachable code.

For example, let’s look at this (albeit antiquated) code snippet.

image

The code coverage tool is indicating that this conditional statement has been exercised to its true outcome, but not it’s false outcome. This was a common approach used in 16-bit applications to prevent multiple instances of the same application on a single machine. However, in the 32-bit world hPrevInstance always returns null, which means there is no practical way to make this conditional statement return false.

This is a bit of an obscure example, but is used to illustrate why a greater understanding of the programming language used by the development team would help testers from banging their heads against the wall ‘trying’ different things until someone realizes we could never make this conditional statement return false. By analyzing this section of the code coverage results we might suggest refactoring for a Win32/64 environment, or at least be able to explain why this conditional will not return false. (Remember…it’s all about information.)

Another example of unreachable code is sometimes caused by coding style or possibly unnecessary code. For example, the following lines are also in the WinMain() function that is called when the ‘user’ launches the application.

image

In this situation when the application initially starts and calls the WinMain() function these 2 conditional statements in WinMain() determine whether the Frog and Car bitmaps are true. Since we just launched the application and the bitmaps have not yet been loaded by any other calls to LoadBitmap() then the conditional statements in lines 104 and 109 will never go true, and lines 105 and 110 will never be executed. Again, following an analysis of the section of the code we can provide information regarding why we can’t design a test to cause these conditional statements to return true without fault injection or code mutation. Additional information that we might provide based on our analysis of the code coverage results may be a suggestion to refactor this code to improve testability.

A similar example of unreachable code is a common coding style involving switch statements where developers included a case statement for each possible value, and also included a default statement. For example in the last post we saw how we saw this code chunk which is essentially the menu structure.

image

When the menu item is selected the submenu displays the submenu items Start and Exit. When the submenu is displayed the only actions possible is to select the Start submenu item (line 270), or the Exit submenu item (line 274). Without fault injection there is no practical way to execute the default statement in line 277. Again, this may be another example where refactoring could improve testability because if the default statement is removed control flow would simply pass out of the switch block.

However, this is not always the case with switch statements. Here is an example in which a modal message box is displayed and draws 2 buttons; a yes button to restart the game, and a no button to quit the game. But notice that the default case statement (line 295) has not been executed.

image

In this situation if we launch the game, move the frog to get hit by a car this modal message box will display showing the ‘end user’ 2 possible buttons to press (in this case the ESC key or the Close control button in the upper right corner of the modal dialog are not possible options). However, if we put the game into a state where this modal dialog is displayed and then kill the application process using Windows Task Manager control flow will pass to line 295 as the process terminates.

Of course, it may not be practical or reasonable to terminate the application process from every possible machine state. Also, this simply increases the costs of testing without adding any real practical value. Providing this information to the decision makers along with suggestions to refactor and improve testability to reduce overall testing costs is another way code coverage analysis can be a valuable tool in a tester’s toolbox.

Another example of hard to reach code. In this case the conditional statement is if the RegisterClass() function fails then we want to return false.

image

The RegisterClass() function is also called within the WinMain() function when the application initially launches. So, while analyzing the code coverage results the question we ask ourselves is, “Without fault injection can we make the conditional statement in line 88 return true, and if so how?”

Well, we can. All we have to do is launch about 450 instances of this application to cause line 88 return true. Now, we have to ask ourselves, “What value does this test provide?” Especially since the code design should only allow 1 instance of the application (although it fails to do that because it is a 16-bit app running on a 32-bit environment and that is the nature of hPrevInstance as explained earlier.

From a testing perspective the primary goal of code coverage is not to achieve some magic number; the objective of code coverage is to analyze code coverage results in order to

  • Improve test coverage
  • Reduce overall risk
  • Potentially increase testability of the project

The code coverage number is not really useful information to anyone. It is the analysis of the code coverage results that can help us decide whether we need to design additional tests, identify areas of the code that can’t be executed without even more expensive testing such as fault injection and/or code mutation, or refactor the code to improve testability (which often increases the code coverage measure).

But, this is not to suggest that we should employ code coverage and analyze the results for all software projects. Analyzing code coverage results and designing additional tests from a white box perspective, or refactoring code are all additional expenses for any project. For each project we (or our managers) must decide whether the cost is worth the improved coverage and potentially a reduction in overall risk.

Another way to look at it…it is our responsibility as testers to provide valued information to the decision makers. If the only information we are providing is that we achieved 80% code coverage then we really aren’t doing an effective job. Yes, many managers are number focused; however, the valuable information is in the rest of the story about the 20% that has not been executed.

Written by Bj Rollison

August 27th, 2010 at 1:44 pm

Code Coverage: Finding missing tests

with 4 comments

hockey

[NOTE: This was written last week but due to a glitch did not get automatically posted before I left on a boat trip where I disconnected from the world. How refreshing…but more about that later.]

Well, we got a buy in the quarter final play-offs, and we won our semi-final game against the Sockeyes (2 – 0) last Tuesday. Unfortunately, while celebrating with my teammates I was literally knocked to the floor by a kidney stone in my bladder. After a trip to the hospital I am now taking a battery of pain-killers until I pass the stone. Unfortunately, this put me on the injured roster and knocked me out of the final game and until I get an OK from the doctor. Fortunately, I am not prone to kidney stones and this shall pass (literally). I have only had one kidney stone in the past and that was about 20 years ago. For those of you who have not experienced a kidney stone, trust me on this…be very, very glad and I hope you never experience this malady.

Last week I attempted to to illustrate how we might achieve high levels of code coverage (structural control flow) but potentially overlook critical tests, especially from a ‘black-box’ testing approach. The bottom line message…high code coverage does not necessarily equal good test coverage. In reality it is really unlikely to get 100% measured code coverage of an reasonably complex application under test. Unfortunately this often begs the question, “What is the right amount of code coverage for [my] application?” To which I have heard several leads/managers reply, “Our goal is 80% code coverage?” Really? C’mon…that’s just plain ROMA data. Setting arbitrary goals for code coverage is about as pointless as tits on a boar hog. The real answer is that we simply don’t know what measure of code coverage is the ideal level for any product.

imageAlso, for those who have read this article will know that regardless of the testing approach at some point the effectiveness of our tests to hit untested areas of code diminishes. While more time and effort may increase data flow coverage and expose issues, it is unlikely to increase control flow (code) coverage. Remember, just because you exercise a line of code doesn’t mean you found all the bugs, but you have 0% probability of finding any bugs in any untested code.

Fortunately, the majority of customers will traverse the same code paths covered by many of our tests. Also, if your team/organization does robust unit testing then there is a good probability that unit tests at least provided some minimal level of code coverage. (NOTE: While I highly recommend that unit tests and coverage results should be transparent to the test team, I do not recommend using unit tests as part of the battery of tests designed by the test team and executed against the whole build to measure code coverage.) So, there are a couple of questions we have to ask ourselves.

“Does the untested code present significant risk to our customers?”

“Do we need to reduce exposure to risk in the untested areas of code?”

“What is the most efficient way to effectively evaluate the untested code?”

As I wrote last week, code coverage is not about the number. Code coverage is about analyzing the results and potentially designing additional functional tests, or at least being able to explain why areas of the code are untested. If we determine that it is important for our business to better understand the untested code, or improve overall confidence and reduce potential risk then we should use a tool to measure code coverage. But, again it is not about simply measuring code coverage and reporting some magical metric.

Code coverage analysis is the most efficient method to help testers evaluate untested code. Code coverage analysis basically involves the tester reviewing untested code reported by the code coverage tool and determining why some code was not exercised, and possibly design additional tests to exercise the previously untested code. (Remember I also wrote last week the future of professional testing is about analyzing information and designing tests…so, here we go!)

Missing tests

For several years I’ve used the triangle simulation to help set a ‘test effectiveness’ baseline for new testers who had never been formally trained in different test techniques, patterns, or approaches. After a few years of analyzing the results we found that there was about a 70 to 75% probability of tests exercising true branch of the first conditional expression in a compound predicate statement in a key method in the program. There was about a 20 to 25% chance of tests exercising the true branch of the second conditional expression, and there was less than a 10% probability of tests exercising the third conditional expression in the predicate statement. When I found these results I could hardly believe it, so I changed the third conditional expression to inject a bug and sure enough the results held true; in any class of 20 people on average only 1 or 2 people found the bug in the software.

From a black box approach let’s say our tests used the following values for sides A, B, and C respectively:

  1. 1, 2, 3 – an error message indicating the values would not produce a triangle
  2. 2, 1, 3 – an error message indicating the values would not produce a triangle
  3. 4, 5, 6 – scalene triangle
  4. 2, 1, 2 – isosceles triangle
  5. 5, 5, 5 – equilateral triangle

In this case our code coverage tool would report our coverage is less than 100%. As we drill down we see that the IsValidTriangle() method illustrated below is not completely covered. So, (assuming the arguments values passed to the parameters in this method are all validated to be greater than 0) we analyze the code below in our coverage tool and realize that we need a test to evaluate the third conditional expression to true (e.g. 1, 3, 2 for sides A, B, and C respectively).

   1: internal bool IsValidTriangle(int sideA, int sideB, int sideC)

   2: {

   3:   bool result = true;

   4:   if ((sideA + sideB <= sideC) || (sideB + sideC <= sideA) || (sideA + sideC <= sideB))

   5:   {

   6:     result = false;

   7:   }

   8:  

   9:   return result;

  10: }

This is a simple example, and things aren’t always so easy in the ‘real-world.’ So, let’s look at something a bit more complicated. In this example, we are testing a game (Frogger) and discover that we called the MoveFrogLeft function, but we didn’t evaluate the conditional expression in line 499 to its false outcome as illustrated below by the red arrow and blue ‘T’.
 
image
 
For this we must find where the frogExists variable is set to true (or 1). So, we search the code for the frogExists variable and soon find the FrogStart() function where the frogExists variable is set to 1.
 
image
Now we must find when the FrogStart() function is called and we find the following case statement that basically deals with sending windows messages from the window menu (WM_COMMAND msg). In this example, when we start the game and click the Start menu item (IDM_START) the StartWorkers() and FrogStart() functions are called.
 
image
So, now we need to put the puzzle pieces together and design a test that will make the conditional expression in line 499 to evaluate false. In this case the test will be to launch the game, then press the left arrow key before clicking the game’s Start menu item.
 
These are rather simple examples, but should serve to illustrate how we can use code coverage tools and analyze the results to help us design additional tests from a white box test design approach to identify additional tests to increase control flow (code) coverage and improve functional testing effectiveness.
 
Next week I will discuss unreachable code.

Written by Bj Rollison

August 22nd, 2010 at 4:11 pm

Code Coverage: Did we run the right tests?

with 4 comments

It has been awhile since my last post, and I apologize for the few folks who follow my rants. Over the last few months I have been busy working on my boat (she is almost 30 years old now and needs some major refitting). Since starting to play hockey again I have been spending a lot of time at the gym and on the ice and trying to keep my body from getting too battered in the games (for I too am older now…let’s just say more than 30). I have also done a bit of  soul searching on the past year or so reflecting on the high points and the low points as well.

So, the refitting on the boat is almost complete and she is ready for a nice long trip to the Gulf Islands starting next week. The Monarch’s Div 7A team finished the regular summer hockey season in first place and playoffs are this week. Winter season doesn’t start until October so I have a month and a half to recuperate. And, I have contemplated a few things to help nurture my professional and personal growth that I will proactively begin working on…beginning with a vacation to the Gulf Islands next week.

But, for now I want to talk more about code coverage. Not the number! Forget the measure! The code coverage percentage is simply a magic metric for pointy-haired managers and other thoughtless chowderheads who like to wave it around as if it actually means something. Look…as a manager I really don’t give a rat’s butt about what percentage of coverage was achieved by testing. If you tell me that your testing achieved 80% code coverage I will probably say, “That’s great! Now tell me about the 20% of the code that you didn’t test!” As a manager what I want to know is:

  • Did we run the right tests? Did the test suites we ran give us the information that we need to make good business decisions?
  • What is my potential exposure to risk in the areas of untested code and how do/can we reduce that risk?

I have said for years now that the future of professional testing is not simply about beating on a product and finding bugs. The future of testing lies in our ability to design effective tests and critically analyze results in order to provide better information to our primary customers (the decision makers). Code coverage is about analyzing the results and potentially designing additional functional tests, or at least being able to explain why areas of the code are untested.

Did we run the right tests?

The code coverage metric simply measures the number of statements in the code have been exercised by monitoring control flow through the code. Control flow through code is sequential until it hits some type of branch such as an if statement, a for loop, or an exception. For example, when we call the CalculateMonthlyMortgage method below control flows sequentially from the statement in line 6 to the statement in lines 7 and 8 and finally to the return statement in line 9.

   1: public static double CalculateMonthlyMortgage(

   2:   double principle,

   3:   double annualInterest,

   4:   int months)

   5: {

   6:   double monthlyInterest = (0.01 * annualInterest) / 12;

   7:   double monthlyPayment = principle * monthlyInterest /

   8:     (1 - Math.Pow((1 + monthlyInterest), (-1) * months));

   9:   return monthlyPayment;

  10:  }

So, by passing in a value of 250000 for the principle, 4.5 for the annual interest rate, and 360 for the months parameter the method would return an expected value of 1266.71327456471 and we would measure 100% coverage of this method. That’s a happy path test. Yes, it does what we think it is supposed to do. But, what happens when we pass in a value of 0 to the annualInterest parameter? Control flows through the method as described above giving us 100% code coverage, but the return value is NaN, or Not a Number. Similarly, if the value for the months parameter is 0 we get 100% code coverage and the return value is Infinity. Of course, negative values for any of the parameters would also produce undesirable output results.

This example illustrates sequential control flow through a method that contains simple statements. When branching conditions occur in software the control flow through the method becomes a bit more complicated as does the testing required. The following snippet counts the number of characters in a string. But, to prevent a null reference exception we use a predicate or conditional statement in line 4 that branches control flow from line 4 to line 12 if the input argument is null, and from line 4 to the loop structure starting in line 6 if the string is not null. If the input argument is an empty string control will flow from line 6 to line 10 bypassing the inner block that increments the count variable. But, if the input argument is a string of at least 1 character control flows from line 6 to line 8 and loops back to line 6 until all characters in the string are counted. (This previous post shows how you can create a model or control flow graph to map control flow through an algorithm.)

   1: public static int CharacterCounter(string input)

   2: {

   3:   int count = 0;

   4:   if (input != null)

   5:   {

   6:     foreach (char c in input)

   7:     {

   8:       count++;

   9:     }

  10:   }

  11:  

  12:    return count;

  13: }      

 
With 3 tests (null, empty string, and a string of at least 1 character) we can achieve 100% code coverage of this method. However, there are still 2 basic problems. The first problem is that a null and an empty string both produce an output of 0. This may be desired output, but not in all cases. Secondly, if the input argument is the string “ꇔㄣᦅrえꞄ௏Жᾁ” the return value will be 11 instead of the expected 10. This is because the last character in the string is actually a Unicode surrogate-pair character composed of 2 UTF-16 code points. (Assuming that 1 character glyph is 1 byte or one UTF-16 encoded code point value is a common mistake in string parsing.)

These are just 2 simple examples designed to illustrate how we can get high levels of code coverage yet still have bugs lurking in our software. Code coverage tools tell us whether we exercised code; it doesn’t tell us if we ran the right set of tests to expose potential issues.

Next week let’s continue this with some thoughts on analyzing code coverage results to help reduce risk.

Written by Bj Rollison

August 9th, 2010 at 12:14 pm

Agile is not a process; Agile is a mindset.

with 6 comments

When someone compares agile with “traditional’ development I have no clue what they are talking about. I have never worked in a software company that produced software as if it were a widget on a factory assembly line. I have never worked in an organization where people didn’t talk to each other constantly. I have never worked for a company that didn’t periodically reflect on its own processes and/or procedures to identify areas of improvement. I have never worked for a company that wasn’t capable of adapting to changing to ‘requirements’ when necessary. And, I would say that the majority of people that I have worked with are highly motivated individuals who strive to do the best possible work, and who are capable of adapting, improvising, and over-coming obstacles in order to ship complex world class software.

This is the way it has been for me during my almost 2 decades in this industry. So, when I read books and papers that compare agile to “traditional” approaches I ask myself, “What in the hell do they mean by “traditional” approach?

For example, when I joined the Windows 95 team we had weekly builds, and at least once per week the build had to satisfy the requirements for “self-host” status. Self-host meant that anyone in the company could use that build for their day to day work. Also, many of the self-host builds are released to key strategic industry partners and 3rd party vendors so they could take advantages of new features in their software.

After the PDC release of Windows 95 we made several key changes in the operating system based on feedback from our customers and partners. Also, late in the cycle after beta 2 we had to make a major change in how Windows started because of a quirk in a relatively popular MS DOS based game. Throughout a typical lifecycle features are added, removed, and tweaked based on several factors including customer feedback.

Instead of sprints we generally have milestones. At the end of each milestone many teams have a post-mortem or retrospective to review key objectives, discuss things that went well, things that could have been improved, and  “tune and adjust its behavior” to make sure everyone is aligned and find ways to increase effectiveness.

Most developers were also doing unit testing. (TDD is simply another way of restating Dave Gelperin’s and Bill Hetzel’s famous quote from around 1987 “Test then code.”). Testers and developers talked frequently, and on many occasions the developer I worked closely with would come to my office to debug a hard to reproduce issue. And the majority of testers who were hired in the early 90’s were typically required to have an in-depth technical and often coding background in order to be able to engage in all aspects of the software development lifecycle and perform all the various job roles that might be required from a testing professional.

By 1998 we were doing daily builds. We have well established unit test libraries that not only assist developers in refactoring, but also help prevent build breaks and down-stream regressions. We still do a lot of API (component) level testing, as well as integration and system level testing.

Of course some of our products have longer release lifecycles than others. However, as indicated earlier we deliver “working software” at least once a week internally, and even to some premier customers (especially after the third milestone leading up to the final release of that version. But, some of our teams release every month.

Stepping back and looking at the debates, the books, and conference presentations I realized that I can relate well to the Agile principles. We often use models, principles and other abstractions to describe things. And, when all is said and done Agile principles are simply another way to present concepts that are intended to enable a team of highly motivated people to work together to ship a high quality product that satisfies their customers.

We have different models to generalize our processes to help us describe our typical workflow to others. But in truth there is no single way to build software, and models are simply our abstract expression of how we view the process. In fact, each team chooses processes and procedures that work for them and their particular product in their unique context.

(BTW, I am also really tired of Agile pundits always comparing Agile to waterfall models. Why don’t they compare their understanding of Agile to the V-model, W-model, or especially to spiral or other iterative models of software lifecycles? These are models; how your team chooses to implement a development lifecycle may resemble one of these models or you may adopt things from different models to implement your processes and approaches to software development.

Remember, George Box stated, “All models are wrong, some models are useful.” We should implement the concepts embodied in the model;  we should not implement the model.)

So, to me, Agile is the ability of a team to interact with each other, to adapt to changes, to periodically reassess its productivity in order to strive to become more effective and efficient in delivering software to its customers. Agile (and every other lifecycle models) is not a process; Agile is a mindset.

Technorati Tags: ,

Written by Bj Rollison

June 20th, 2010 at 8:16 am

Posted in General Testing Topics

Tagged with

Shipping software is a team effort

with one comment

HockeyShot

Since I was young I was involved in team sports. I played little league baseball during my elementary years, in junior high I switched to football, and in high school I played lacrosse. Growing up on the right (east) coast of America a lot of us boys would also play pick-up pond hockey when the rivers and ponds froze over in the winter. Hockey has always been a passion of mine, and I am a big Bruins fan (perhaps because Maryland didn’t have a hockey team), and Bobby Orr was amazing!

There is an ice rink very close to my home, and my daughter started skating at 4 years of age. She has no ambition to be the next Nancy Kerrigan, but she loves to ice skate. She also loves to watch hockey (she is a die hard Penguins fan and her favorite player is Sid The Kid).

The past few weeks I have been involved in a hockey tournament in conjunction with my regular Monarchs schedule, work schedule, and getting the boat ready for the up-coming Leukemia Cup Regatta and summer sailing season. (I’ll use that as an excuse for my long absence from the blogosphere.) In the tournament we won some games, and we lost some games. Eventually our team won the silver medal yesterday afternoon in the final match of the series. One thing all this practice and play has reminded me of is the importance of the team. Whether it is playing hockey or shipping software a team that plays well together is required for success.

Playing well together is more than simply showing up on the ice. Each player has a position, each player is expected to communicate effectively to move the puck around and “read the play,” occasionally players will “block a shot” on goal, and when you are on the ice you give 150% (which is why the shifts (time on ice) are short). Essentially no player can do everything alone, and the weakest link means that others on the team must take up the slack (which eventually leads to mistakes and players getting “out of position”).

HockeyTeam

I play defenseman, and usually left defense.There are general skills for the game that all players must have, but there are skills and strategies for defense positions that are different than for forwards. And although the 3 forwards and the 2 defensive players must work together as a team to score (or prevent a goal) the responsibilities of each position are different, and each player must play their position.

I have read a lot of analogies between individual sports such as martial arts and the discipline of software testing. These analogies are sometimes good in that they help testers understand why they must constantly learn new things and practice in order to grow in our (or any) professional discipline. Similarly, I could make an analogy between the defensemen on a hockey team and software testers. But, while we might all aspire to be a “testing ninja” (or a Bobby Orr) in our career, we also need a team of other professionals each doing their part to ship a great software product.

When players in the software game are out of position or not playing their best the team is looking for a loss, or at best a very hard earned win (that usually burns players out). For example, program managers who don’t to provide useful customer scenarios or models of potential feature designs, or developers who fail to unit test their code or run unit tests that are only slightly better than “does it compile” put the burden on the test team (the defensive players) to hopefully “score the goal.”

While a defensive player may occasionally score a goal, their primary role (on offense) is to put the puck in front of the opposing net. As defensive players we must be able to read the play (understand our customer scenarios and analyze abstract models of a design and provide critical feedback). Communicate effectively to other defensemen and forwards to get the puck into position, or protect our goal (put the important information in front of those who make the critical business decisions).

At times defensemen need to help the forwards by driving the puck out of the zone, but without the forwards doing their part the defensemen would surely get exhausted very quickly and the outcome would not likely be favorable. Likewise, if the forwards don’t help defend their goal it is likely the opposing team will wear down the defense and score.

So, while we (as testers) sometimes like to think that the success or failure of a software project depends solely on our defensive skills to find bugs and drive in quality, the simple fact is that we need a team of professional to be successful and “quality” depends on the value each player brings to the table. If we don’t start a game with a good strategy and people playing their best in their assigned positions the likelihood of success is marginal at best. A strong defense is critical, but no one ever wins by playing defense.

"In every chain of reasoning, the evidence of the last conclusion can be no greater than that of the weakest link of the chain, whatever may be the strength of the rest." – Thomas Reid’s Essays on the Intellectual Powers of Man, 1786

Written by Bj Rollison

May 24th, 2010 at 9:28 am

Posted in General Testing Topics

Tagged with

Is this as good as it gets?

with 3 comments

Last month I was travelling throughout Europe. A few days in Switzerland to speak at Swiss Testing Days and visit our communication server team in Zurich. About a week and a half in Denmark teaching at our offices there, a few days in Dublin teaching, and finally a full day meeting with one of our teams in Reading, UK. The only good thing about being on the road that much is getting in more reading time. Two of the books I read while travelling were Agile Testing by Lisa Crispin and Janet Gregory. The other was Clean Code by Robert Martin. I found both books thought provoking and informative. Of course I didn’t agree with everything, but even the statements I didn’t agree with made me think about that topic a bit more. Martin’s book inspired me to improve my testing and coding craft. If nothing else, testers should read the first chapter of Clean Code, those who design and develop automated tests should read the whole book.

But, one statement that has always troubled me that I saw repeated in the Agile Testing book is in regards to exploratory testing as the most effective way to expose important bugs. Every time I hear/read this or one of its many derivatives it makes me ponder. If this is really true then why do we invest so much time and resources in other approaches to software testing. Why do we spend so much time in static and dynamic analysis, reviews, and coverage analysis? Why do we try to prevent bugs when they can (and presumably are) found easier and ‘quicker’ downstream via exploratory testing? Why don’t we simply continue to hire hoards of diverse people to rapidly explore the product just before release and beat out all those important/critical bugs?

I wonder why it seems that we can ‘think critically’ when we have the product to play with, but we can’t seem to apply critical thinking while reviewing documented requirements, or a model? Personally, I am not sure that ‘trying’ something and then deciding what to do based on the outcome or state of the computer, or deciding whether something is a bug after the fact constitutes critical thinking. Perhaps sometimes it is, but I suspect that sometimes it is just guessing. To me, critical thinking involves the ability to foresee potential issues; not simply stumble upon them, or apply an attack from my bag of exploratory tricks to reveal them downstream in the development lifecycle when I have the product to play with. 

I sometimes wonder if exploratory testing really is the holy grail of software testing, or if we have just given up on other approaches because they are too hard. Could it be that our testers are not capable of ‘testing’ without also engaging their fingers? Are our testers untrained or under-trained in our profession? Is the reason why our pre-defined test cases are seemingly ineffective due to the realization that we suck at test design, and learning how to design more robust and ineffective tests is just too hard? Is the reason we can’t think of tests until we get the product in hand perhaps due to our inability to model or is it because we really don’t know that much about the systems we are testing? (You can’t model what you don’t understand and you don’t understand what you can’t model.) Besides, why should we burden testers with the overhead of learning about programming concepts, computer science, and math. You don’t need all that stuff to find bugs.

Now many of you might think that I despise exploratory testing. I am not anti-exploratory testing (whichever form that takes); because we all do it! Show me a tester who doesn’t include some form of exploratory testing in his or her repertoire and I’ll show you a thoughtless simpleton who needs explicit instructions for determining which side of the heal of a loaf of bread to butter. But, I have always thought that we could actually improve the craft of testing. I thought we might play a much bigger role in an organization. I thought we could help reduce overall production costs by being involved earlier and throughout the lifecycle. I thought we could help reduce risk through defect prevention and in-depth analysis. And, I thought we could learn to use tools and techniques to help us design better tests up front, and rely on a team of testers who are capable of thinking while executing a test case.

Yes, it is true that exploratory testing is fun, and we don’t really need to understand all that computer science stuff to find bugs using an exploratory approach. Regardless of what empirical studies on the topic reveal, exploratory testing ‘feels’ so much more productive. Why should we base our profession on evidence when we can get by so easily with emotion? Why should we have to think critically during the design or implementation of software when we can simply muddle our way through some graphical user interface to find bugs? If our approaches to unit testing and component level testing are so shoddy as to simply invoke happy path” tests then of course finding bugs by exploration seems rather logical. But, as developers become more adept at unit can component level testing in flushing out functional bugs earlier we may only need ‘testers’ to explore the product before release for behavioral issues, compatibility problems, and the few remaining obscure functional bugs. Or possibly exploratory testing providing a ‘good enough’ service to our team by finding the bugs that matter to the majority of customers, and perhaps that is ‘good enough.’

Personally, I think we can improve our testing processes. I think we will need to improve the practice of software testing. I think that our job is not simply to find bugs, but a more important objective is to prevent defects (which can help reduce overall product costs). I also don’t view testing as destructive, and I think those who do are doomed to become obsolete. Successful businesses don’t hire folks who just want to ‘destroy’ the products they build, I suspect they want to hire people who can help the company grow and improve. Finding bugs helps me ship a product, but it doesn’t necessarily help me grow the business, reduce production costs, or improve quality long-term.

But maybe we don’t need to change. Maybe our businesses are satisfied with the status quo. Maybe we don’t need to think about cost savings or helping our teams mature their processes, reduce overall risk, and potentially improve quality earlier. Maybe I am too  idealistic thinking that testers can provide greater value to an organization beyond finding bugs at the end of the development cycle.  Maybe our craft has reached the the apex of of our profession and this is as good as it gets.

Written by Bj Rollison

April 8th, 2010 at 3:13 pm

Complex != Better

with one comment

I know all about over-engineering. I previously wrote about a barn my father designed and built using telephone poles and oak planks and pallets for the stall walls. From a structural perspective this barn would have stood virtually anything mother nature could have thrown at it. And if someday people wanted to tear down that barn they would surely curse the builders because the job would be much harder then they expect.

I sometimes find code that is way over-engineered. In some cases it may be necessary for making the code more robust. But, over-engineered code may also result from ignorance of more efficient algorithms or design patterns. And, sometimes overly complex algorithms are a result of developers trying to craft something that obfuscates the simplicity of the solution and fool others into believing the complexity of the solution is somehow better than a more simple solution.

I sometimes see this in test code. I am a firm believer in robust, ‘bullet-proof’ test code because each time an automated test case throws a false positive or ‘breaks’ the team loses confidence in the automation project, and it takes our time to ‘massage’ the test back to health. But, there seems to be 2 extremes in test automation. Simplistic prescriptive scripted tests with a bunch of hard-coded ‘test-data,’ or overly complex test code that is virtually undecipherable by anyone other than the original developer that renders any downstream maintenance virtually impossible. I have seen many instances where complete libraries of automation was scrapped and re-developed simply because it was easier to re-write the code rather than trying to wade through the quagmire of complexity.

In a recent example, some of my students submitted their test automation projects with a library that contained a method to generate a random string. When I first looked at their method for generating a random string I was a bit perplexed. Besides the fact that the method only produced a grand total of 26 upper case characters, the code was much more complicated than necessary. Despite having showed them how to use the Babel random string generator they choose to make their own, so I asked them how they came up with this solution and they said “they searched on the Internet. ” Others in the class indicated they also used the same method in their projects. So, I when I got home I did a quick search and found the code sample.

   1: private static string RandomString(int size)

   2: {

   3:   StringBuilder builder = new StringBuilder();

   4:   Random random = new Random();

   5:   

   6:   char ch;

   7:   for (int i = 0; i < size; i++)

   8:   {

   9:     ch = Convert.ToChar(

  10:       Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));

  11:     builder.Append(ch);

  12:   }

  13:  

  14:   return builder.ToString();

  15: }

OK…for a moment let’s overlook the fact that if we make 2 consecutive calls to this method we will get the same identical random string of characters (which the author of this code also discovered), and let’s assume that the range of characters is limited to upper case A through Z, and let’s also ignore the fact that we are not seeding our Random generator for reproducibility of the randomly generated output from this method.

Let’s focus on the statement in lines 9 and 10. Why in the world would we generate a number between 0.0 and 1.0, multiply it by 26 and add 65, and then use Math.Floor to return the largest integer less then or equal to the resultant double, then convert that double to a type Int32, and then convert the Int32 to a type char? Now I ask you, can we make this any more complicated?

Now, I am not critiquing the style of the code or its inherent limitations, but this solution is an example of over-engineering. As I have said before, complexity cultivates chaos. At first I thought maybe this approach might give me a better distribution of characters, but when I tested this hypothesis it simply didn’t pan out. So, the way the random character is generated is simply too complex. An easier, more readable, and effective alternative might be to simply generate a random integer within the allowable range and cast that int to a type char as illustrated below in line 10. (Yes, I realize the limitations with this approach if trying to generate surrogate pair characters above U+FFFF.)

   1: private const int minCharacter = 65;

   2: private const int maxCharacter = 91;

   3: private static string SimpleRandomString(int size)

   4: {

   5:   StringBuilder sb = new StringBuilder();

   6:   System.Threading.Thread.Sleep(1);

   7:   Random r = new Random();

   8:   for (int i = 0; i < size; i++)

   9:   {

  10:     sb.Append((char)r.Next(minCharacter, maxCharacter + 1));

  11:   }

  12:  

  13:   return sb.ToString();

  14: }

BTW…the Sleep() in line 6 is a easy solution to preventing identical return values for consecutive calls. But, a more effective solution would be to pass in a seed value as a parameter to the method and use that to seed the new Random generator as illustrated below. Different seeds not only guarantee randomness in the resultant string, but also allow for repeatability if necessary as long as the seed value is preserved.

   1: private static string SimpleRandomString(int size, int seed)

   2: {

   3:   StringBuilder sb = new StringBuilder();

   4:   Random r = new Random(seed);

   5:   ...

   6: }

But, I digress. This post isn’t about random generation or style…it’s about complexity. Overly complex code tends to harbor errors that might go undetected (at least initially). Also, maintenance of complex code not only becomes more problematic, it often leads to costly re-writes down the road.

If we consider that greater complexity may increase the likelihood of error, then we don’t want our development partners to unnecessarily over-engineered algorithms. Also, overly complex solutions often require overly complex testing. This not only leads to increased testing costs, but it also increases the probability of an important test being missed or overlooked. Think testability!

Finally, with regard to test automation we should consider that our automated tests might be reused by other teams, and they certainly will be used during maintenance or sustained engineering efforts well after the product has released. And, in some cases, the people maintaining the product might not be the same people who shipped the product. Well-written automated tests are not just reviewable by someone other than the author of the code, but they should also be easily maintainable. Otherwise, we might end up paying double for that automated test case. Ouch!

Written by Bj Rollison

March 31st, 2010 at 8:43 pm

Boundary Bugs…like shooting fish in a barrel

with 6 comments

If there is a bug at a boundary that doesn’t lead to an unhandled exception or security exploit should we care?

Perhaps an even more important question is why do we find so many boundary type bugs via exploratory testing when they can and should be caught earlier? Why don’t we find these types of bugs in our unit testing? Why don’t we find these types of bugs by more systematically testing the software? Maybe we do find them, and those who make the decisions to fix these types of bugs just don’t care if they are fixed because there is no severe negative impact to the user. Maybe someone just wants to give me fodder for my blog!

This week I wanted to compare the range of allowable font sizes for a simulation program I developed as an example for a magazine article that I am working on. I knew that Office applications allow a font size within the range of 1 – 1638. I thought that range might be too large for my purposes, and since I knew that Windows Notepad included a font dialog I decided to check the allowable range of font sizes in Notepad.

The first thing I discovered was that the combobox control allows up to 5 characters! Really? Someone decided it is a good idea to allow users to enter 5 characters? notepad 1

OK, I’ll play along. Maybe if I put in a size of 99999 and press the OK button on the dialog I will get an error message, or at least Notepad defaults to the last ‘valid’ selected font size. That might seem reasonable. But is that what happens? NO! Instead of doing something reasonable (e.g. error message, default font size) the font changes to a size of 1 (yes that is a font size 1 in the upper left corner in the image below).

Notepad 2

I am sure that defaulting to a font size of 1 makes sense when the allowable size value overflows! Really…someone thought that was a good idea? Now I wanted to see what magical boundary value the developer decided was an acceptable font size. Since the combobox size property allowed 5 characters I immediately tried 65535. No, that also resulted in the overflow and displayed the text in a font size of 1. Next I tried 32767. Wait…32767 didn’t display the string in Notepad’s edit control at a font size of 1. Now, I am thinking the developer is using a data type of signed short for the font size variable. So, I enter 32768 expecting the value to overflow and display my string as a size 1 font again. But, no…that doesn’t happen.

Now, when I am design boundary tests I generally rely on 2 heuristics for identifying boundary values for input or output parameters.

  1. Values at the extreme edges of a physical range of values
  2. Values at the edges of equivalence partitions of physical values

So, in these situations I ask myself “What sort of demented developer debauchery have I now found myself?” I can’t think of any other obvious edge values that might apply, so out of curiosity I quickly narrow down the magical value to 39321. I then ask myself, “OK…even if there were a display capable of rendering or a printer capable of printing a font of this size, what is so unique about 39321?” In hexadecimal it is 0×9999, and in binary it is 1001100110011001. OK…nothing obviously special here, but I am certain the implementation details are much more complex then a simple range of values and at this point I really don’t care because this bug just doesn’t make sense.

Maybe it’s not supposed to make sense! Maybe nobody really cares about these types of bugs!

(BTW…somebody please take the Thesaurus away from the developer…’Oblique?’ Are you serious…why not just be consistent and use the word ‘Italic?’)

Written by Bj Rollison

March 24th, 2010 at 11:58 pm

Meaningful Measures

with 2 comments

I arrived in Switzerland on Monday morning and met with our team here in Zurich who work on the communication server. Tuesday I presented a tutorial on advanced combinatorial testing and delivered a keynote address at Swiss Testing Days on Wednesday. Unfortunately, I really didn’t get to spend a lot of time exploring the city, but it was great to catch up with my long time friend James Whittaker. James and I also gave brief presentations at an executive dinner the night prior to the conference. It was also really nice to meet new friends from SwissQ who put together Swiss Testing Days. This was my first time to present at this conference and I was greatly impressed. More than 750 people attended the conference! It was quite an event and I hope to return next year.

At the executive dinner and during my keynote I discussed various challenges in software engineering that directly impact testers. One of those challenges we need to get our heads wrapped around is software measures. By software measures I am referring to objects in software engineering mapped to various scales in the mathematical world. Although we sometimes also use biased qualitative measures, such as “too slow,” if we are to be taken with any degree of credibility we have to define what to slow is and set a reasonable goal for ‘acceptable’ based on customer values.

As testers we expend a lot of cycles collecting buckets full of metrics. We spend time producing fancy charts, and spend countless hours ‘looking’ at the data as if it were some type of oracle that would speak to us and tell us what we wanted to know. In the best case we convince ourselves that the numbers are telling us what we want the numbers to tell us. In the worse case the decision makers do not even consider the measures, or we don’t analyze the data in an attempt to identify ways to improve some of our engineering processes and practices. In the end, all the fancy charts are taken off the walls only to be shredded and we start over.

We often get caught up in tracking mostly useless data such as bug count and code coverage. What in the world does bug count or code coverage tell us (or the decision makers) about quality? Nothing; absolutely nothing! Some people want to believe that finding a lot of bugs or have high levels of code coverage means better quality, but that is sort of like believing that you’ll find a pot of gold and a leprechaun at the end of every rainbow. So, why do we measure bug counts and code coverage? Simple…because they are easy to measure!

Good metrics are hard to define mostly because we don’t always have clear goals, or we use a scatter-gun approach to setting a bunch of disparate wishful goals (goals that we hope we can achieve, but nobody is accountable if we don’t). I personally advocate the Goal/Question/Metric paradigm by Victor Basili. But, the biggest problem I have in using this approach is in establishing meaningful goals! People are generally good with coming up with superfluous objectives such as 100% automation or 80% code coverage. But, when you ask those people why they want 100% automation or 80% code coverage they retort only with a bunch of hand-waving and philosophical arguments. It seems we sometimes have difficulty expressing the ‘why’ of setting certain goals. Of course the answer in most cases is to ‘get better’ or ‘improve’ something! But, why? What is the business value?

Once we establish clear goals the next step is to understand the variables that we can manipulate to help us achieve those goals. Then we must decide on which ones we want to change that we think will have the biggest bang for the buck. Finally, we figure out which measures will let us know whether we are progressing towards our goal. (This usually isn’t a single point of measurement.)

At one time I naively believed that there was a core set of metrics that all teams should be collecting all the time that we could put into a ‘dashboard’ and compare across teams. In retrospect that was really a bone-headed notion. Identifying these measures is not easy, and there is no cookie-cutter approach. Each project team needs to decide on their specific goals that may increase customer value or impact business costs. Testers should ask themselves, “why are we measuring this?” “What actions will be taken as a result of these measures?” And, “if there is no actionable objective associated with this measure, then why am I spending time measuring this?”

At times is seems we are locked in a vicious cycle of relearning things via tribal knowledge, and we make decisions based mostly on ‘gut-feel’ and emotion. We collect a bunch of measures and display them similar to how the ancient Chinese used the mystical ‘dragon bones’ as oracles. But, if we are interested in being able to articulate business impact (either positive or negative) in a professional manner then we must be able to find ways to measure the things that are really important and actionable, and spend less time collecting numbers for wall decorations. At the end of the day someone is going to ask, “How do we know?” And trust me on this…really great managers will eat you alive if you answer with “well, we think…” or “we feel…” or try to evaluate success on some other subjective measure.

Written by Bj Rollison

March 21st, 2010 at 2:22 am