Originally Published Wednesday, May 30, 20
I am not a big fan of static test data, so this month’s issue of Software Testing and Performance magazine published an article I wrote outlining one approach for generating random string data (although the basic concepts can be used for generating other types of random data).
Unfortunately, it appears that some of the numbers got a little screwed up and the printer did not superscript the exponents correctly so the numbers in the third paragraph are probably looking pretty strange. So, to clarify, the paragraph should read:
Using only the characters ‘A’ – ‘Z’ the total number of possible character combinations for a filename with an 8-letter filename and a 3-letter extension is 26^8 + 26^3, or 208,827,099,728. If we were assigned to test long filenames on a Windows platform using only ASCII characters (see Table 1), the number of possibilities increases because there are 86 possible characters we can use in a valid filename or extension. With a maximum filename length of 251 characters and a 3-character extension, the total is 86^251 + 86^3. Trust me, that is one big number.
(NOTE: There have been several assertions regarding the above formula for determining the number of tests, so here is the explanation. Essentially, the Windows file system treats the base filename and the file extension as 2 separate components, and there are no interactions or dependencies between them. (For example, we cannot save a filename as CON.txt, but we can save a filename as myFile.CON.) Since there are no dependencies between the base filename component and the extension component, they are treated as 2 independent parameters. Mathematically, that results in 26^8 + 26^3, or 208,827,082,152, tests if we elect to test all possible combinations of the base filename component with a nominal valid extension, and then test all possible extension combinations with a nominal valid base filename. One could argue we could combine the 17576 unique 3-character extension combinations with various combinations of the 8-character base filename component to reduce the overall number of tests by 17576; however, I choose not to use that approach and instead test each parameter independently. If we mistakenly assumed a dependency or inter-relationship between the base filename and extension components of a filename on the Windows platform, testing all combinations (26^8 * 26^3, or simply 26^11) would result in approximately 3,670,135,659,905,624 redundant tests (if we could do exhaustive testing). This is where in-depth knowledge of the ‘system’ really pays off.)
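The figures in this note can be checked directly. The snippet below is only a sketch of that arithmetic; the variable names are illustrative and do not come from the article.

```python
# Counts for 8-letter base filenames and 3-letter extensions using only 'A'-'Z'.
ALPHABET_SIZE = 26
BASE_LENGTH = 8
EXTENSION_LENGTH = 3

base_names = ALPHABET_SIZE ** BASE_LENGTH        # 26^8
extensions = ALPHABET_SIZE ** EXTENSION_LENGTH   # 26^3

# Components tested independently: every base name with one nominal extension,
# then every extension with one nominal base name.
independent_tests = base_names + extensions

# Components assumed to interact: every base name paired with every extension.
combined_tests = base_names * extensions         # 26^11

# Tests that the dependency assumption would add on top of the independent approach.
redundant_tests = combined_tests - independent_tests

print(f"{independent_tests:,}")  # 208,827,082,152
print(f"{combined_tests:,}")     # 3,670,344,486,987,776
print(f"{redundant_tests:,}")    # 3,670,135,659,905,624
```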
Of course, the filename length and extension length are variable. Also, 251 characters assumes a base filename component length from the root directory (it does not take into account the MAX_PATH constant). So, the total number of combinations using only ASCII characters is much greater, because the number of base filenames with a ‘default’ 3-letter extension from the root directory is actually 86^251 + 86^250 + 86^249 + 86^248 + 86^247 + … + 86^1. Then, of course, vary the length of the extensions and the total number of combinations increases even further. But all this is only to give some sense of the magnitude of the testing problem.
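To get a feel for that sum, Python's arbitrary-precision integers can evaluate it exactly. This is a rough sketch that takes the 86-character set and the 251-character limit from the paragraph above as given; the names are illustrative.

```python
CHARSET_SIZE = 86      # valid ASCII filename characters, per the article
MAX_BASE_LENGTH = 251  # longest base filename from the root directory

# 86^251 + 86^250 + ... + 86^1, one term per possible base filename length.
variable_length_total = sum(
    CHARSET_SIZE ** n for n in range(1, MAX_BASE_LENGTH + 1)
)

# Closed form of the same geometric series, used as a cross-check.
closed_form = (
    CHARSET_SIZE ** (MAX_BASE_LENGTH + 1) - CHARSET_SIZE
) // (CHARSET_SIZE - 1)

print(variable_length_total == closed_form)
print(len(str(variable_length_total)), "decimal digits")
```

The closed form is the standard geometric series identity, so the series never needs to be written out term by term.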
Also, the equivalence class table (Table 2) is simplified and does not include reserved device names. For example, Windows will (or should) prevent a user from saving a file named LPT1, COM6, CON, etc. (The behavior for saving filenames composed of reserved device names is different on Windows XP and Windows Vista…Vista finally got it right!)
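A random filename generator would need some way to recognize that extra equivalence class. As a rough sketch only: the helper name below is hypothetical, and the full list of reserved names is an assumption based on the commonly documented Windows set (the article itself names only LPT1, COM6, and CON).

```python
# Assumed list of reserved device names; the article names only LPT1, COM6, and CON.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)


def has_reserved_base_name(filename: str) -> bool:
    """Return True if the base filename component is a reserved device name."""
    # Everything before the first dot is treated as the base filename component.
    base, _, _extension = filename.partition(".")
    return base.upper() in RESERVED_DEVICE_NAMES


print(has_reserved_base_name("CON.txt"))     # True
print(has_reserved_base_name("myFile.CON"))  # False
print(has_reserved_base_name("lpt1"))        # True
```

Checking only the base component mirrors the CON.txt versus myFile.CON example: the reserved name matters in the base filename, not in the extension.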
Unfortunately, I did not get a chance to read the edited copy before print, but I think the basic idea comes through. I hope you find value in using intelligent random test data in your testing, and I would be interested in hearing your feedback.
2 Comments
Could you help me understand why the figure is 26**8 PLUS 26**3, instead of 26**8 TIMES 26**3, or simply 26**11?
As I believe you are, I’m assuming that the filenames must be exactly eight alpha uppercase characters, and the extension must be exactly three uppercase alpha characters. That would be 3,670,344,486,987,776 combinations, not 208,827,099,728.
—Michael B.
Thursday, May 31, 2007 12:30 AM by Michael A Bolton
Hi Michael,
Since I explicitly stated “Using only the characters ‘A’ – ‘Z’…for a filename with an 8-letter filename and a 3-letter extension…” then you are not making an assumption. I intentionally limited the initial domain space as an introductory example using information I thought most readers would be familiar with (and I assumed readers would tacitly understand the number of characters and filename/extension lengths is far greater on modern operating systems). The problem space is actually much larger when taking into consideration variable filename and extension lengths, and the Unicode character repertoire as explained above and in the article.
So, within these constraints I used 26^8 + 26^3 because the base filename and the filename extension are 2 separate components and are handled differently by the Windows operating system file system. (Note that I also explicitly stated the testing would be conducted on a “Windows platform.”)
If for some reason we suspected there is interaction or no distinction between the filename component and the extension component (as on a Unix platform) then we would have to test each unique 8-letter filename with each unique 3-letter extension, which would equal 26^11.
However, since I stated the test environment is Windows, our in-depth system knowledge of the Windows OS file system tells us the extension is handled as a separate namespace or component from the base filename component. (For example, we cannot save a filename as CON.txt, but we can save a filename as myFile.CON.) Thus, we know that the base filename component and the extension component of the filename are independent, so testing all combinations on a Windows OS would result in approximately 3,670,135,659,905,624 redundant tests (if we could do exhaustive testing).
One could argue that with nominal valid data we could combine the 17576 (26^3) unique 3-character extension combinations with combinations of the 8-character base filename components to effectively reduce the overall number of tests.
However, I choose to separate testing the extensions and test each extension combination as a unique component independent of the base filename component. So, by using a nominal (known good) base filename component, we would simply add the additional 17576 tests for each unique 3-letter extension combination. (Thus, 26^8 + 26^3.)
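The difference between the two approaches is easier to see on a set small enough to list. The sketch below uses a made-up 2-letter alphabet with a 2-letter base filename and a 1-letter extension; the nominal values are hypothetical stand-ins for known-good names.

```python
from itertools import product

ALPHABET = "AB"          # toy alphabet so the sets stay small enough to list
BASE_LENGTH = 2
EXTENSION_LENGTH = 1
NOMINAL_BASE = "AA"      # hypothetical known-good base filename
NOMINAL_EXTENSION = "A"  # hypothetical known-good extension

bases = ["".join(p) for p in product(ALPHABET, repeat=BASE_LENGTH)]
extensions = ["".join(p) for p in product(ALPHABET, repeat=EXTENSION_LENGTH)]

# Independent approach: vary one component while holding the other at its nominal value.
independent = [f"{b}.{NOMINAL_EXTENSION}" for b in bases]
independent += [f"{NOMINAL_BASE}.{e}" for e in extensions]

# Dependent approach: pair every base filename with every extension.
combined = [f"{b}.{e}" for b, e in product(bases, extensions)]

print(len(independent))  # 6, i.e. 2^2 + 2^1
print(len(combined))     # 8, i.e. 2^(2+1)
```

The independent count grows as a sum of the two components and the dependent count as a product, which is the same relationship as 26^8 + 26^3 versus 26^11. In this sketch the name AA.A is produced by both independent passes.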
(But, I can understand how someone could misconstrue my statement “…the total number of possible character combinations…” when taken out of context from the rest of the paragraph, and that sentence could have been worded more precisely to stand on its own.)
ADDENDUM:
(Of course, the math experts will immediately realize that 26^8 + 26^3 does not equal 208,827,099,728. It actually equals 208,827,082,152. It seems that somebody’s fingers got a little happy with the M+ button on the calculator when running the initial numbers.
But, the purpose of the article is to examine the use and value of intelligent random test data in testing, not to debate or over-analyze the number of filename combinations on Windows or Unix platforms. So, I hope the readers understand the point that the number of combinations is quite large (+/- 17576) and that utilizing intelligent random data (for positive or negative testing) in conjunction with ‘real-world’ and known problematic static data is a good way to add variability to a test and expand coverage of potential possibilities, since we know we cannot test all probable combinations for a given platform.)
Thursday, May 31, 2007 5:24 AM by I.M.Testy