Computing Reviews
Is the stack distance between test case and method correlated with test effectiveness?
Niedermayr R., Wagner S. EASE 2019 (Proceedings of the Evaluation and Assessment on Software Engineering, Copenhagen, Denmark, Apr 15-17, 2019), 189-198, 2019. Type: Proceedings
Date Reviewed: Mar 15 2021

In general, it is not algorithmically possible to always prove a program’s correctness or incorrectness. So, in practice, a program’s correctness is usually judged by running it on the test cases of some test suite. How can we gauge the effectiveness of a test suite? Practitioners use metrics like code coverage, that is, the proportion of the code that is executed by at least one test case. However, this metric is not well correlated with test suite effectiveness: some test suites have high code coverage yet fail to detect faults. A more reliable technique is mutation testing, in which we deliberately inject small faults into different parts of the code and check whether the tests detect them. However, mutation testing is very time-consuming and thus rarely used in practice. It is therefore desirable to come up with a measure of test suite effectiveness that is more accurate than code coverage and more practical than mutation testing.
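To make the two notions concrete, here is a purely illustrative Python sketch (not taken from the paper; all function names are invented): a mutant version of a method injects a small fault, and an effective test suite is one that fails on the mutant even though it passes on the original.

```python
# Illustrative mutation-testing sketch; all names are hypothetical, not from the paper.

def price_with_discount(price: float, rate: float) -> float:
    """Original method under test."""
    return price * (1 - rate)

def price_with_discount_mutant(price: float, rate: float) -> float:
    """Mutant: the subtraction is replaced by an addition (an injected fault)."""
    return price * (1 + rate)

def suite_passes(impl) -> bool:
    """A tiny test suite: True if every test case passes for the given implementation."""
    return abs(impl(100.0, 0.2) - 80.0) < 1e-9

if __name__ == "__main__":
    assert suite_passes(price_with_discount)                      # the original passes
    print("mutant killed:", not suite_passes(price_with_discount_mutant))
```

Mutation testing repeats this check for many mutants in many methods, which is exactly what makes it expensive.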

The authors take into account that, for each test, some methods are invoked directly while others are invoked only indirectly, through a chain of method calls. They show, roughly speaking, that the length of this chain, which they call the “stack distance,” is strongly correlated with the test suite’s effectiveness in detecting faults in the invoked method. To be more precise, for this correlation to appear, the definition of the length must be slightly adjusted: for example, calls to external libraries should not be counted (since these libraries are supposed to be already tested), calls to constructor methods should not be counted (such methods rarely contain faults), and so on.
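As a rough illustration of the underlying notion (this is my own sketch, not the authors’ measurement tooling; the helper stack_distance_to_test is hypothetical), one can walk the call stack inside an invoked method and count the frames up to the nearest test:

```python
# Hypothetical sketch of measuring stack distance between a test and an invoked method.
import inspect

def stack_distance_to_test(test_prefix: str = "test_") -> int:
    """Count call frames between the calling method and the nearest enclosing test function."""
    frames = inspect.stack()[1:]               # skip this helper's own frame
    for distance, frame in enumerate(frames):
        if frame.function.startswith(test_prefix):
            return distance
    return -1                                   # no test frame found on the stack

def leaf():
    return stack_distance_to_test()

def intermediate():
    return leaf()                               # adds one hop to the call chain

def test_direct_call():
    assert leaf() == 1                          # test -> leaf

def test_indirect_call():
    assert intermediate() == 2                  # test -> intermediate -> leaf

if __name__ == "__main__":
    test_direct_call()
    test_indirect_call()
    print("stack distances behave as expected")
```

A production measurement would rely on instrumentation of the code under test rather than on inspecting the interpreter stack; the sketch only pins down what is being counted.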

An even better prediction is obtained if we also take into account several variants of code coverage and other easy-to-compute characteristics of each method, and train a neural network to predict, from all of this information, whether the tests would detect a fault in the method. This yields a practically useful alternative to mutation testing.
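A minimal sketch of this kind of pipeline, using synthetic data rather than the paper’s dataset and a small scikit-learn network standing in for whatever model was actually trained, might look as follows:

```python
# Synthetic illustration only: features and labels are generated, not measured.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
stack_distance = rng.integers(1, 10, size=n)       # test-to-method call-chain length
line_coverage = rng.uniform(0.0, 1.0, size=n)       # fraction of the method's lines covered
# Made-up labels: faults in closely and thoroughly covered methods are detected more often.
p_detect = 1.0 / (1.0 + np.exp(-(2.5 * line_coverage - 0.4 * stack_distance + 1.0)))
detected = (rng.uniform(size=n) < p_detect).astype(int)

X = np.column_stack([stack_distance, line_coverage])
X_train, X_test, y_train, y_test = train_test_split(X, detected, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", round(clf.score(X_test, y_test), 3))
```

The point is only the shape of the pipeline: cheap per-method features in, a predicted test-effectiveness verdict out. The labels here come from a made-up formula; in a real replication they would come from mutation testing results.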

The paper is intended for software engineering specialists who are familiar with the basics of statistics. I recommend it to both theoreticians and practitioners.

Reviewer: V. Kreinovich
Review #: CR147215 (2106-0156)
Testing And Debugging (D.2.5)
Validation (D.2.4 ...)
Software/Program Verification (D.2.4)
Software Engineering (D.2)
