Search-Based Cost-Effective Test Case Selection within a Time Budget: An Empirical Study

Dipesh Pradhan1,2,Shuai Wang1, Shaukat Ali1, Tao Yue1

1Simula Research Laboratory, Oslo, Norway; 2Department of Informatics, University of Oslo, Oslo, Norway;
Contact: {dipesh, shuai, shaukat, tao}


Due to limited time and resources available for execution, test case selection always remains crucial for cost-effective testing. It is even more prominent when test cases require manual steps, e.g., operating physical equipment. Thus, test case selection must consider complicated trade-offs between cost (e.g., execution time) and effectiveness (e.g., fault detection capability). Based on our industrial collaboration within the Maritime domain, we identified a real-world and multi-objective test case selection problem in the context of robustness testing, where test case execution requires human involvement in certain steps, such as turning on the power supply to a device. The high-level goal is to select test cases for execution within a given time budget, where test engineers provide weights for a set of objectives, depending on testing requirements, standards, and regulations. To address the identified test case selection problem, we defined a fitness function including one cost measure, i.e., Time Difference (TD) and three effectiveness measures, i.e., Mean Priority (MPR), Mean Probability (MPO) and Mean Consequence (MC) that were identified together with test engineers. We further empirically evaluated eight multi-objective search algorithms, which include three weight-based search algorithms (e.g., Alternating Variable Method) and five Pareto-based search algorithms (e.g., Strength Pareto Evolutionary Algorithm 2 (SPEA2)) using two weight assignment strategies (WASs). Notice that Random Search (RS) was used as a comparison baseline. We conducted two sets of empirical evaluations: 1) Using a real world case study that was developed based on our industrial collaboration; 2) Simulating the real world case study to a larger scale to assess the scalability of the search algorithms. Results show that SPEA2 with either of the WASs performed the best for both the studies. Overall, SPEA2 managed to improve on average 32.7%, 39% and 33% in terms of MPR, MPO and MC respectively as compared to RS.

For implementation:

As mentioned in the paper [1], all the selected search algorithms and quality indicator are implemented based on jMetal [1]. The source codes related with the five Pareto-based search algorithms, two weight-based search algorithms and random search along with one quality indicator (Hypervolume) can be downloaded from jMetal's website [2].

For the experiment data: Due to the confidential issues from our industrial partners, we cannot share the original data for the industrial case studies. However, we provide the data generated by each algorithm for the real world case study (1 test suite having 165 test cases), and artificial problems (10 test suites having 100 ... 1000 test cases with an increment of 100 test cases). There are 75 problems for 1 test suite, and we rerun each problem 100 times for each weight assignment strategy (WAS), i.e., Fixed Weights and Randomly-Assigned Weights. Therefore, for one test suite there are 15,000 files for each algorithm (7,500 files for each WAS).

To conclude, for 9 algorithms (8 search algorithms and 1 random search): 1) For the real world case study, there are 135,000 files such that there are 67,500 files for each WAS; 2) For the artificial problems, there are 1,350,000 files such that there are 670,500 files for each WAS. Note that each file has 100 rows and five columns with values for the five objectives. More instruction on how to interpret the file can be looked at the readme.txt file after downloading the zip file.

Results for the experiment can be downloaded from here.
Source code for random number generator can be downloaded from here.

Data related with the experiment

Download The results for the real world case study

Download The results for the artificial problems