Due to the limited time and resources available for execution, test case selection remains crucial for cost-effective testing. It is even more important when test cases require manual steps, e.g., operating physical equipment. Test case selection must therefore consider complicated trade-offs between cost (e.g., execution time) and effectiveness (e.g., fault detection capability). Based on our industrial collaboration in the maritime domain, we identified a real-world, multi-objective test case selection problem in the context of robustness testing, where test case execution requires human involvement in certain steps, such as turning on the power supply to a device. The high-level goal is to select test cases for execution within a given time budget, where test engineers provide weights for a set of objectives depending on testing requirements, standards, and regulations. To address the identified test case selection problem, we defined a fitness function comprising one cost measure, i.e., Time Difference (TD), and three effectiveness measures, i.e., Mean Priority (MPR), Mean Probability (MPO), and Mean Consequence (MC), all identified together with test engineers. We further empirically evaluated eight multi-objective search algorithms, including three weight-based search algorithms (e.g., the Alternating Variable Method) and five Pareto-based search algorithms (e.g., the Strength Pareto Evolutionary Algorithm 2 (SPEA2)), using two weight assignment strategies (WASs); Random Search (RS) served as the comparison baseline. We conducted two sets of empirical evaluations: 1) using a real-world case study developed through our industrial collaboration, and 2) simulating the real-world case study at a larger scale to assess the scalability of the search algorithms. Results show that SPEA2 with either WAS performed best in both studies. Overall, SPEA2 achieved average improvements of 32.7%, 39%, and 33% in terms of MPR, MPO, and MC, respectively, compared to RS.
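As an illustration only (the exact formulation, normalization, and weight symbols below are assumptions rather than the paper's definition), a weighted fitness function over the four measures could take a form such as
\[
  F \;=\; w_{TD}\,(1 - TD) \;+\; w_{MPR}\,MPR \;+\; w_{MPO}\,MPO \;+\; w_{MC}\,MC,
  \qquad \sum_{m \in \{TD,\,MPR,\,MPO,\,MC\}} w_m = 1,
\]
where each measure is assumed to be normalized to $[0,1]$, TD is treated as a cost to be minimized (hence the $1 - TD$ term), and the weights $w_m$ are supplied by test engineers according to the relevant requirements, standards, and regulations.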