halmos icon indicating copy to clipboard operation
halmos copied to clipboard

--early-exit doesn't work as expected

Open 0xkarmacoma opened this issue 2 years ago • 1 comments

Describe the bug

--early-exit is supposed to print the first counterexample found and stop halmos, but we can actually still have multiple counterexamples reported

To Reproduce

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.15;
import "forge-std/Test.sol";

contract Test60 is Test {
    function test_manyCexes(uint256 x) external {
        uint256 i = 0;
        if (x & 1 == 0) { ++i; }
        if (x & 2 == 0) { ++i; }
        if (x & 4 == 0) { ++i; }
        if (x & 8 == 0) { ++i; }

        assert(x < 64);
    }
}
halmos --function test_manyCexes --early-exit

Running 1 tests for test/60_manyCexes.t.sol:Test60
Counterexample: 
    p_x_uint256 = 0x8000000000000000000000000000000000000000000000000000000000000000 (57896044618658097711785492504343953926634992332820282019728792003956564819968)
Counterexample: 
    p_x_uint256 = 0x8000000000000000000000000000000000000000000000000000000000000008 (57896044618658097711785492504343953926634992332820282019728792003956564819976)
Counterexample: 
    p_x_uint256 = 0x8000000000000000000000000000000000000000000000000000000000000004 (57896044618658097711785492504343953926634992332820282019728792003956564819972)
Counterexample: 
    p_x_uint256 = 0x800000000000000000000000000000000000000000000000000000000000000c (57896044618658097711785492504343953926634992332820282019728792003956564819980)
...

0xkarmacoma avatar Dec 24 '23 00:12 0xkarmacoma

@karmacoma-eth I've fixed the issue (#284) by making the following changes:

  1. Introduced an early_exit flag variable to keep track of whether an early exit is requested.

  2. In the future_callback function, when a counterexample is found and the --early-exit flag is set, the early_exit flag is set to True.

  3. The future_callback function now checks the early_exit flag at the beginning, and if it is True, it immediately returns without processing the result. This prevents any further counterexamples from being processed after an early exit is requested.

  4. The thread pool is shut down after all the submitted futures have completed or been canceled, ensuring a clean termination.

Explanation of the issue

The previous implementation of the --early-exit flag had a couple of issues:

It used a busy-waiting loop with time.sleep(1) to wait for either the first counterexample to be found or all the futures to complete before shutting down the thread pool and terminating the execution. This approach was inefficient and didn't actively stop the execution of the remaining paths when the first counterexample was found. The busy-waiting loop allowed multiple counterexamples to accumulate during the waiting period, especially if the time.sleep(1) interval was not sufficient to process the first counterexample. As a result, even with the --early-exit flag set, multiple counterexamples could be reported.

The updated code addresses these issues by introducing an early_exit flag variable to keep track of whether an early exit is requested. When a counterexample is found and the --early-exit flag is set, the early_exit flag is set to True. This flag is then checked in the future_callback function, and if it is True, the callback immediately returns without processing further results. This ensures that no additional counterexamples are processed after an early exit is requested.

With these changes, Halmos now terminates immediately after finding the first counterexample when the --early-exit flag is set, avoiding the reporting of multiple counterexamples.

Here's the updated output for the example contract you provided:

(.venv) 86fb4ce7caf3% python3 -m halmos --function test_manyCexes --early-exit
[⠊] Compiling...
[⠢] Compiling 28 files with 0.8.25
[⠔] Solc 0.8.25 finished in 2.15s
Compiler run successful!
Running 1 tests for test/Test60.t.sol:Test60
Counterexample:
p_x_uint256 = 0x8000000000000000000000000000000000000000000000000000000000000008 (57896044618658097711785492504343953926634992332820282019728792003956564819976)
[FAIL] test_manyCexes(uint256) (paths: 32, time: 0.08s, bounds: [])
Symbolic test result: 0 passed; 1 failed; time: 0.09s

As you can see, with the --early-exit flag set, the tool now reports only the first counterexample found and terminates early.

I have opened a pull request (#284), please let me know if you have any further questions or concerns.

saurabhchalke avatar May 05 '24 03:05 saurabhchalke