viv-utils

feat: identify functions only referenced from library code

Open fariss opened this issue 1 year ago • 7 comments

Follow up on https://github.com/mandiant/capa/issues/989. Marking this as a draft, because I believe we should add a test for this function.

@mr-tz @williballenthin let me know if you have a test binary, otherwise I can compile one.

May 27 '24 21:05 fariss

Feel free to grab any of the files from capa testfiles. Using zlib routines might be a good place to start because they're obvious and common.

May 28 '24 06:05 williballenthin

let me know if you have a test binary, otherwise I can compile one.

tests/data/038476f1705f3ac1237ac57f4c1753e0aa085dd7cda5669d4e93399cf7a565af.exe_ contains a few functions to test with:

  • 0x40ca70
  • 0x40bf40
  • 0x40b06c

May 30 '24 07:05 mr-tz

Once we have the logic and tests working correctly, we may want to find a way to cache the results and/or do all analysis in a single pass.

Recursive functions that are called in a loop tend to have O(n**2) worst case runtime. We can avoid this by ensuring each function is only evaluated once.

We could cache the result in the viv workspace, like we do in the flirt analyzer. And/or, we could build the global call graph once and extract all the library functions in a single traversal, saving the results in an intermediate object.
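As a rough illustration of the single-traversal idea, here is a minimal sketch over a toy call graph. It does not use the viv-utils API: the dict-based graph, the function name `find_functions_only_called_from_library`, and all addresses are hypothetical. It also assumes that a non-library function without any callers counts as user code, and that a call cycle with no outside callers is not treated as user code.

```python
from collections import deque


def find_functions_only_called_from_library(callees, library):
    """
    callees: dict mapping a function address to the set of addresses it calls
             (hypothetical toy representation of the global call graph).
    library: set of addresses already identified as library functions.

    Returns the non-library functions that are never reached from user code.
    """
    functions = set(callees)
    for targets in callees.values():
        functions.update(targets)

    # invert the graph once so we can find functions without callers
    callers = {f: set() for f in functions}
    for src, targets in callees.items():
        for dst in targets:
            callers[dst].add(src)

    # assumption: uncalled, non-library functions (e.g. the entry point) are user code
    roots = [f for f in functions if f not in library and not callers[f]]

    # walk forward from user code, but do not descend into library functions,
    # so each function is visited at most once
    reached_from_user = set(roots)
    queue = deque(roots)
    while queue:
        func = queue.popleft()
        for callee in callees.get(func, ()):
            if callee in library or callee in reached_from_user:
                continue
            reached_from_user.add(callee)
            queue.append(callee)

    return {f for f in functions if f not in library and f not in reached_from_user}


# hypothetical addresses, for illustration only
MAIN, LIB_A, B, C, MIXED = 0x1000, 0x2000, 0x3000, 0x4000, 0x5000
call_graph = {
    MAIN: {LIB_A, MIXED},
    LIB_A: {B, MIXED},
    B: {C},
    C: set(),
    MIXED: set(),
}
result = find_functions_only_called_from_library(call_graph, library={LIB_A})
```

Here `result` contains `B` and `C` but not `MIXED`, since `MIXED` is also called from `MAIN`. The result set could then be stored in an intermediate object and queried per function.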

Anyways, let's get the core behavior specified and tested before we optimize the implementation.

May 30 '24 08:05 williballenthin

tests/data/038476f1705f3ac1237ac57f4c1753e0aa085dd7cda5669d4e93399cf7a565af.exe_ contains a few functions to test with:

  • 0x40ca70
  • 0x40bf40
  • 0x40b06c

Great, thanks. I will use this binary as a test bed.

Here are some tests I will implement. Let me know if you can think of other test cases.
from fixtures import sample_038476

from viv_utils.flirt import is_only_called_from_library_functions


def test_invalid(sample_038476):
    """
    test an invalid function address
    """
    # this is an address that is not a function
    func_addr = 0x400000
    assert is_only_called_from_library_functions(sample_038476, func_addr) == False


def test_not_called(sample_038476):
    """
    test a function that is not called by any other function
    """
    # this is a function that is not called by any other function
    func_addr = 0x400000
    assert is_only_called_from_library_functions(sample_038476, func_addr) == False


def test_positive(sample_038476):
    """
    test a library function
    """
    # this is an existing library function
    func_addr = 0x400000
    assert is_only_called_from_library_functions(sample_038476, func_addr) == True


def test_negative(sample_038476):
    """
    test a function called by both library and non-library functions,
    where at least one caller is not a library function
    """
    # this should be a function with mixed callers, where
    # at least one caller is neither a library function nor
    # called only from library functions
    func_addr = 0x400000
    assert is_only_called_from_library_functions(sample_038476, func_addr) == False


def test_circular(sample_038476):
    """
    test a function with a circular function call graph
    """
    # two functions calling each other in a loop
    func_addr1 = 0x400001  # calls 0x400002
    func_addr2 = 0x400002  # calls 0x400001
    assert is_only_called_from_library_functions(sample_038476, func_addr1) == True
    assert is_only_called_from_library_functions(sample_038476, func_addr2) == True

May 31 '24 00:05 fariss

The test ideas look great! I recommend using a descriptive name for each test, one we can make sense of just by reading it (as much as possible).

We should also test the transitive call and labeling (A (lib) -> B -> C) as discussed above.
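To make the transitive case concrete, here is a small sketch against a toy caller map. It is not the viv-utils implementation: the helper `only_called_from_library`, the dict of callers, and the addresses are all hypothetical, and treating a caller that is already on the current path as non-disqualifying is an assumption about how cycles should behave.

```python
def only_called_from_library(func, callers, library, _visiting=None):
    """
    callers: dict mapping a function address to the set of addresses that call it
             (hypothetical toy representation).
    library: set of addresses already identified as library functions.
    """
    if _visiting is None:
        _visiting = set()
    if func in library:
        return True
    func_callers = callers.get(func, set())
    if not func_callers:
        # never called: there is no library caller to inherit the label from
        return False
    if func in _visiting:
        # assumption: a cycle on the current path does not disqualify by itself
        return True
    _visiting.add(func)
    try:
        return all(
            only_called_from_library(caller, callers, library, _visiting)
            for caller in func_callers
        )
    finally:
        _visiting.discard(func)


# hypothetical addresses: MAIN -> A (lib) -> B -> C, and D called by MAIN and A
MAIN, A, B, C, D = 0x401000, 0x402000, 0x403000, 0x404000, 0x405000
callers = {A: {MAIN}, B: {A}, C: {B}, D: {MAIN, A}}
library = {A}


def test_transitive_callee_of_library_function_is_labeled():
    assert only_called_from_library(B, callers, library)
    assert only_called_from_library(C, callers, library)


def test_function_with_user_caller_is_not_labeled():
    assert not only_called_from_library(D, callers, library)


test_transitive_callee_of_library_function_is_labeled()
test_function_with_user_caller_is_not_labeled()
```

A real test would replace the toy map with the workspace fixture and the addresses from the sample binary.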

May 31 '24 09:05 mr-tz

I included some tests in 06f2fba0834f4b9e6efcb1b0d65c73ce6389bbf1. For reference, these are the functions I used in testing (from tests/data/038476f1705f3ac1237ac57f4c1753e0aa085dd7cda5669d4e93399cf7a565af.exe_):

0x40CAA3


0x408155 this is the main (entry) function
0x407660


0x40B06C


Jun 01 '24 03:06 fariss

I've researched this a bit further. While we could do some more advanced computation based on graph algorithms, I think we can get away with the current approach plus an additional check that verifies the code lies within a certain range (between the start and end of the library functions).
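A minimal sketch of what such a range check could look like, assuming the range is simply the lowest and highest known library function addresses. The helper name `is_within_library_range` and the addresses are hypothetical, and the actual bounds used in the PR may be computed differently.

```python
def is_within_library_range(func_addr, library_functions):
    """
    Return True if func_addr lies between the lowest and highest
    known library function addresses (inclusive).

    library_functions: iterable of addresses already identified as library code.
    """
    library_functions = list(library_functions)
    if not library_functions:
        # without any known library function there is no range to check against
        return False
    start = min(library_functions)
    end = max(library_functions)
    return start <= func_addr <= end


# hypothetical addresses, for illustration only
known_library = [0x40B000, 0x40B800, 0x40CF00]
inside = is_within_library_range(0x40C100, known_library)
outside = is_within_library_range(0x401000, known_library)
```

Here `inside` is `True` and `outside` is `False`. The check is cheap, so it can run before or after the caller-based check; presumably the intuition is that statically linked library code tends to be laid out contiguously, which is itself an assumption worth validating against the test binary.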

Jun 11 '24 11:06 mr-tz