Separate finding modules from generating mutants?

sourcefrog opened this issue

In 24.3, we make one pass over the source files, parsing each one with syn and identifying (1) other source files that we need to recurse into and (2) mutants in that file.

Possibly these should be split into two passes, each of which walks the syn AST:

1. Parse each file into an AST and find just the mod statements that point to other files we need to read. Queue those files and repeat until all the source files are loaded into memory.
2. Walk each source file and generate mutants from it.
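The first pass is essentially a worklist loop. The sketch below is a minimal, std-only illustration under stated assumptions: `find_external_mods`, `resolve`, and `load_tree` are hypothetical names, the line scan for `mod name;` stands in for parsing with syn, and `read` stands in for reading files from disk. The path rule is simplified (it ignores `foo/mod.rs` alternatives and `#[path]` attributes).

```rust
use std::collections::{HashSet, VecDeque};

/// Stand-in for a syn parse: find `mod name;` declarations, i.e. modules whose
/// body lives in another file. Inline `mod name { ... }` blocks are ignored.
fn find_external_mods(code: &str) -> Vec<String> {
    code.lines()
        .filter_map(|line| {
            let line = line.trim();
            let line = line.strip_prefix("pub ").unwrap_or(line);
            let name = line.strip_prefix("mod ")?.strip_suffix(';')?.trim();
            Some(name.to_owned())
        })
        .collect()
}

/// Simplified module path rule: `lib.rs`, `main.rs`, and `mod.rs` own their
/// directory; any other `dir/x.rs` owns the subdirectory `dir/x/`.
fn resolve(parent: &str, name: &str) -> String {
    let (dir, file) = parent.rsplit_once('/').unwrap_or(("", parent));
    let base = if matches!(file, "lib.rs" | "main.rs" | "mod.rs") {
        dir
    } else {
        parent.strip_suffix(".rs").unwrap_or(parent)
    };
    if base.is_empty() {
        format!("{name}.rs")
    } else {
        format!("{base}/{name}.rs")
    }
}

/// Pass 1: starting from the crate root, load every file reachable through
/// `mod` items, returning (path, source text) pairs in discovery order.
fn load_tree(root: &str, read: impl Fn(&str) -> Option<String>) -> Vec<(String, String)> {
    let mut loaded = Vec::new();
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([root.to_owned()]);
    while let Some(path) = queue.pop_front() {
        // Skip files we've already queued and loaded.
        if !seen.insert(path.clone()) {
            continue;
        }
        // Missing files are skipped here; real code would report an error.
        let Some(code) = read(path.as_str()) else { continue };
        for name in find_external_mods(&code) {
            queue.push_back(resolve(&path, &name));
        }
        loaded.push((path, code));
    }
    loaded
}
```

Once this loop finishes, every source file is in memory and no later step needs to touch the filesystem.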

That is, the two fields currently in Discovered would be produced separately: https://github.com/sourcefrog/cargo-mutants/blob/442412650fc9770f3bc84467e1b104f5ef51b7a6/src/visit.rs#L33-L36

Why?

- It would make the AST walk code somewhat simpler by separating concerns: less code in the visitor, and simpler interactions with the code that calls it.
- The mutant-generation code would be a pure function of a source tree already in memory.
- It would probably be easier to test each part; for example, we could write unit tests for mutations that work from just a string of source code, without needing a whole tree.
- It would probably also make it simpler to parse files or generate mutants in parallel on multiple threads.
- Maybe it also makes ownership simpler: the per-file mutant generator and all the mutants generated from it could borrow from, and not outlive, the source file they point into.
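Several of these points can be illustrated together in a small sketch. Everything here is hypothetical: `SourceFile`, `Mutant`, `generate_mutants`, and `generate_all` are illustrative names, not cargo-mutants' actual types, and the "one mutant per `fn` line" rule is a toy stand-in for a syn visitor. The sketch shows mutant generation as a pure function of in-memory text, mutants carrying a lifetime so they cannot outlive their file, a unit test that starts from a plain string, and per-file work spread across scoped threads.

```rust
use std::thread;

/// A source file already loaded into memory by the module-discovery pass.
struct SourceFile {
    path: String,
    code: String,
}

/// A candidate mutant that borrows from its source file, so the borrow
/// checker guarantees it can't outlive the file's text.
#[derive(Debug, PartialEq)]
struct Mutant<'a> {
    path: &'a str,
    line: usize,
    fn_name: &'a str,
}

/// Mutant generation as a pure function of one file: no I/O, no shared state.
/// Toy rule: one mutant per line that starts with `fn` or `pub fn`.
fn generate_mutants(file: &SourceFile) -> Vec<Mutant<'_>> {
    let mut mutants = Vec::new();
    for (i, text) in file.code.lines().enumerate() {
        let text = text.trim_start();
        let text = text.strip_prefix("pub ").unwrap_or(text);
        let Some(rest) = text.strip_prefix("fn ") else { continue };
        // The function name runs up to the first non-identifier character.
        let end = rest
            .find(|c: char| !(c.is_alphanumeric() || c == '_'))
            .unwrap_or(rest.len());
        if end > 0 {
            mutants.push(Mutant { path: file.path.as_str(), line: i + 1, fn_name: &rest[..end] });
        }
    }
    mutants
}

/// Files are independent, so each can be processed on its own thread. Scoped
/// threads let the returned mutants keep borrowing from `files`.
fn generate_all(files: &[SourceFile]) -> Vec<Mutant<'_>> {
    thread::scope(|s| {
        // Collect first so all threads are spawned before any is joined.
        let handles: Vec<_> = files
            .iter()
            .map(|file| s.spawn(move || generate_mutants(file)))
            .collect();
        // Joining in spawn order keeps the output in file order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("mutant generation thread panicked"))
            .collect()
    })
}

#[test]
fn mutants_from_a_string() {
    let file = SourceFile {
        path: "src/lib.rs".into(),
        code: "pub fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n".into(),
    };
    assert_eq!(generate_mutants(&file), [Mutant { path: "src/lib.rs", line: 1, fn_name: "add" }]);
}
```

One thread per file is the simplest correct arrangement; a real implementation would more likely use a bounded pool (for example via rayon), but the shape of the per-file function would stay the same.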

Why not?

- We would walk each AST twice, which would have some CPU cost.
- Maybe it's just not worth it as a refactor.

Generally, the time spent walking the tree and generating mutants should be trivial compared to the time spent actually running tests, so perhaps there's no need to make changes just to optimize speed.

Maybe we should benchmark just cargo mutants --list --json on some large tree.

Apr 13 '24 16:04 sourcefrog