bazel-remote

Add support for debugging cache misses

Open buchgr opened this issue 7 years ago • 6 comments

I think we could add support for debugging cache misses between two machines to the remote cache.

  1. Start the cache in debug mode
  2. It tells you to run the first build.
  3. You tell it once finished.
  4. It tells you to run the second build.
  5. You tell it once finished.
  6. It then finds the actions that had the same inputs but produced one or more different outputs, and prints them in a human-readable format (i.e. the command, the environment, and the input/output file names with hashes).
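
The comparison in step 6 could look roughly like the sketch below. It is only an illustration of the idea, not bazel-remote code: `ActionRecord`, its fields, and both function names are hypothetical, and it assumes each build has already been recorded as a list of actions keyed by a digest over the command, environment, and input hashes.

```python
from dataclasses import dataclass


@dataclass
class ActionRecord:
    # All names here are hypothetical; they are not bazel-remote types.
    input_digest: str  # digest over command, environment, and input file hashes
    command: list      # argv of the action
    env: dict          # environment variables
    outputs: dict      # output path -> content hash


def find_divergent_actions(first_build, second_build):
    """Return (first, second, diffs) for actions with equal inputs but unequal outputs."""
    first_by_digest = {a.input_digest: a for a in first_build}
    divergent = []
    for second in second_build:
        first = first_by_digest.get(second.input_digest)
        if first is None:
            continue  # no action with identical inputs in the first build
        paths = sorted(set(first.outputs) | set(second.outputs))
        diffs = {
            p: (first.outputs.get(p), second.outputs.get(p))
            for p in paths
            if first.outputs.get(p) != second.outputs.get(p)
        }
        if diffs:
            divergent.append((first, second, diffs))
    return divergent


def format_report(divergent):
    """Render the divergent actions as plain text: command, environment, output hashes."""
    lines = []
    for first, _second, diffs in divergent:
        lines.append("command: " + " ".join(first.command))
        lines.append("env: " + ", ".join(f"{k}={v}" for k, v in sorted(first.env.items())))
        for path, (hash_1, hash_2) in diffs.items():
            lines.append(f"  {path}: {hash_1} != {hash_2}")
    return "\n".join(lines)
```

An output that exists in only one of the two builds shows up with `None` as the missing hash, so it is reported as a difference as well.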

This can detect and help fix two kinds of errors:

  1. Find non-determinism in the build, either on the same or different machines.
  2. Tell you exactly the command(s) that produced different outputs on different machines. Once one knows which commands produce different outputs, it comes down to checking whether the versions of the tools match, etc.

Thoughts?

cc: @nicolov @jgavris @BenTheElder

buchgr avatar Aug 14 '18 08:08 buchgr

SGTM

BenTheElder avatar Aug 14 '18 17:08 BenTheElder

👍 Sounds great!

jgavris avatar Aug 14 '18 17:08 jgavris

I suggest that the comparison should be done based on the output paths, rather than the inputs. By matching outputs across builds, we can:

  1. Compare the input files, both contents and paths. For example, globs might pick up different files on different machines - ask me how I know.

  2. If the inputs match, compare the output contents to find non-determinism in the execution itself.
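
A rough sketch of this two-stage check, matching actions by output path, might look like the following. The function name, the labels, and the data layout (output path mapped to a pair of input hashes and output hash) are assumptions made for illustration and do not come from bazel-remote.

```python
def classify_by_output_path(first_build, second_build):
    """Compare two builds keyed by output path.

    Each build is a dict: output path -> (inputs, output_hash),
    where inputs is a dict: input path -> content hash.
    This layout is hypothetical and chosen only for the example.
    """
    findings = {}
    for out_path in sorted(first_build.keys() & second_build.keys()):
        inputs_1, hash_1 = first_build[out_path]
        inputs_2, hash_2 = second_build[out_path]
        if inputs_1 != inputs_2:
            # Stage 1: the inputs differ in paths or contents (e.g. a glob
            # picked up different files), so differing outputs are expected.
            shared = set(inputs_1) & set(inputs_2)
            findings[out_path] = ("inputs differ", {
                "only_in_first": sorted(set(inputs_1) - set(inputs_2)),
                "only_in_second": sorted(set(inputs_2) - set(inputs_1)),
                "content_changed": sorted(p for p in shared if inputs_1[p] != inputs_2[p]),
            })
        elif hash_1 != hash_2:
            # Stage 2: identical inputs but different output contents,
            # which points at non-determinism in the execution itself.
            findings[out_path] = ("non-deterministic", (hash_1, hash_2))
    return findings
```

Outputs that match in both inputs and contents are left out of the result, so an empty dict means the two builds agree on every shared output path.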

nicolov avatar Aug 16 '18 07:08 nicolov

@nicolov great idea with output paths!

buchgr avatar Aug 16 '18 12:08 buchgr

cc @petemounce

buchgr avatar Aug 20 '18 16:08 buchgr

I am planning to start working on this over the weekend. While this tool would use lots of code from the remote cache, I am thinking it should probably be its own binary. Thoughts?

buchgr avatar Aug 23 '18 12:08 buchgr