[New feature] Tool to decorrupt and repair cache files to be used for extraction
This tool should be able to restore cache files into a state where they can be extracted. There are several reasons to do this:
- Protected maps - maps deliberately corrupted and obfuscated to prevent tag extraction
- Converted maps - maps modified by a program such as Pearl or Combustion to work with a different version of the game, where the modification has broken various references and pointers, resulting in unexpected and undefined behavior
- Edited maps - maps inadvertently corrupted from tags being modified and/or "rebuilt" with a tool such as Eschaton, resulting in inconsistencies in hidden values as well as invalid pointers and references that aren't immediately apparent from simply loading the map in-game
This tool should not be expected to restore the cache file to its original state before it was corrupted. Instead, it should restore it to a state where the map's tags can be accessed and extracted using a tool with, ideally, the same behavior as it would have in-game.
Also, restored cache files should not be loaded in tools (besides invader-extract and invader-indexer), nor should they be playable in-game, as they are no longer technically "cache files" by definition. Therefore, rather than storing the restored cache file on disk, a diff of some sort should be stored instead. If needed, this diff can be in a binary format, as users are not expected to edit these files manually.
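To make the diff idea concrete, here is a minimal sketch of what a binary diff could look like. Everything in it is an assumption for illustration: the `RDIF` magic, the record layout (offset, length, bytes), and the function names `make_diff` and `apply_diff` are hypothetical and not an existing Invader format.

```python
import struct

MAGIC = b"RDIF"  # hypothetical magic, not an existing Invader format


def make_diff(original: bytes, restored: bytes) -> bytes:
    """Encode the byte runs where `restored` differs from `original`."""
    runs = []
    i, n = 0, len(restored)
    while i < n:
        if i < len(original) and original[i] == restored[i]:
            i += 1
            continue
        # Start of a differing run (also covers bytes past the end of `original`).
        start = i
        while i < n and (i >= len(original) or original[i] != restored[i]):
            i += 1
        runs.append((start, restored[start:i]))

    # Header: magic, final length, run count. Then one record per run.
    out = [MAGIC, struct.pack("<II", n, len(runs))]
    for offset, data in runs:
        out.append(struct.pack("<II", offset, len(data)))
        out.append(data)
    return b"".join(out)


def apply_diff(original: bytes, diff: bytes) -> bytes:
    """Rebuild the restored data in memory from the original file and a diff."""
    if diff[:4] != MAGIC:
        raise ValueError("not a diff produced by make_diff")
    final_len, count = struct.unpack_from("<II", diff, 4)
    # Truncate or zero-pad the original to the final length before patching.
    buf = bytearray(original[:final_len].ljust(final_len, b"\0"))
    pos = 12
    for _ in range(count):
        offset, length = struct.unpack_from("<II", diff, pos)
        pos += 8
        buf[offset:offset + length] = diff[pos:pos + length]
        pos += length
    return bytes(buf)
```

With something like this, the extractor would load the untouched cache file, apply the diff in memory, and never write a restored map to disk.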
There are several forms of map corruption that can be easily detected and fixed (from least insane to most insane):
- Tag path obfuscation - tags have been edited to contain invalid characters as well as share the same path as other tags of the same class, or the tag path pointer is invalid
  - If the tag is indexed or uses external data, use the resource file to determine the original path
  - Use a dictionary of known tags indexed by path, class, extracted tag size, and extracted tag checksum (invader-extract does not extract pointers or tag IDs, so data should be the same between tags; use a strong algorithm such as SHA-256)
  - Analyze the tag data already in the map (check scripts, strings, etc.)
- Tag class obfuscation - tags have been edited to have incorrect tag classes
  - The scenario tag is always directly referenced in the tag data cache header
  - Object tags and shader tags have a 16-bit enum that determines their true tag class
  - Check the tag references and cross-reference them with the tag definitions (note that some tag references are bullshit - see next bullet point)
- Bullshit tag references - tags are being referenced where they shouldn't
  - Some things (such as object references in the scenario tag) can be put in the correct place
  - Some things (such as light references in the model tag in place of shaders) can be replaced with an empty shader
  - Some things (such as weapons spawning scenery or bipeds like in log mods) cannot be repaired and will be dereferenced
- Inconsistent tag data - tags have hidden values that don't correspond to the non-hidden values
  - Hidden values that correspond to non-hidden values which only correspond to a SINGLE hidden value can probably be repaired by recalculating the non-hidden value to match the hidden value
    - Example: Actor's inverse combat perception time
  - Hidden values that correspond to non-hidden values which correspond to MULTIPLE hidden values cannot be repaired as it would result in the other hidden values not being consistent
    - Example: Shader environment's U and V scale
- Incorrect tag block/reflexive count - tag reflexives and data arrays have a size value that does not actually equal the size of the array
  - Use heuristics to determine the size of the array (e.g. what is the lowest and highest index used?)
  - NOTE: Fixing this should be OPTIONAL as the above option is destructive, such as when removing unused objects which can result in the game crashing if they're spawned in multiplayer by the server
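As a rough illustration of the lowest/highest index heuristic for reflexive counts, something like the following could work. This is only a sketch: `infer_block_count` is a hypothetical helper, and treating `0xFFFF` as the null index for 16-bit indices is an assumption here. It only reports what it finds and leaves the decision to shrink or grow the block to the caller, since that step is destructive.

```python
NULL_INDEX = 0xFFFF  # assumption: 16-bit indices use 0xFFFF to mean "none"


def infer_block_count(declared_count, used_indices):
    """Compare a block's declared count with the indices that reference it.

    Returns None if nothing references the block (no evidence either way),
    otherwise a tuple of (lowest, highest, inferred_count, matches_declared).
    """
    used = [i for i in used_indices if i != NULL_INDEX]
    if not used:
        return None
    lowest, highest = min(used), max(used)
    # The block must hold at least enough elements for the highest index.
    inferred = highest + 1
    return (lowest, highest, inferred, inferred == declared_count)
```

For example, `infer_block_count(2, [0, 3, 0xFFFF, 1])` reports an inferred count of 4 against a declared count of 2. An inferred count lower than the declared one is not proof of corruption, since unreferenced elements may still be spawned by the server.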
Basically, if the game can use a tag without crashing or having undefined behavior, then there exists an equivalent HEK tag file that can generate a non-corrupted version of that tag UNLESS it is the result of changing multiple hidden values that correspond to a non-hidden value.
Vaporeon suggested dictionary-based deprotection to me earlier. Essentially, it compares each tag in the map against all tags in the tags directories to find a match. This can even be used alongside heuristics.
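A minimal sketch of how the dictionary lookup might work, assuming the extracted tag data is already available as bytes. The names `tag_key`, `build_dictionary`, and `lookup_path` are made up for illustration and are not part of any Invader tool. The key is the tag class, extracted size, and SHA-256 checksum, and the value is the known path.

```python
import hashlib


def tag_key(tag_class, data):
    """Key a tag by class, extracted size, and SHA-256 of its extracted data."""
    return (tag_class, len(data), hashlib.sha256(data).hexdigest())


def build_dictionary(known_tags):
    """Build the lookup table from (path, tag_class, extracted_bytes) entries."""
    dictionary = {}
    for path, tag_class, data in known_tags:
        # If two known tags are byte-identical, keep the first path seen.
        dictionary.setdefault(tag_key(tag_class, data), path)
    return dictionary


def lookup_path(dictionary, tag_class, data):
    """Return the known path for a tag from the map, or None if nothing matches."""
    return dictionary.get(tag_key(tag_class, data))
```

A tag from a protected map that matches a known entry gets its original path back, and anything that returns `None` falls through to the heuristics.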
We may want to first compile each tag into a map and then extract it, since tags are modified on building. Therefore, this will rely on an accurate invader-build and invader-extract.
It's also important to note that tool.exe occasionally produces different values from invader-build when doing floating-point arithmetic. We may want to allow some leeway for this (e.g. rounding all floats to the nearest 0.00001).
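One way the rounding could be done before comparing or checksumming, as a sketch only. The helper names `quantize` and `normalized_bytes` are hypothetical, and this assumes the float fields of a tag have already been pulled out into a list.

```python
import struct

STEP = 0.00001  # leeway suggested above


def quantize(value, step=STEP):
    """Snap a float to the nearest multiple of `step`, as an integer step count."""
    return round(value / step)


def normalized_bytes(floats, step=STEP):
    """Pack quantized floats into bytes suitable for hashing or comparison."""
    return b"".join(struct.pack("<q", quantize(f, step)) for f in floats)
```

Two values that differ by less than the step can still land in different buckets if they straddle a rounding boundary, so this reduces false mismatches rather than eliminating them. Comparing with an absolute tolerance avoids that problem but cannot be folded into a checksum.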