First XML DB build does not record hashing failures
I am looking for an alternative to FCIV which is more robust and capable of recording errors to an output file (akin to the fciv.err file produced by FCIV), for the purpose of identifying corrupted files which are not yet part of a hashed integrity database.
FCIV was useful in this respect: when it encounters a file that produces a cyclic redundancy check (CRC) error, a record of this is added to the fciv.err output file. However, it was limited in that it would fail and halt if a file name had unexpected characters or a directory path was too long.
When PsFCIV is run on a directory for the first time and that directory contains known bad files, it adds a <FILE_ENTRY> element to the generated XML file for each file, detailing the <name>, <Size>, and <Timestamp>, plus hashes such as <MD5> if hashing succeeds.
For corrupted files, it instead prints the following errors in the PowerShell window, which do not identify the file that caused them:
Exception calling "HashFile" with "2" argument(s): "Data error (cyclic redundancy check).
"
At C:\Program Files\WindowsPowerShell\Modules\PsFCIV\1.1\PsFCIV.psm1:62 char:17
+ ... $hashBytes = [PsFCIV.Support.CryptUtils]::HashFile($file, ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : IOException
Exception calling "FormatBytes" with "2" argument(s): "Value cannot be null.
Parameter name: inArray"
At C:\Program Files\WindowsPowerShell\Modules\PsFCIV\1.1\PsFCIV.psm1:67 char:21
+ ... $object.$hash = [PsFCIV.Support.CryptUtils]::FormatBytes( ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ArgumentNullException
The result is that a <FILE_ENTRY> is still added to the XML DB as a new entry, but without any hashes.
This makes the output difficult to parse when the goal is to identify bad files.
The tool in its current state is suitable for identifying changes in data compared to a previously built database, but not as well suited to discovering corrupted files.
Would it be possible to modify the functionality so that when it encounters a hashing failure, it records an explicit failure in the XML DB instead of omitting the hash object? Or, alternatively, could it write its own fciv.err file?
Can you clarify what you mean by "known bad files" and "corrupted files"? Are the files physically corrupted? If that is the case, what is the desired behavior when PsFCIV finds such a file?
By known bad files, I'm referring to files that have been physically corrupted. Outside of file hashing use cases (previously using 7zip or FCIV), they are known to be bad if they do not function correctly (e.g. in video playback where there are unplayable sections), or cannot be copied to another location (Windows Explorer returning "Can't read from the source file or disk").
For my use case of identifying corrupted data to be restored from backup or otherwise salvaged, FCIV was useful because its fciv.err output would record all errors it encounters with error codes. It works by creating or appending to an fciv.err file, starting each run with the initial Command Line instruction.
Here's a sanitised example of what that output may look like:
********************************************************************************
Command Line: fciv -add D: -r -md5
HashAndStore --> d:\DumpStack.log.tmp :
Error msg : Access is denied.
Error code : 5
HashAndStore --> d:\fciv.err :
Error msg : The process cannot access the file because it is being used by another process.
Error code : 20
HashAndStore --> d:\Sample-Game\data.bin :
Error msg : Data error (cyclic redundancy check).
Error code : 17
HashAndStore --> d:\Music\Musician - Track Name?.mp3 :
Error msg : The filename, directory name, or volume label syntax is incorrect.
Error code : 7b
HashAndStore --> d:\Music\File Name with non-unicode characters.mp3 :
Error msg : The system cannot find the file specified.
Error code : 2
d:\Music\Folder Name with Accented Characters\*
Error msg : The system cannot find the path specified.
Error code : 3
HashAndStore --> d:\Music\Very Long Filename that exceeds Windows built-in MAX_PATH limit of 256 or 260 characters - Older Windows applications particularly CMD applications such as FCIV cannot handle long path or file names when the full length exceeds this character limit.mp3 :
Error msg : The system cannot find the path specified.
Error code : 3
For my use case, every result with Error code 17 returning "Data error (cyclic redundancy check)" was useful. The other error codes, which resulted from the dated application not handling longer paths or filenames with special characters, were not useful. This is where PsFCIV has come in useful, as it can process such files.
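To illustrate the filtering described above, here is a minimal Python sketch that pulls the paths with a given error code out of fciv.err-style text. It assumes the layout shown in the sanitised sample (a "HashAndStore -->" line followed by "Error msg" and "Error code" lines); the function name `paths_with_error_code` is made up for this example. The codes in the sample look hexadecimal (0x17 is 23, the Win32 ERROR_CRC value), so the sketch compares them as strings rather than converting them.

```python
import re

# Patterns are based only on the sanitised sample above; a real fciv.err
# file may contain other line shapes that this sketch ignores.
ENTRY_RE = re.compile(r"^HashAndStore --> (?P<path>.+?) :$")
CODE_RE = re.compile(r"^Error code : (?P<code>[0-9a-fA-F]+)$")


def paths_with_error_code(err_text, wanted_code="17"):
    """Return the paths whose log entry carries wanted_code (hypothetical helper)."""
    matches = []
    current_path = None
    for raw_line in err_text.splitlines():
        line = raw_line.strip()
        entry = ENTRY_RE.match(line)
        if entry:
            # Remember the path until its "Error code" line arrives.
            current_path = entry.group("path")
            continue
        code = CODE_RE.match(line)
        if code:
            if current_path is not None and code.group("code").lower() == wanted_code.lower():
                matches.append(current_path)
            # The entry is finished whether or not it matched.
            current_path = None
    return matches
```

Calling `paths_with_error_code(open("fciv.err").read())` would then list only the CRC failures. Directory-level entries that lack a "HashAndStore -->" line are skipped by design.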
I think the request in its simplest form is that PsFCIV, during or upon completing processing, should write an error output file (similar to fciv.err) whenever it encounters data it cannot successfully process, including the file name and error message. This wouldn't be limited to data errors; it could also cover files that were moved after the initial file/directory enumeration, or that are currently being used by another process.
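The requested behaviour can be sketched in a few lines of Python. This is not PsFCIV's code and not a proposed patch, only an illustration of the idea: hash each file, and when reading fails, append the path and the error to a separate log instead of silently dropping the hash. The function name `hash_tree`, the log layout (loosely modelled on fciv.err) and the chunk size are all assumptions made for the example.

```python
import hashlib
import os


def hash_tree(root, err_path, chunk_size=1024 * 1024):
    """Hash every file under root with MD5; append read failures to err_path.

    Hypothetical helper for illustration. Returns {path: hex digest} for the
    files that could be read in full.
    """
    hashes = {}
    with open(err_path, "a", encoding="utf-8") as err_log:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    digest = hashlib.md5()
                    with open(path, "rb") as handle:
                        # Read in chunks so large files do not fill memory.
                        for chunk in iter(lambda: handle.read(chunk_size), b""):
                            digest.update(chunk)
                    hashes[path] = digest.hexdigest()
                except OSError as exc:
                    # Covers unreadable sectors, files in use, files moved
                    # after enumeration, and permission problems.
                    err_log.write(f"HashAndStore --> {path} :\n")
                    err_log.write(f"Error msg : {exc}\n")
                    err_log.write(f"Error code : {exc.errno}\n")
    return hashes
```

The key design point is that the failure is recorded next to the file name at the moment it happens, so the log can be read without cross-referencing console output. The error file should live outside the tree being hashed, otherwise it gets hashed too.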
I'd like to add that since I first opened this issue, I've learned how to process PsFCIV's XML output using XSL transformations into a file listing only the <FILE_ENTRY> elements that lacked <MD5> elements, which achieves my goal of identifying all corrupted files needing treatment. I am very appreciative of PsFCIV and am thankful for it as I've been looking for a solution to this problem for a while.
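For anyone who would rather not use XSL, the same filtering can be approximated in Python. The sketch below assumes the XML contains <FILE_ENTRY> elements with a <name> child and, when hashing succeeded, an <MD5> child, as described earlier in this issue. The function names are made up for the example, and any XML namespace on the elements is simply ignored.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def entries_missing_hash(xml_text, hash_tag="MD5"):
    """Return the <name> of every <FILE_ENTRY> with no non-empty hash_tag child."""
    root = ET.fromstring(xml_text)
    missing = []
    for element in root.iter():
        if local_name(element.tag) != "FILE_ENTRY":
            continue
        # Map each child's tag name to its trimmed text content.
        children = {
            local_name(child.tag): (child.text or "").strip()
            for child in element
        }
        if not children.get(hash_tag):
            missing.append(children.get("name", ""))
    return missing
```

An entry counts as missing both when the <MD5> element is absent and when it is present but empty, since either would indicate that no hash was stored.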