Rescue extract-xiso from oblivion
I think that this tool still has a lot of potential, and doesn't deserve to be abandoned, so I decided to rewrite traverse_xiso, the most difficult function to read and understand in the code, and split it into two pieces to make it easier to understand.
Judging from the changelog on top of the file, it appears that it once used recursion, but it has been rewritten as the spaghetti code is it now to avoid sporadic stack overflows with highly unoptimized ISOs written by some tools.
It looks like whomever is making xiso creation tools out there has never heard of a binary tree and is sticking every directory entry off the right subnode (at least on all the iso's I've found so far). This causes extremely deep recursion for iso's with lots of files (and also causes these iso's, when burned to DVD, to behave as a linked list for file lookups, thus providing worst case lookup performance at all times).
Not only do I find this annoying and extremely bad programming, I've noticed that it is causing sporadic stack overflows with my (formerly) otherwise good tree traversal code.
In order to combat such bad implementations, I've re-implemented the traverse_xiso() routine to get rid of any non-directory recursion. Additionally, I've made a few extra tweaks to conserve even more memory. I can see now that I'm going to need to write xiso creation as well and do it right. (rev to v1.5 beta) NOT FOR RELEASE
I honestly don't think the cause of the overflows was the recursion: each iteration uses very little memory and the same ISOs are processed on the OG Xbox anyways. Probably it was some other bug that manifested itself only in some cases.
I tried to keep the memory usage as low as possible, by freeing up unused memory early, but it will probably use more memory than before. I guess that's the tradeoff for code legibility.
Anyways, the code has been tested to work and produce functionally identical ISOs. They are not hash-identical to the previous ones because of the order the files are processed. As as side note, it looks like that now it generates node trees identical to retail, so I guess it is more accurate? Anyways, if someone knows what tools produce such linked list ISOs, it would be awesome to test them and see if the stack overflow problem is fixed.
Of course it needs extensive testing, but I think that with traverse_xiso rewritten, it would be much easier to update the tool and/or solve bugs.
The next step would be to review all the open issues, and see if they still apply. In particular, I'm using 64-bit builds and they produce files identical to 32-bit ones, so maybe this particular issue is solved.
the same ISOs are processed on the OG Xbox anyways
They are processed differently though.
My understanding (but it has been a while since I've looked at the XISO filesystem driver):
- For file searches (such as opening an explicit filepath), the tree is used.
- For listing files (such as
opendir/readdirstyle APIs), the directory is processed like an array.
I converted the PR to draft because I want to make some more checks just to make sure, and eventually solve some other problems.
I think I'm done for now
the same ISOs are processed on the OG Xbox anyways
They are processed differently though.
My understanding (but it has been a while since I've looked at the XISO filesystem driver):
* For file searches (such as opening an explicit filepath), the tree is used. * For listing files (such as `opendir` / `readdir` style APIs), the directory is processed like an array.
I implemented what I called the "discover strategy", basically the linear listing of files in a directory. It is used when listing or extracting by default.
The rewrite mode, instead uses both strategies: first processes the tree, then lists the directories sequentially, hoping to find "hidden" files.
I wanted a single strategy for all cases, but unfortunately it's not possible unless we generate the tree also for listing and extracting.
The rewriting is done in two separate steps: first tree discovery, then linear search. Doing both at the same time would fail spectacularly on malformed ISOs, and introduce a lot of overhead even in normal circumstances.
This means that:
- If there are hidden files after a node that's already beyond the end of a directory, it won't be found.
- Every tree starting from an hidden node is now unveiled, but only processed linearly.
I don't know if you like it, or if changing the default strategy for file listing and extraction is the right thing to do, or even if it is useful at all. Let me know your feedback.
I added FreeBSD and OpenBSD GitHub actions, just to make sure it still compiles under these systems that we technically support. The binaries haven't been tested, though.
Thanks, this fix is really appreciated. I'm going to start using your branch and will report if I find another other decoding issues.
Please use the latest version for your tests. I force pushed the repo because I rewrote it to be a little bit less hacky.
@rapperskull if nobody reviews soon, I recommend to mention other @XboxDev members on GitHub, or ping them on Discord. I might skim over it, but I haven't done any Xbox stuff in a while, so I don't plan to do a proper review (and I certainly won't merge).
@rapperskull if nobody reviews soon, I recommend to mention other @XboxDev members on GitHub, or ping them on Discord. I might skim over it, but I haven't done any Xbox stuff in a while, so I don't plan to do a proper review (and I certainly won't merge).
Thanks for the tip. Besides reviews, I need extensive testing. The are probably a lot of edge cases that break, and I'm finding and fixing bugs by trying to process the ISOs that are mentioned in other issues. For example I spotted the missing files bug thanks to Furious Karting.
I added a check to use the Windows-1252 charset if available, thus avoiding the UTF-8 conversion entirely, with a bit of saved memory. There's one downside though: in create mode the OS will use the best fit approach when dealing with characters outside the CP1252 charset, so Ł becomes L in the name of the file we're trying to list. When we then try to open the file, we fail, because of course the file name contains Ł, not L.