Christopher Foo
Yeah, the thing is ridiculously slow. You shouldn't expect ungzip performance because it's not just ungzip. When it's extracting, it has to parse each WARC record because it's human-readable and...
Oops, another case that wasn't tested. As a workaround, maybe try something like `your_file_object.name = None`?
Hello, thanks for taking a look! When writing the notdef threshold algorithm, I knew it was too simple a heuristic and that it doesn't perform well in some cases. But the line...
Oh, I'd like to add that this is an issue because my WARC file failed to derive proper CDX files on the Internet Archive: https://archive.org/details/delcampe_20140126 .
> Warcat supports both compressing and decompressing `warc.zst` files as of [version 0.3.0](https://github.com/chfoo/warcat-rs/releases/tag/v0.3.0) but reading the encoder I can't work out whether it's compressing each warc record with a different...