wgetpaste Add check that input source is an uncompressed text file.

Independent of #55; if this is merged first, I'll rebase and update that PR accordingly.

At the moment, wgetpaste seems to only handle uncompressed text files, so check for this and note it in the --help text.

Oct 31 '24 10:10 flexibeast

The check ! file "${f}" | grep -q 'ASCII' is overly restrictive. If Unicode but non-ASCII characters appear in the first million bytes of the file, this check declares the file "non-plaintext". This essentially limits plain text files to English only.

Feb 27 '25 14:02 nvinson

Sure. So should it be grep -q 'text', or will that result in false positives?

Feb 27 '25 22:02 flexibeast

Sure. So should it be grep -q 'text', or will that result in false positives?

I don't think it would, but you would still have the false negative scenario to deal with. Consider the example:

cp /bin/sh text
file text | grep -q 'text' && echo "is a text file"

The file command only checks the first 1 million bytes, so if the first non-Unicode character appears after that point, file will still report that the file is a text file.
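
To illustrate how the two grep patterns discussed above behave, here is a minimal sketch that matches against sample description strings instead of calling file directly. The helper names looks_like_ascii and looks_like_text are hypothetical, and the sample descriptions in the comments are assumptions about typical file output; the exact wording varies between versions.

```shell
# Hypothetical helpers that mimic `grep -q 'ASCII'` and `grep -q 'text'`
# applied to a description string such as the output of file(1).

# Succeeds only when the description mentions "ASCII".
looks_like_ascii() {
    case "$1" in
        *ASCII*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Succeeds when the description mentions "text" anywhere.
looks_like_text() {
    case "$1" in
        *text*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Illustrative inputs (assumed wording, not captured from a real run):
#   looks_like_ascii "ASCII text"                       # succeeds
#   looks_like_ascii "Unicode text, UTF-8 text"         # fails: UTF-8 rejected
#   looks_like_text  "Unicode text, UTF-8 text"         # succeeds
#   looks_like_text  "ELF 64-bit LSB executable"        # fails
#   looks_like_text  "text: ELF 64-bit LSB executable"  # succeeds: name matched
```

The last case shows a separate pitfall: by default file prefixes its output with the file name, so a file that is itself named "text" matches the 'text' pattern whatever its contents. Passing -b (brief mode) to file omits the name from the output.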

All this said, the root issue isn't whether the file is compressed or not, but whether it contains NULL bytes (0x00). If it does, then the upload is corrupted as those bytes are stripped out. Otherwise, the upload is probably safe.
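
If the real criterion is the presence of NUL bytes, one way to test for them directly is to compare the file's byte count before and after deleting NULs with tr. This is only a sketch under that assumption: has_nul_bytes is a hypothetical helper name, not something that exists in wgetpaste, and unlike file it reads the whole input.

```shell
# Hypothetical helper: returns 0 if the file named by $1 contains at least
# one NUL byte (0x00), and non-zero otherwise.
has_nul_bytes() {
    # Total size of the file in bytes.
    total=$(wc -c < "$1")
    # Size after deleting every NUL byte.
    without_nul=$(tr -d '\000' < "$1" | wc -c)
    # The counts differ only if at least one NUL byte was removed.
    [ "$total" -ne "$without_nul" ]
}

# Possible use inside a script, where "$f" is the input file:
#   if has_nul_bytes "$f"; then
#       echo "input contains NUL bytes; upload would be corrupted" >&2
#   fi
```

Because this check does not depend on the encoding, UTF-8 text with non-ASCII characters passes, while any input containing 0x00 is flagged.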

Put simply, wgetpaste needs to find some way to escape NULL bytes when returning from the strip_ansi() function.

Feb 27 '25 23:02 nvinson

Put simply, wgetpaste needs to find some way to escape NULL bytes when returning from the strip_ansi() function.

Would it be enough to pipe the output of each branch of the if through

sed 's/\o0/\\0/'

?

Feb 28 '25 02:02 flexibeast

Maybe. I'd have to test it to be sure.

Feb 28 '25 05:02 nvinson