Add check that input source is an uncompressed text file.
Independent of #55; if this is merged first, I'll rebase and update that PR accordingly.
At the moment, wgetpaste seems to only handle uncompressed text files, so check for this and note it in the --help text.
The check ! file "${f}" | grep -q 'ASCII' is overly restrictive. If any non-ASCII Unicode characters appear in the first million bytes of the file, this check will declare the file "non plaintext". This essentially limits small plain-text files to English only.
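A quick way to reproduce this, as a sketch: passes_ascii_check is a hypothetical wrapper around the check quoted above, and the script assumes file(1) is installed. The exact wording file prints for UTF-8 input differs between versions, but it does not contain "ASCII", so an ordinary German sentence gets rejected.

```shell
#!/bin/sh
# Reproduces the false "non plaintext" verdict. Assumes file(1) is installed;
# the wording of its output varies between versions.

# passes_ascii_check is a hypothetical name wrapping the check from this PR.
passes_ascii_check() {
    file "$1" | grep -q 'ASCII'
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# Ordinary UTF-8 text containing non-ASCII letters.
printf 'Grüße aus München\n' > "$sample"
file -b "$sample"   # e.g. "UTF-8 Unicode text" or "Unicode text, UTF-8 text"

if passes_ascii_check "$sample"; then
    echo "accepted"
else
    echo "rejected as non plaintext"
fi
```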
Sure. So should it be grep -q 'text', or will that result in false positives?
I don't think it would, but you would still have the false negative scenario to deal with. Consider the example:
cp /bin/sh text
file text | grep -q 'text' && echo "is a text file"
Here the match comes from the filename itself, since file prints "text:" before its description of the binary. Beyond that, the file command only examines the first million bytes, so if the first non-Unicode character doesn't appear within that range, file will report the file as text.
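One way to keep the filename out of the match is to ask file for brief output. This is a sketch, not wgetpaste code: looks_like_text is a hypothetical helper name, and it assumes file(1) supports -b (--brief) and --mime-type, which common versions do. It reproduces the cp /bin/sh example, where grep matches the word "text" in the filename that file echoes, and shows the MIME-type check rejecting the copied binary.

```shell
#!/bin/sh
# looks_like_text is a hypothetical helper, not wgetpaste code.
# -b (--brief) drops the leading "name:" from file's output, so the filename
# can no longer satisfy the match; --mime-type prints e.g. "text/plain".
# It still inherits file's read limit: only the start of the file is examined.
looks_like_text() {
    case $(file -b --mime-type "$1") in
        text/*) return 0 ;;
        *) return 1 ;;
    esac
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cp /bin/sh "$dir/text"

# The naive check is fooled: file prints ".../text: ..." before describing the binary.
file "$dir/text" | grep -q 'text' && echo "naive check: is a text file"

# The MIME-type check looks only at file's verdict, not at the filename.
looks_like_text "$dir/text" || echo "mime-type check: not text"
```

This still shares the million-byte blind spot described above; it only removes the filename false positive.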
All this said, the root issue isn't whether the file is compressed or not, but whether it contains NULL bytes (0x00). If it does, the upload is corrupted, because those bytes are stripped out. Otherwise, the upload is probably safe.
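If the deciding question is whether a file contains NULL bytes, that can be tested directly without file(1). This is a minimal POSIX sh sketch, not wgetpaste code; has_nul_bytes is a hypothetical name. It counts bytes before and after tr -d '\000' removes every NUL and treats any difference as a NUL being present.

```shell
#!/bin/sh
# has_nul_bytes is a hypothetical helper name, not part of wgetpaste.
# It compares the file's byte count with the count after tr(1) deletes every
# NUL (0x00) byte; a difference means at least one NUL is present.
has_nul_bytes() {
    [ "$(wc -c < "$1")" -ne "$(tr -d '\000' < "$1" | wc -c)" ]
}

f=$(mktemp)
trap 'rm -f "$f"' EXIT

printf 'plain UTF-8 text: Grüße\n' > "$f"
has_nul_bytes "$f" || echo "no NUL bytes: probably safe to upload"

printf 'a\000b\n' > "$f"
has_nul_bytes "$f" && echo "contains NUL bytes: upload would be corrupted"
```

Because it reads the whole file rather than a prefix, it avoids the million-byte limit of file, at the cost of reading the file twice.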
Put simply, wgetpaste needs to find some way to escape NULL bytes when returning from the strip_ansi() function.
Would it be enough to pipe the output of each branch of the if through sed 's/\o0/\\0/g'?
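The sed idea can be tried on a small input. This sketch assumes GNU sed, since the \o0 octal escape for NUL is a GNU extension; escape_nul is a hypothetical name. The g flag matters here: without it, sed replaces only the first NUL on each line.

```shell
#!/bin/sh
# Sketch of the proposed escaping step. Assumes GNU sed: \o0 (an octal escape
# for the NUL byte) is a GNU extension and may not work in other seds.
# The trailing g flag replaces every NUL on a line, not just the first one.
escape_nul() {
    sed 's/\o0/\\0/g'
}

# Each NUL becomes the two characters "\0"; the result contains no NUL bytes,
# so it survives a shell command substitution intact.
escaped=$(printf 'a\000b\000c\n' | escape_nul)
printf '%s\n' "$escaped"   # prints a\0b\0c
```

One limitation: the escaping is not reversible on its own, because text that already contained the two characters \0 looks identical to an escaped NUL afterwards.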
Maybe. I'd have to test it to be sure.