Make wget automatically determine the file name.

Open Luca0208 opened this issue 7 years ago • 9 comments

Sometimes when using wget it is pretty annoying having to write the file name again. This PR makes that unnecessary by automatically determining the file name. This is done by stripping all trailing slashes and then taking everything after the last remaining slash. The result is either the domain name, if used like this: https://domain.tld, or the file name, if used like this: https://domain.tld/folder/file.ext

Luca0208, Apr 04 '18 17:04
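
For readers who want to see the description above in code, here is a minimal sketch of a loop-based version. It is only a guess at the approach, not the code from the PR; the function name fileNameFromUrl is hypothetical.

```lua
-- Hypothetical helper: the name and structure are illustrative,
-- not taken from the PR itself.
local function fileNameFromUrl(sUrl)
    -- Strip all trailing slashes.
    while sUrl:sub(-1) == "/" do
        sUrl = sUrl:sub(1, -2)
    end

    -- Walk backwards to find the last remaining slash and
    -- return everything after it.
    for i = #sUrl, 1, -1 do
        if sUrl:sub(i, i) == "/" then
            return sUrl:sub(i + 1)
        end
    end

    -- No slash at all: fall back to the whole string.
    return sUrl
end

print(fileNameFromUrl("https://domain.tld"))                 --> domain.tld
print(fileNameFromUrl("https://domain.tld/folder/file.ext")) --> file.ext
```

The two calls mirror the two cases named in the description: a bare domain yields the domain name, and a full path yields the file name.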

Thank you! This is something I've been meaning to do for a while. Instead of using a loop, it might be a little cleaner to use gsub and match though:

-- Trims everything after ? or # (means foo#bar or foo?bar=qux reduce to foo), 
-- then trims trailing `/`
sUrl = sUrl:gsub( "[#?].*", "" ):gsub( "/+$", "" )

-- Find everything from the last / to the end of the string
return sUrl:match( "/([^/]+)$" ) or sUrl

Note I've only done some limited testing, so this may be more susceptible to breaking.

SquidDev, Apr 04 '18 17:04
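
To make the suggested snippet easier to try outside of wget, it can be wrapped in a function and run against a few URLs. This is only a sketch: the function name getFilename and the sample URLs are made up for illustration.

```lua
-- Hypothetical wrapper around the two statements suggested above.
local function getFilename(sUrl)
    -- Drop everything from the first ? or # onwards, then trailing slashes.
    -- Only the first return value of the last gsub is assigned.
    sUrl = sUrl:gsub("[#?].*", ""):gsub("/+$", "")

    -- Everything from the last / to the end, or the whole string
    -- if there is no slash left.
    return sUrl:match("/([^/]+)$") or sUrl
end

print(getFilename("https://domain.tld/folder/file.ext"))         --> file.ext
print(getFilename("https://domain.tld/folder/file.ext?a=b#top")) --> file.ext
print(getFilename("https://domain.tld/folder/"))                 --> folder
print(getFilename("https://domain.tld"))                         --> domain.tld
```

Chaining the two gsub calls works because a method call on a call expression only uses its first return value, so the match count returned by the first gsub is discarded.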

foo#bar should be reduced to foo; however, I'm not sure if foo?bar=s should be reduced to foo. Sometimes these URL parameters matter.

As for the patterns, I tested them against a bunch of edge cases and they seem to work. The or sUrl in the last line isn't even needed, because the pattern will match the / of "http://" or "https://", which is required for a valid URL.

Luca0208, Apr 04 '18 17:04
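
The claim about the or sUrl fallback can be checked directly. The sketch below assumes only the pattern quoted in the thread; the sample inputs are illustrative. With a scheme present the pattern always finds a slash, but for input without a scheme, such as the google.com case raised further down, match returns nil and the fallback is what supplies a value.

```lua
local pattern = "/([^/]+)$"

-- With a scheme, the second / of "https://" anchors the match.
print(("https://domain.tld"):match(pattern))          --> domain.tld

-- Without a scheme there is no slash, so match returns nil.
print(("google.com"):match(pattern))                  --> nil

-- The fallback turns that nil into the original string.
print(("google.com"):match(pattern) or "google.com")  --> google.com
```

So whether the fallback is needed depends on whether the URL has already been validated by the time the pattern runs.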

That's a good point. I don't have access to CC in MC right now, but both CCEmuX and CCEmuRedux also emit ? in filenames; = seems to be allowed. (However, I'm on a Mac right now, so I'm not sure how it behaves on Windows.)

Luca0208, Apr 04 '18 18:04

On NTFS (the most commonly used Windows filesystem) it will fail.

dmarcuse, Apr 04 '18 18:04

OK, it will now strip anchors and URL parameters, and I also applied SquidDev's suggestion to use patterns and gsub/find.

Luca0208, Apr 04 '18 18:04

On my machine (Windows), fs.exists(" ") always returns true, as it's parsed as a directory. This means wget google.com will result in File already exists instead of URL malformed.

I think it would be better to move the http.checkURL call before the file name extraction. This way one doesn't need the or " " at all, which makes it substantially less ugly. It might produce a tiny bit of delay, as this may require a DNS lookup, but it should be imperceptible (<50ms).

SquidDev, Apr 05 '18 08:04
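
The reordering proposed above might look roughly like the following sketch. It is not the wget source: resolveTarget and fakeCheckURL are hypothetical names, and the checker is passed in as a parameter so the example runs outside ComputerCraft. In-game one would presumably pass http.checkURL, which is assumed here to return a success flag plus an error message on failure.

```lua
-- Hypothetical helper: validate first, then derive the file name.
local function resolveTarget(sUrl, checkURL)
    local ok, err = checkURL(sUrl)
    if not ok then
        -- Bail out before any file name logic runs.
        return nil, err or "URL malformed"
    end

    -- The URL is known to be valid, so no " " fallback is needed here.
    local sTrimmed = sUrl:gsub("[#?].*", ""):gsub("/+$", "")
    return sTrimmed:match("/([^/]+)$")
end

-- Stand-in for http.checkURL (an assumption): accepts only http(s) URLs.
local function fakeCheckURL(sUrl)
    if sUrl:match("^https?://[^/]+") then
        return true
    end
    return false, "URL malformed"
end

print(resolveTarget("https://domain.tld/folder/file.ext", fakeCheckURL)) --> file.ext
print(resolveTarget("google.com", fakeCheckURL))                         --> nil   URL malformed
```

Because the invalid input is rejected before extraction, the caller never reaches an fs.exists check with a placeholder name, which is the failure described for google.com.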

OK, in that case I will put the http.checkURL call before the filename extraction. Edit: The newest commit should fix this.

Luca0208, Apr 05 '18 08:04

I just noticed that the help page of wget should be changed to reflect this change. Going to do that this evening.

Luca0208, Apr 05 '18 09:04