Make wget automatically determine the file name.
Sometimes when using wget, it is pretty annoying to have to type the file name again. This PR makes that unnecessary by automatically determining the file name: all trailing slashes are stripped, and then everything after the last remaining slash is taken. This is either the domain name, if used like this: https://domain.tld, or the file name, if used like this: https://domain.tld/folder/file.ext
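The approach described above might be sketched roughly like this (the function name `extractFilename` is hypothetical and not part of the PR):

```lua
-- Hypothetical sketch of the described approach: strip all trailing
-- slashes, then take everything after the last remaining slash.
local function extractFilename(sUrl)
    -- Remove trailing slashes one by one
    while sUrl:sub(-1) == "/" do
        sUrl = sUrl:sub(1, -2)
    end
    -- Find the last slash and take everything after it
    local nSlash = sUrl:find("/[^/]*$")
    if nSlash then
        return sUrl:sub(nSlash + 1)
    end
    return sUrl
end

print(extractFilename("https://domain.tld/folder/file.ext")) -- file.ext
print(extractFilename("https://domain.tld/"))                -- domain.tld
```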
Thank you! This is something I've been meaning to do for a while. Instead of using a loop, it might be a little cleaner to use gsub and match, though:
```lua
-- Trim everything after ? or # (so foo#bar or foo?bar=qux reduce to foo),
-- then trim trailing `/`
sUrl = sUrl:gsub( "[#?].*", "" ):gsub( "/+$", "" )
-- Find everything from the last / to the end of the string
return sUrl:match( "/([^/]+)$" ) or sUrl
```
Note I've only done some limited testing, so this may be more susceptible to breaking.
foo#bar should be reduced to foo. However, I'm not sure whether foo?bar=s should be reduced to foo; sometimes these URL parameters matter.
As for the patterns, I tested them against a bunch of edge cases and they seem to work.
The `or sUrl` in the last line isn't even needed, because the pattern will always match the `/` of "http://" or "https://", which is required for a valid URL.
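For illustration, the suggested pattern chain can be wrapped in a helper (the name `extractFilename` is hypothetical) and run against some of the edge cases discussed:

```lua
-- Wrap the suggested gsub/match chain in a function for testing.
local function extractFilename(sUrl)
    -- Strip anchor/query, then trailing slashes
    sUrl = sUrl:gsub("[#?].*", ""):gsub("/+$", "")
    -- Take everything after the last slash
    return sUrl:match("/([^/]+)$") or sUrl
end

print(extractFilename("https://domain.tld/folder/file.ext"))       -- file.ext
print(extractFilename("https://domain.tld/file.ext?bar=qux#frag")) -- file.ext
print(extractFilename("https://domain.tld/"))                      -- domain.tld
-- Even without a trailing slash, the "//" of the scheme matches,
-- so the `or sUrl` fallback never fires for a full URL:
print(extractFilename("https://domain.tld"))                       -- domain.tld
```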
That's a good point. I don't have access to CC in MC right now, but both CCEmuX and CCEmuRedux also permit ? in file names, and = seems to be allowed as well. (However, I'm on a Mac right now; I'm not sure how it behaves on Windows.)
On NTFS (the most commonly used Windows filesystem), it will fail.
Ok, it will now strip anchors and URL parameters, and I also applied SquidDev's suggestion to use patterns with gsub/find.
On my machine (Windows), fs.exists(" ") always returns true, as it's parsed as a directory. This means wget google.com results in "File already exists" instead of "URL malformed".
I think it would be better to move the http.checkURL call before the file name extraction. That way the or " " isn't needed at all, which makes it substantially less ugly. It might introduce a tiny bit of delay, as this may require a DNS lookup, but it should be imperceptible (<50ms).
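A rough sketch of the reordered flow, assuming ComputerCraft's http.checkURL and printError globals (this is not the PR's actual code):

```lua
local sUrl = ...
-- Validate the URL first; if it is malformed, bail out before
-- any file name extraction happens.
local ok, err = http.checkURL(sUrl)
if not ok then
    printError(err or "Invalid URL.")
    return
end
-- Only now derive the file name. A fallback is no longer needed,
-- since a valid URL always contains the "/" of its scheme.
local sFile = sUrl:gsub("[#?].*", ""):gsub("/+$", ""):match("/([^/]+)$")
```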
Ok, in that case I will put the http.checkURL call before the file name extraction.
Edit: Newest commit should fix this
I just noticed that the help page of wget should be changed to reflect this change. Going to do that this evening.