[Feat] Introduce Gallery-dl's --download-archive Mechanism
Proposal
Gallery-dl is a powerful project comparable to yt-dlp, and in some respects it goes further. yt-dlp typically doesn't output images, and its metadata .json files are hard to read; gallery-dl handles both well. I use it extensively for downloading social media content, and it makes that very easy.
An example of my customized download command:
gallery-dl.exe --filename "[{date}]{filename}.{extension}" --write-metadata --retries 14 --write-log "[G-DL]#bludvice\instagram.com/aestherotic_\log.txt" --download-archive "[G-DL]#bludvice\instagram.com/aestherotic_\archive_file.dat" -D "[G-DL]#bludvice\instagram.com/aestherotic_" "https://www.instagram.com/aestherotic_/" --proxy "http://127.0.0.1:7897" --cookies "www.instagram.com_cookies.txt"
The --download-archive mechanism records every downloaded item in an archive file and, on later runs, skips anything already recorded there before downloading it, which makes skipping already downloaded items very fast. Compared to TDL's current --skip-same, it's significantly more efficient, and the difference isn't marginal. Highly recommend adopting it! 🚀
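To make the idea concrete, here is a minimal Python sketch of an archive-based skip, assuming one SQLite table keyed by an item ID. The names `DownloadArchive`, `download_all`, and `fetch` are hypothetical and only illustrate the technique; this is not gallery-dl's or TDL's actual code.

```python
import sqlite3


class DownloadArchive:
    """Hypothetical archive: one row per item that was already downloaded."""

    def __init__(self, path=":memory:"):
        # Assumption: a single table with the entry string as primary key.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def __contains__(self, entry):
        # Indexed lookup on the primary key; no file on disk is inspected.
        row = self.conn.execute(
            "SELECT 1 FROM archive WHERE entry = ?", (entry,)
        ).fetchone()
        return row is not None

    def add(self, entry):
        self.conn.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,)
        )
        self.conn.commit()


def download_all(item_ids, archive, fetch):
    """Call fetch() only for items missing from the archive.

    Returns the list of item IDs that were actually fetched.
    """
    fetched = []
    for item_id in item_ids:
        if item_id in archive:
            continue  # already recorded, skip without downloading
        fetch(item_id)
        archive.add(item_id)  # record only after fetch() returned
        fetched.append(item_id)
    return fetched
```

Because the check is a key lookup in the archive rather than a comparison against files on disk, a second run over the same list only pays for the lookups, and it still skips items whose files were later moved or renamed.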
Background
Improve efficiency
Workarounds
All methods
Good suggestion, gallery-dl is an A+ piece of software. The only issue with their implementation is the defaults: the entries are unique, but the minimal info saved in the db (often just a file ID) usually isn't enough to identify what you downloaded. I use a custom config for each site to make the entries meaningful, so I can make changes as needed when something gets out of sync.
So I second this proposal, but request that entries be meaningful to humans: at least date + file ID (though I like to have at least part of the file name/title in there as well). At the scale most of us are working, a few MB difference in db size probably isn't hurting anything.
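One possible way to get human-readable entries, sketched in Python under my own assumptions: keep the file ID as the lookup key and store the date and a shortened title next to it. The table layout and the names `open_archive`, `record`, and `describe` are hypothetical, not an existing gallery-dl or TDL schema.

```python
import sqlite3
from datetime import datetime, timezone


def open_archive(path=":memory:"):
    # Assumption: file_id alone identifies an item; date and title are
    # stored only so a human can tell what each row refers to.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        "file_id TEXT PRIMARY KEY, date TEXT NOT NULL, title TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record(conn, file_id, timestamp, title, max_title=40):
    """Store one entry; timestamp is seconds since the Unix epoch (UTC)."""
    date = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
    # Collapse runs of whitespace and truncate so rows stay small.
    short_title = " ".join(title.split())[:max_title]
    conn.execute(
        "INSERT OR IGNORE INTO archive VALUES (?, ?, ?)",
        (file_id, date, short_title),
    )
    conn.commit()


def describe(conn, file_id):
    """Return 'date file_id title' for an entry, or None if it is absent."""
    row = conn.execute(
        "SELECT date, file_id, title FROM archive WHERE file_id = ?",
        (file_id,),
    ).fetchone()
    return None if row is None else " ".join(row)
```

Keeping the ID as the key means a renamed title does not cause a re-download, while the extra columns make it possible to find and delete specific rows by hand when something gets out of sync.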