Change in feed tries to overwrite every file in spool directory

Open tonywhitmore opened this issue 5 years ago • 4 comments

A few times now, something has changed in a feed that is otherwise up to date. (I don't know what that "something" is, but it has happened to me on multiple feeds.) Running castget against that feed then reports that a file already exists. Deleting that file and re-running castget downloads the deleted file again, then errors on the next file in the feed. I've pasted an example below. I don't know what causes castget to try to download the files again; perhaps there is a modification date in the feed?

One workaround is to move all the files out of the spool directory, run castget and then move all the files back in. This works until the "something" happens again.
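The workaround could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the holding directory is created next to the spool directory so the moves stay on one filesystem, and `update` is whatever callable runs castget for the channel (for example `castget BabbleOn`, as in the transcript below).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def update_with_empty_spool(spool_dir, update):
    """Park the spool files elsewhere, run `update`, then move them back.

    `update` is a zero-argument callable (hypothetical hook) that runs castget.
    """
    spool = Path(spool_dir)
    # Holding directory beside the spool, so moves are cheap renames.
    with tempfile.TemporaryDirectory(dir=spool.parent) as holding:
        holding = Path(holding)
        parked = [p for p in spool.iterdir() if p.is_file()]
        for path in parked:
            shutil.move(str(path), str(holding / path.name))
        try:
            update()
        finally:
            # Restore the parked files even if the update failed. A parked
            # file replaces a freshly downloaded file of the same name.
            for path in parked:
                shutil.move(str(holding / path.name), str(spool / path.name))


def run_castget(channel):
    """Run castget for one channel; raises if castget exits non-zero."""
    subprocess.run(["castget", channel], check=True)


# Example (not executed here):
# update_with_empty_spool("/home/tony/podcasts/BabbleOn",
#                         lambda: run_castget("BabbleOn"))
```

Note that castget still re-downloads every episode whose URL changed, so this trades bandwidth for not having to delete files one at a time.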

I'm not sure what the best behaviour here would be. Ideally there would be a way for castget to determine whether a file has already been downloaded and not try to re-download it, even if something in the feed relating to that file has changed. Alternatively, a "force-overwrite" option would do it.

tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/906915376-hollywoodbabbleon-371-caped-commentaries-9.mp3 already exists.
tony@azal:~$ rm /home/tony/podcasts/BabbleOn/906915376-hollywoodbabbleon-371-caped-commentaries-9.mp3
tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/893438968-hollywoodbabbleon-370-caped-commentaries-8.mp3 already exists.
tony@azal:~$ rm /home/tony/podcasts/BabbleOn/893438968-hollywoodbabbleon-370-caped-commentaries-8.mp3
tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/890189104-hollywoodbabbleon-369-caped-commentaries-7.mp3 already exists.
tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/886485790-hollywoodbabbleon-368-caped-commentaries-6.mp3 already exists.

etc. etc.

tonywhitmore avatar Oct 19 '20 10:10 tonywhitmore

Hard to say why this is happening but castget's strategy for determining whether it has already seen a file is a bit naive --- maybe too naive.

It just looks at the URL, makes a record of it and ignores that URL if it ever sees it again. Perhaps some feeds change the URL regularly. If this is the case, I am not sure how to detect duplicates (except by redownloading the file and, for example, comparing MD5 sums).
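To illustrate the difference between the two checks, here is a small sketch. It is not castget's actual code; the function names and the example.com URLs are invented for the example, and the "record" is modelled as a plain set.

```python
import hashlib


def is_new_by_url(url, seen_urls):
    """URL-only check: a URL is new unless it has been recorded before."""
    if url in seen_urls:
        return False
    seen_urls.add(url)
    return True


def md5_of(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def is_duplicate_by_content(new_bytes, existing_bytes):
    """Content check: compares checksums, so it needs the file downloaded."""
    return md5_of(new_bytes) == md5_of(existing_bytes)


# The same episode published under two different (made-up) URLs.
seen = set()
old_url = "http://example.com/a/episode-371.mp3"
new_url = "http://example.com/redirect/a/episode-371.mp3"
audio = b"identical audio bytes"

first = is_new_by_url(old_url, seen)    # True: never seen
again = is_new_by_url(old_url, seen)    # False: URL already recorded
moved = is_new_by_url(new_url, seen)    # True: URL changed, looks new
same_file = is_duplicate_by_content(audio, audio)  # True: content matches
```

The URL check is cheap but is fooled by a changed URL; the checksum check catches the duplicate but only after the download has already happened.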

Is the URL of the feed in your example http://feeds.feedburner.com/HollywoodBabbleOnPod? I will subscribe to it myself and see if I can spot what goes wrong :)

mlj avatar Oct 25 '20 22:10 mlj

Hi, yes, that's the feed URL although I've had it happen on a few others too. From memory:

  • http://feeds.feedburner.com/RichardHerringLSTPodcast
  • https://audioboom.com/channels/4929797.rss (WonkHE)

I did wonder about the URL changing without the filename changing too. That seems plausible: perhaps they make an edit to the episode, upload a new file, and the platform they are using generates a new URL which replaces the old one in the feed. I am not using castget's filename rewriting on BabbleOn or WonkHE but am on RichardHerringLSTPodcast.

The next time it happens I will try to figure out what has changed in the feed too. Thanks!

tonywhitmore avatar Oct 26 '20 17:10 tonywhitmore

For what it's worth, I've started seeing this issue as well, with multiple feeds.

A couple examples:

  • http://textfiles.libsyn.com/rss
  • https://feeds.megaphone.fm/replyall

hisaac avatar Dec 14 '20 01:12 hisaac

This has just happened again with the Serial feed. It seems that they have updated the URLs in their feed, adding another redirecting service. So because the URL has changed, castget tries to download the file again, but the output filename is the same (whether using castget's rewriting capabilities or not) so castget returns an error. I'm guessing that it stops processing at the first error it finds in a feed?

So, a more accurate description of this bug might be that castget doesn't cope well when URLs are changed in a feed. I am not sure what the best behaviour here would be, though: it could be a command line switch to give the user the option to force overwrite duplicate files, or one to add URLs that produce a filename clash to the XML log file (either silently or with a warning).
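As a rough sketch of those two options (this is not how castget works today; `handle_enclosure`, `force_overwrite` and the `download` callable are hypothetical names, and a set stands in for the XML log file):

```python
from pathlib import Path


def handle_enclosure(url, target, seen_urls, download,
                     force_overwrite=False, warn=print):
    """Decide what to do with one enclosure.

    Returns "skipped", "recorded" or "downloaded".
    `download(url, target)` is a hypothetical callable that fetches the file.
    """
    target = Path(target)
    if url in seen_urls:
        return "skipped"  # URL already in the log: nothing to do
    if target.exists() and not force_overwrite:
        # Filename clash: assume the file is already downloaded, log the
        # new URL so it is not retried, and carry on with the feed.
        warn(f"{target} already exists; recording {url} as seen")
        seen_urls.add(url)
        return "recorded"
    download(url, target)  # new file, or overwrite explicitly requested
    seen_urls.add(url)
    return "downloaded"
```

Returning a status instead of raising means one clash would not stop the rest of the feed from being processed.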

tonywhitmore avatar Jan 09 '21 19:01 tonywhitmore