Set appropriate cache-related headers for database dumps.

Open smarnach opened this issue 6 years ago • 1 comments

In #1800 we introduced database dumps that can be downloaded from https://static.crates.io/db-dump.tar.gz. The dumps are updated every 24 hours. However, CloudFront may cache them for up to 24 hours, so in the worst case users will see a new dump only shortly before the next dump is generated.

We can fix this by setting appropriate caching headers for the dump. Here are some ideas:

  • We could set an "expires" header to, say, 24.5 hours after the dump was created. This would leave some wiggle room for varying dump creation times while ensuring that a new dump becomes available roughly half an hour after it was created. However, the dump frequency is configured in the Heroku scheduler, so if we ever change the frequency, we would need to remember to update the code as well. If we choose this option, we should at least introduce a command line parameter to enqueue-job. Another downside is that if a dump job fails, we are left with a dump whose expiry is in the past, so it won't be cached anymore.

  • It's probably possible to set the "etag" header together with a low TTL in the "cache-control" header. I believe this will result in CloudFront frequently asking S3 whether a version with a different etag is available, while only retransferring the dump if it has actually changed. This option has the advantage of being independent of the dump frequency, but it needs further investigation to confirm that things really work the way I seem to remember.
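
To make the two ideas a bit more concrete, here is a rough, standalone Rust sketch of the header values each option would produce. Nothing in it is taken from the crates.io codebase: `expires_header`, `revalidation_header`, `http_date`, and `civil_from_days` are hypothetical names, the 24 hour interval and 30 minute slack are just the numbers from the first idea, and the 5 minute TTL is an arbitrary placeholder. It uses only the standard library, so the HTTP date is formatted by hand; real code would more likely use a date crate. For the second idea the sketch assumes the origin supplies the etag itself, so only the "cache-control" value is built.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Converts days since 1970-01-01 into (year, month, day).
/// Only valid for dates on or after the Unix epoch.
fn civil_from_days(days: u64) -> (u64, u64, u64) {
    let z = days + 719_468; // shift so the count starts on 0000-03-01
    let era = z / 146_097; // 400-year cycles
    let doe = z % 146_097; // day within the cycle
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Formats a timestamp as an HTTP date, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
fn http_date(t: SystemTime) -> String {
    const DAYS: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    let secs = t
        .duration_since(UNIX_EPOCH)
        .expect("timestamp before the Unix epoch")
        .as_secs();
    let (days, rem) = (secs / 86_400, secs % 86_400);
    let (year, month, day) = civil_from_days(days);
    format!(
        "{}, {:02} {} {} {:02}:{:02}:{:02} GMT",
        DAYS[((days + 4) % 7) as usize], // 1970-01-01 was a Thursday
        day,
        MONTHS[(month - 1) as usize],
        year,
        rem / 3_600,
        rem % 3_600 / 60,
        rem % 60
    )
}

/// Idea 1 (hypothetical helper): expire the dump one dump interval plus
/// some slack after it was created.
fn expires_header(
    created: SystemTime,
    dump_interval: Duration,
    slack: Duration,
) -> (&'static str, String) {
    ("Expires", http_date(created + dump_interval + slack))
}

/// Idea 2 (hypothetical helper): a short TTL so the cache revalidates often.
/// Assumes the origin provides the etag used for revalidation.
fn revalidation_header(ttl: Duration) -> (&'static str, String) {
    ("Cache-Control", format!("public, max-age={}", ttl.as_secs()))
}

fn main() {
    let created = SystemTime::now();
    let interval = Duration::from_secs(24 * 3_600); // would come from a CLI parameter
    let slack = Duration::from_secs(30 * 60);

    let (name, value) = expires_header(created, interval, slack);
    println!("{}: {}", name, value);

    let (name, value) = revalidation_header(Duration::from_secs(300));
    println!("{}: {}", name, value);
}
```

Passing the interval in as a parameter mirrors the suggestion of adding a command line parameter to enqueue-job. The second helper does not depend on the dump frequency at all, which is the advantage mentioned for that option.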

There may be other options as well – we can discuss this here on the issue.

Related: #1871, #1826, #1915

smarnach avatar Nov 22 '19 20:11 smarnach

@jtgeibel @carols10cents This issue may contribute to the database dumps being older than expected, which may be the root cause of the emailed report mentioned in Friday's meeting.

smarnach avatar Mar 02 '20 02:03 smarnach

we've implemented explicit cache invalidation a while ago, so this is probably not strictly needed anymore :)

Turbo87 avatar Jun 25 '24 20:06 Turbo87