django-cloudinary-storage issue with exists for raw files

For some reason, https://github.com/klis87/django-cloudinary-storage/blob/master/cloudinary_storage/storage.py#L83 doesn't work anymore for raw files, at least according to the tests. For example, the URL https://res.cloudinary.com/dri8awewt/raw/upload/v1/static/bd5f98a5-4524-408f-882f-fe31786c9ab7 returns 404 during tests. When I check Cloudinary, the file exists but has a different version. In the past, v1 worked anyway...

@tiagocordeiro @bufke any ideas?

Sep 09 '20 21:09 klis87

Having the same issue. This seems like a Cloudinary bug to me, as any other version (0, or anything greater than 1) works; only v1 seems to be magically broken:

$ curl --head https://res.cloudinary.com/hcgihb9sq/raw/upload/v1/static/staticfiles.json
HTTP/2 404 
content-disposition: inline
content-transfer-encoding: binary
content-type: image/gif
x-cld-error: Resource not found - static/staticfiles.json
x-request-id: 2e276708f9b86a797938a120e3ffa6fd
date: Wed, 03 Aug 2022 11:18:04 GMT
strict-transport-security: max-age=604800
pragma: no-cache
cache-control: private, no-transform, max-age=0, no-cache
server-timing: fastly;dur=1;start=2022-08-03T11:18:04.134Z;desc=hit,rtt;dur=28
server: Cloudinary
timing-allow-origin: *
access-control-allow-origin: *
access-control-expose-headers: X-Cld-Error,Content-Length,Content-Disposition,Server-Timing
accept-ranges: bytes
content-length: 0

$ curl --head https://res.cloudinary.com/hcgihb9sq/raw/upload/v2/static/staticfiles.json
HTTP/2 200 
content-type: application/json
etag: "b6fe247c5dc66ac31be0d4d08fb8f955"
last-modified: Wed, 03 Aug 2022 11:11:46 GMT
date: Wed, 03 Aug 2022 11:18:11 GMT
vary: Accept-Encoding
strict-transport-security: max-age=604800
cache-control: public, no-transform, immutable, max-age=2592000
server-timing: fastly;dur=2;cpu=1;start=2022-08-03T11:18:11.358Z;desc=hit,rtt;dur=28
server: Cloudinary
timing-allow-origin: *
access-control-allow-origin: *
accept-ranges: bytes
access-control-expose-headers: Content-Length,ETag,Server-Timing,Vary
content-length: 5259

A workaround is to set the otherwise undocumented force_version setting to False in Django's settings.py:

CLOUDINARY = {
    "force_version": False
}

Unfortunately, this setting affects the entire Cloudinary SDK, not only cloudinary_storage, which might or might not be an issue for your use case.

Another alternative would be to allow the cloudinary library to use a default version other than v1: https://github.com/cloudinary/pycloudinary/blob/master/cloudinary/utils.py#L762

Aug 03 '22 11:08 salomvary

Also, quite shockingly, at some point after the upload the v1 URL starts working. Computers are hard!

$ curl --head https://res.cloudinary.com/hcgihb9sq/raw/upload/v1/static/staticfiles.json
HTTP/2 200 
content-type: application/json
etag: "b6fe247c5dc66ac31be0d4d08fb8f955"
last-modified: Wed, 03 Aug 2022 11:11:46 GMT
date: Wed, 03 Aug 2022 11:50:46 GMT
vary: Accept-Encoding
strict-transport-security: max-age=604800
cache-control: public, no-transform, immutable, max-age=2592000
server-timing: fastly;dur=192;cpu=1;start=2022-08-03T11:50:46.576Z;desc=miss,rtt;dur=27,cloudinary;dur=101;start=2022-08-03T11:50:46.622Z
server: Cloudinary
timing-allow-origin: *
access-control-allow-origin: *
accept-ranges: bytes
access-control-expose-headers: Content-Length,ETag,Server-Timing,Vary
content-length: 5259

Aug 03 '22 11:08 salomvary

All right, after a lengthy conversation with Cloudinary's super helpful support, we got to the bottom of this.

Here is what happens when using collectstatic, and perhaps when managing raw files by other means:

1. For each file, django-cloudinary-storage does a HEAD request to check whether the asset exists on the CDN (e.g. HEAD https://res.cloudinary.com/dfue9w4ce/raw/upload/v1/some-file.json).
2. If the file does not exist, the Cloudinary CDN caches the 404 response.
3. django-cloudinary-storage uploads some-file.json.
4. GET https://res.cloudinary.com/dfue9w4ce/raw/upload/v1/some-file.json keeps responding with 404, because Cloudinary cached the 404 (until the cache expires).
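The sequence above can be reproduced with a toy model. This is only a sketch for illustration, not Cloudinary's actual implementation: the FakeCDN class, its 60 second TTL, and the assumption that the cache is keyed by version plus public ID are all made up to match the behaviour observed in this thread.

```python
class FakeCDN:
    """Toy model of a CDN that caches 404 responses (hypothetical, for illustration)."""

    def __init__(self, negative_ttl=60):
        self.negative_ttl = negative_ttl  # seconds a 404 stays cached (assumed value)
        self.origin = set()               # public IDs that exist at the origin
        self.cached_404 = {}              # (version, public_id) -> expiry time

    def upload(self, public_id):
        # Uploading does not invalidate an already cached 404.
        self.origin.add(public_id)

    def request(self, version, public_id, now):
        key = (version, public_id)
        expires = self.cached_404.get(key)
        if expires is not None and now < expires:
            return 404  # served from the negative cache
        self.cached_404.pop(key, None)
        if public_id in self.origin:
            return 200
        self.cached_404[key] = now + self.negative_ttl
        return 404


cdn = FakeCDN()
before_upload = cdn.request(1, "some-file.json", now=0)   # existence check
cdn.upload("some-file.json")
right_after = cdn.request(1, "some-file.json", now=10)    # still the cached 404
other_version = cdn.request(2, "some-file.json", now=10)  # different cache key
after_expiry = cdn.request(1, "some-file.json", now=61)   # cache has expired
print(before_upload, right_after, other_version, after_expiry)
```

In this model the v2 URL succeeds immediately because it was never requested before the upload, which is consistent with the curl output earlier in the thread.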

Here is the response I got from Cloudinary:

As far as I can see, this is expected because 40x responses are cached for at least 60 seconds, with the cache time increasing in case of repeated requests to the failing URL; the maximum cache for such errors is 24 hours in the case of repeated requests over a long period.
We're currently developing a new feature for our CDN integrations which can invalidate the cache of 40x errors at the time the missing asset is uploaded, but currently it's expected that the cache must naturally expire or be manually cleared after the error is resolved.

To avoid this case completely, we recommend that you do not make any requests to access a file until after it has been uploaded, rather than 'polling' or preemptively requesting possibly-non-existent assets. We also recommend that replacing existing assets is done by overwriting rather than deleting and then replacing (as the latter can cache 404s if they occur between the deletion and replacement)

If you're performing a 'sync' or other similar operation, we recommend that you use the Admin API or Search API to load a list of existing assets, then upload those that are missing/changed.

You can also use the upload or admin API call responses to keep track of each asset's current version to include in your URLs; doing that will also bypass any CDN or other third-party cache, and because the exact value for each asset came from the API, it's also impossible to request that exact URL before the file exists.
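The last suggestion could look roughly like the sketch below. The upload response is known to contain public_id and version keys; the helper name versioned_raw_url, the cloud name "my-cloud", and the sample version number are placeholders invented for this example.

```python
def versioned_raw_url(cloud_name, upload_response):
    """Build a raw delivery URL that pins the version returned by the upload call."""
    version = upload_response["version"]
    public_id = upload_response["public_id"]
    return f"https://res.cloudinary.com/{cloud_name}/raw/upload/v{version}/{public_id}"


# Shape of an upload response, reduced to the keys used here (sample values are made up).
response = {
    "public_id": "static/staticfiles.json",
    "version": 1659525106,
    "resource_type": "raw",
}
url = versioned_raw_url("my-cloud", response)
print(url)
```

In real code, cloudinary.utils.cloudinary_url with the version and resource_type options is probably a better fit than formatting the string by hand.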

Based on this, I think it would probably be better to use the Admin API for checking the existence of files.
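A minimal sketch of such a check, assuming pycloudinary's cloudinary.api.resource call and its cloudinary.exceptions.NotFound exception. The function name exists_via_admin_api and the fetch / not_found parameters are hypothetical; they exist only so the logic can be exercised without network access. This is not what django-cloudinary-storage currently does.

```python
def exists_via_admin_api(public_id, resource_type="raw", fetch=None, not_found=None):
    """Return True if the Admin API knows the asset, False if it reports NotFound."""
    if fetch is None or not_found is None:
        # Imported lazily so the sketch can be tried without pycloudinary installed.
        import cloudinary.api
        import cloudinary.exceptions
        fetch = fetch or cloudinary.api.resource
        not_found = not_found or cloudinary.exceptions.NotFound
    try:
        fetch(public_id, resource_type=resource_type)
    except not_found:
        return False
    return True


# Offline stand-ins for the real API call and exception (hypothetical).
class FakeNotFound(Exception):
    pass


def fake_fetch(public_id, resource_type="raw"):
    if public_id != "static/staticfiles.json":
        raise FakeNotFound(public_id)
    return {"public_id": public_id, "resource_type": resource_type}


print(exists_via_admin_api("static/staticfiles.json", fetch=fake_fetch, not_found=FakeNotFound))
print(exists_via_admin_api("static/missing.json", fetch=fake_fetch, not_found=FakeNotFound))
```

Because this never touches the delivery URL, it cannot seed the CDN with a cached 404. The trade-off is that the Admin API is rate limited, so for a full collectstatic run, listing the existing assets once, as the support response suggests, may be the better approach.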

Aug 08 '22 19:08 salomvary