
TooManyRedirects with PyPI json API

Open jayvdb opened this issue 5 years ago • 0 comments

I've been working with a cache of PyPI JSON records for a while, and have two resources which now cause TooManyRedirects because of PyPI package name normalisation.

One is https://pypi.org/project/django-coverage-plugin/

The JSON is at

https://pypi.org/pypi/django-coverage-plugin/json

However, I occasionally use the name django_coverage_plugin, which redirects to https://pypi.org/pypi/django-coverage-plugin/json , and for which I have a cache entry eb708b277cfec19dff1c796663031b09f5fc8ba511d43b56dad8fcc5 created today:

cc=4,��response��body��<html>
 <head>
  <title>301 Moved Permanently</title>
 </head>
 <body>
  <h1>301 Moved Permanently</h1>
  The resource has been moved to /pypi/django-coverage-plugin/json; you should be redirected automatically.


 </body>
</html>�headers��Connection�keep-alive�Content-Length�230�Access-Control-Allow-Headers�MContent-Type, If-Match, If-Modified-Since, If-None-Match, If-Unmodified-Since�Access-Control-Allow-Methods�GET�Access-Control-Allow-Origin�*�Access-Control-Expose-Headers�X-PyPI-Last-Serial�Access-Control-Max-Age�86400�Cache-Control�max-age=900, public�Content-Security-Policy�Wbase-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ *.fastly-insights.com sentry.io https://api.pwnedpasswords.com https://2p66nmmycsj3.statuspage.io; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self'; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.cmh1.psfhosted.org/ www.google-analytics.com *.fastly-insights.com; script-src 'self' www.googletagmanager.com www.google-analytics.com *.fastly-insights.com https://cdn.ravenjs.com; style-src 'self' fonts.googleapis.com; worker-src *.fastly-insights.com�Content-Type�text/html; charset=UTF-8�Location�1https://pypi.org/pypi/django-coverage-plugin/json�Referrer-Policy�origin-when-cross-origin�Server�nginx/1.13.9�Accept-Ranges�bytes�Date�Fri, 17 Jan 2020 03:56:29 GMT�X-Served-By�%cache-iad2133-IAD, cache-sin18040-SIN�X-Cache�HIT, MISS�X-Cache-Hits�1, 0�X-Timer�S1579233390.755222,VS0,VE228�Vary�Accept-Encoding�Strict-Transport-Security�,max-age=31536000; includeSubDomains; preload�X-Frame-Options�deny�X-XSS-Protection�1; mode=block�X-Content-Type-Options�nosniff�!X-Permitted-Cross-Domain-Policies�none�status�-�version
�reason�Moved Permanently�strict�decode_content¤vary��Accept-Encoding�gzip, deflate

But I also have an older cache entry 17e4bde404ebd1e71cb2a45d038b6c02900991906af7a0956d110822

cc=4,��response��body��<html>
 <head>
  <title>301 Moved Permanently</title>
 </head>
 <body>
  <h1>301 Moved Permanently</h1>
  The resource has been moved to /pypi/django_coverage_plugin/json; you should be redirected automatically.


 </body>
</html>�headers��Connection�keep-alive�Content-Length�230�Access-Control-Allow-Headers�MContent-Type, If-Match, If-Modified-Since, If-None-Match, If-Unmodified-Since�Access-Control-Allow-Methods�GET�Access-Control-Allow-Origin�*�Access-Control-Expose-Headers�X-PyPI-Last-Serial�Access-Control-Max-Age�86400�Cache-Control�max-age=900, public�Content-Security-Policy�8base-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ *.fastly-insights.com sentry.io https://2p66nmmycsj3.statuspage.io; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self'; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.cmh1.psfhosted.org/ www.google-analytics.com *.fastly-insights.com; script-src 'self' www.googletagmanager.com www.google-analytics.com *.fastly-insights.com https://cdn.ravenjs.com; style-src 'self' fonts.googleapis.com; worker-src *.fastly-insights.com�Content-Type�text/html; charset=UTF-8�Location�1https://pypi.org/pypi/django_coverage_plugin/json�Referrer-Policy�origin-when-cross-origin�Server�nginx/1.13.9�Accept-Ranges�bytes�Date�Sat, 11 Jan 2020 12:42:12 GMT�X-Served-By�%cache-iad2137-IAD, cache-sin18050-SIN�X-Cache�HIT, MISS�X-Cache-Hits�1, 0�X-Timer�S1578746532.880195,VS0,VE228�Vary�Accept-Encoding�Strict-Transport-Security�,max-age=31536000; includeSubDomains; preload�X-Frame-Options�deny�X-XSS-Protection�1; mode=block�X-Content-Type-Options�nosniff�!X-Permitted-Cross-Domain-Policies�none�status�-�version
                                                                                                                                                                 �reason�Moved Permanently�strict�decode_content¤vary��Accept-Encoding�gzip, deflate

The other package I encountered this with is jupyter-console: an old cache record for jupyter-console redirects to jupyter_console, and a new jupyter_console entry redirects back to jupyter-console.

Maybe the backend is changing whether it uses the normalised name or the literal name in the JSON path, or it depends on the version of the uploader, or maybe it was gamma rays. I haven't found any consistency yet.

https://pypi.org/pypi/setuptools_scm/json redirects _ to - but https://pypi.org/pypi/backports.ssl_match_hostname/json does not.
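For reference, here is a minimal sketch of the name normalisation rule from PEP 503, which maps runs of `-`, `_` and `.` to a single `-` and lowercases the result. The helper names `normalize_name` and `json_url` are hypothetical, and whether the JSON endpoint always redirects to this form is exactly what is unclear above, so this only shows what the normalised name would be.

```python
import re


def normalize_name(name: str) -> str:
    """Normalise a project name as described in PEP 503."""
    # Collapse any run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def json_url(name: str) -> str:
    """Build the JSON URL for a project name (hypothetical helper)."""
    return "https://pypi.org/pypi/{}/json".format(normalize_name(name))


if __name__ == "__main__":
    for raw in ("django_coverage_plugin", "setuptools_scm",
                "backports.ssl_match_hostname"):
        print(raw, "->", json_url(raw))
```

Requesting the normalised form up front might avoid some redirects, but the older cache entry above shows the server redirecting away from the normalised name at least once, so it is not a guaranteed fix.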

This is likely to be a problem for pip if it uses the JSON API.

Anyway, as these cycles occur between cache entries which are not being refreshed, repeating the loop multiple times is a bit silly (the default requests redirect maximum is 30). It seems appropriate to detect the cycle early and possibly invalidate the cache entries so the server can resolve the problem, or at least re-affirm that the problem still exists. This could also be solved by invalidating any redirect cache entry that is older than any of the redirect cache entries encountered whilst handling the current request.
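To illustrate the first suggestion, here is a rough sketch of early cycle detection followed by invalidation. It is not CacheControl's API: the cache is modelled as a plain dict mapping a URL to its cached `Location`, and `find_redirect_cycle` and `evict_cycle` are hypothetical names used only for this example.

```python
from typing import Dict, List, Optional


def find_redirect_cycle(cache: Dict[str, str], start_url: str,
                        max_redirects: int = 30) -> Optional[List[str]]:
    """Return the URLs forming a cached redirect loop, or None if there is none."""
    seen: List[str] = []
    url = start_url
    # Walk cached redirects only; stop at the first URL without a cached redirect.
    while url in cache and len(seen) < max_redirects:
        if url in seen:
            # The URL repeated, so everything from its first visit is the loop.
            return seen[seen.index(url):]
        seen.append(url)
        url = cache[url]
    return None


def evict_cycle(cache: Dict[str, str], start_url: str) -> List[str]:
    """Drop every cached redirect that takes part in a loop; return what was dropped."""
    cycle = find_redirect_cycle(cache, start_url) or []
    for url in cycle:
        cache.pop(url, None)
    return cycle


if __name__ == "__main__":
    dash = "https://pypi.org/pypi/django-coverage-plugin/json"
    underscore = "https://pypi.org/pypi/django_coverage_plugin/json"
    cache = {underscore: dash, dash: underscore}  # the two entries shown above
    print(evict_cycle(cache, underscore))
    print(cache)
```

With the two entries from this report, the loop is found after two hops instead of 30, and both entries are removed so the next request goes back to the server.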

jayvdb, Jan 24 '20 14:01