basic_auth_header uses the wrong flavor of base64

Open Gallaecio opened this issue 3 years ago • 0 comments

I have reason to believe that basic_auth_header is wrong in using urlsafe_b64encode (which replaces + with - and / with _) instead of b64encode.
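To make the difference concrete, here is a minimal sketch using only the standard library. The two input bytes are an arbitrary choice of mine, picked because they hit both of the characters that the two alphabets disagree on:

```python
import base64

# Two bytes chosen so that the encoded form contains both '+' and '/'.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw)          # regular alphabet (RFC 4648, section 4)
url_safe = base64.urlsafe_b64encode(raw)  # URL-safe alphabet (RFC 4648, section 5)

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='
```

For any input whose regular encoding contains neither + nor /, both functions return the same bytes, which is probably why this has gone unnoticed for so long.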

According to Wikipedia, the first specification of HTTP basic auth is HTTP 1.0, which does not mention any special flavor of base64 and points to RFC 1521 for the definition of base64; RFC 1521 describes regular base64. The latest HTTP basic auth specification, again according to Wikipedia, is RFC 7617, which similarly does not specify any special flavor of base64 and points to section 4 of RFC 4648, which also describes regular base64.

I traced the origin of this bug, and it has been there at least since the first Git commit of Scrapy.

>>> from w3lib.http import basic_auth_header

Actual:

>>> basic_auth_header('aa~aa¿', '')
b'Basic YWF-YWG_Og=='

Expected:

>>> basic_auth_header('aa~aa¿', '')
b'Basic YWF+YWG/Og=='
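As a rough sketch of what I would expect instead, the following reproduces the expected value with the standard library. The function name basic_auth_header_standard is hypothetical (it is not part of w3lib), and the ISO-8859-1 default is an assumption on my part, inferred from the byte values in the output above:

```python
from base64 import b64encode


def basic_auth_header_standard(username: str, password: str,
                               encoding: str = "ISO-8859-1") -> bytes:
    """Hypothetical variant of basic_auth_header that uses regular base64."""
    # Basic auth encodes "username:password" as a single byte string.
    credentials = f"{username}:{password}".encode(encoding)
    # b64encode keeps '+' and '/', unlike urlsafe_b64encode.
    return b"Basic " + b64encode(credentials)


print(basic_auth_header_standard("aa~aa¿", ""))  # b'Basic YWF+YWG/Og=='
```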

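I believe this bug only affects ASCII credentials that include the >, ? or ~ characters (or the DEL control character) as the 3rd, 6th, 9th, etc. byte of the encoded username:password string. Only in those positions do the low 6 bits of the byte map directly to the 4th character of a base64 group, which is the only place where ASCII input can produce + or /.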
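The ASCII claim can be checked by brute force. The sketch below (the helper name scan_ascii_triples is mine) encodes every 3-byte group made of printable ASCII plus DEL, and records which third bytes and which output positions ever yield + or /. Trailing groups of 1 or 2 bytes are not scanned here; as far as I can tell they cannot produce + or / for ASCII input.

```python
import base64
from itertools import product

ASCII = range(0x20, 0x80)  # printable ASCII plus DEL (0x7F)
SPECIAL = b"+/"            # the two characters the URL-safe alphabet replaces


def scan_ascii_triples():
    """Return (third_bytes, positions) over all ASCII 3-byte groups.

    third_bytes: values of the 3rd input byte in groups whose encoding
                 contains '+' or '/'.
    positions:   indexes (0-3) in the 4-character output where they occur.
    """
    third_bytes = set()
    positions = set()
    for triple in product(ASCII, repeat=3):
        encoded = base64.b64encode(bytes(triple))
        for index, char in enumerate(encoded):
            if char in SPECIAL:
                third_bytes.add(triple[2])
                positions.add(index)
    return third_bytes, positions


third_bytes, positions = scan_ascii_triples()
print(sorted(chr(b) for b in third_bytes))  # ['>', '?', '~', '\x7f']
print(positions)                            # {3}
```

The scan takes a second or so, since it goes through roughly 885,000 groups.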

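For richer encodings like ISO-8859-1 or UTF-8, many more characters can be affected. Note that the output in the example above corresponds to encoding the credentials as ISO-8859-1, not UTF-8: there, ¿ is the single byte 0xBF, which lands on the third byte of a group and yields the / of the expected value. With UTF-8, ¿ becomes the two bytes 0xC2 0xBF and that particular example would only differ at the + produced by ~, although other non-ASCII characters and positions can still be affected.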

Aug 02 '22 12:08 Gallaecio