urllib3 icon indicating copy to clipboard operation
urllib3 copied to clipboard

[v2] Create a BaseHTTPConnection APIs

Open sethmlarson opened this issue 5 years ago • 5 comments

  • API should only implement methods like connect, request, close, etc that other urllib3 components use.
  • Create a BaseHTTPSConnection API as well with set_cert(...) and extra kwargs in __init__
  • Have urllib3.HTTPConnection and HTTPSConnection inherit from BaseHTTPConnection and BaseHTTPSConnection

Sub-issues

  • https://github.com/urllib3/urllib3/issues/2648

sethmlarson avatar Sep 25 '20 03:09 sethmlarson

Went through how HTTP[S]ConnectionPool uses HTTP[S]Connection and found the following methods and parameters in use:

import socket
from urllib3.util import Timeout
from urllib3.response import BaseHTTPResponse

class BaseHTTPConnection:
    http_version: str
    sock: socket.socket
    auto_open: bool  # Whether the HTTPConnection was opened by proxy?
    timeout: Timeout
    is_verified: bool

    def connect(self) -> None: ...
    def request(self, method, target, headers, body=None, chunked=False) -> None: ...
    def getresponse(self) -> BaseHTTPResponse: ...
    def close(self) -> None: ...


class BaseHTTPSConnection(BaseHTTPConnection):
    ssl_version: int
    def set_cert(self) -> None: ...

Also was wondering if we can move the conn.sock timeout section before getresponse() into HTTPConnection?


        # >>> Can all this block go into HTTPConnection.getresponse()?
        if conn.sock:
            # In Python 3 socket.py will catch EAGAIN and return None when you
            # try and read into the file pointer created by http.client, which
            # instead raises a BadStatusLine exception. Instead of catching
            # the exception and assuming all BadStatusLine exceptions are read
            # timeouts, check for a zero timeout before making the request.
            if read_timeout == 0:
                raise ReadTimeoutError(
                    self, url, f"Read timed out. (read timeout={read_timeout})"
                )
            if read_timeout is Timeout.DEFAULT_TIMEOUT:
                conn.sock.settimeout(socket.getdefaulttimeout())
            else:  # None or a value
                conn.sock.settimeout(read_timeout)

        # Receive the response from the server
        try:
            httplib_response = conn.getresponse()
        # ...

sethmlarson avatar Dec 16 '20 03:12 sethmlarson

FYI: Need to go back and look at the changes made in https://github.com/urllib3/urllib3/pull/2251 after this lands to update usages of HTTPConnection to BaseHTTPConnection.

sethmlarson avatar Jun 02 '21 21:06 sethmlarson

  • There are also set_tunnel and tls_in_tls_required (for HTTPSConnection), or are they intended to be encapsulated in one of the other methods?

  • You wrote def getresponse(self) -> BaseHTTPResponse: ..., but currently getresponse returns an httplib HTTPResponse and the connection pool converts it to urllib3 HTTPResponse. Is the idea to change it such that the getresponse implementation returns a BaseHTTPResponse directly itself?

bluetech avatar Jun 20 '21 18:06 bluetech

Thanks for raising these questions @bluetech, despite this issue looking like it's completely fleshed out it's almost certainly not 100% covered. Lots of discovery left to do.

  • Yes set_tunnel() is required on HTTPConnection and tls_in_tls_required for HTTPSConnection.
  • I think that getresponse() definitely needs to change to return BaseHTTPResponse instead of http.client.HTTPResponse.

Let me know if there's other uncertainties and keep this thread up to date with your findings.

sethmlarson avatar Jun 21 '21 14:06 sethmlarson

Removing the bounty from this parent issue and going to create sub-issues so this work is easier to scope for individuals.

sethmlarson avatar Jun 21 '22 15:06 sethmlarson