`StreamReader.read` reads less than it should

Open antonioromero opened this issue 3 years ago • 1 comments

Describe the bug

According to the documentation of aiohttp.StreamReader.read, it should read as many bytes as indicated.

Read up to n bytes. https://docs.aiohttp.org/en/stable/streams.html?highlight=streamread#aiohttp.StreamReader.read

For large payloads, it reads fewer bytes than expected. The number of bytes read is not the same on every run, but it is always less than the full payload.

To Reproduce

import asyncio

import aiohttp


async def main(url, n):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            html = await response.content.read(n)
            print(html)
            print(len(html))


if __name__ == '__main__':

    url = 'https://www.google.com/maps/place/Eiffelturm/@48.85837,2.2899966,17z/data=!3m1!4b1!4m5!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d48.8583701!4d2.2944813'  # noqa
    n = 1_000_000
    asyncio.run(main(url, n))

Expected behavior

This call should read the full content.

Logs/tracebacks

-

Python Version

python 3.9.6

aiohttp Version

3.8.1

multidict Version

6.0.2

yarl Version

1.7.2

OS

Ubuntu (Linux)

Related component

Client

Additional context

No response

Code of Conduct

  • [X] I agree to follow the aio-libs Code of Conduct

antonioromero avatar May 30 '22 12:05 antonioromero

    async def readexactly(self, n: int) -> bytes:
        if self._exception is not None:
            raise self._exception

        blocks = []  # type: List[bytes]
        while n > 0:
            block = await self.read(n)
            if not block:
                partial = b"".join(blocks)
                raise asyncio.IncompleteReadError(partial, len(partial) + n)
            blocks.append(block)
            n -= len(block)

        return b"".join(blocks)

Alternatively, why not call StreamReader.readexactly inside a try-except clause and take the partial data from the exception? Or, better yet, reuse the canonical loop above in your own implementation to avoid the exception overhead. TCP read implementations almost always have this restriction due to congestion, and readexactly-style semantics would be very counter-productive when, as in almost all cases, you don't know the size of the incoming data.

import asyncio

import aiohttp


async def main(url, n):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            try:
                html = await response.content.readexactly(n)
            except asyncio.exceptions.IncompleteReadError as exc:
                print(exc.partial[-32:])


if __name__ == '__main__':

    url = 'https://www.google.com/maps/place/Eiffelturm/@48.85837,2.2899966,17z/data=!3m1!4b1!4m5!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d48.8583701!4d2.2944813'  # noqa
    n = 1_000_000
    asyncio.run(main(url, n))

Returns a </html> and EOF correctly.
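
For the second suggestion (reusing the readexactly loop without the exception), a minimal sketch could look like the following. The names read_at_most and FakeStream are hypothetical and not part of aiohttp; FakeStream only imitates a reader that returns short reads so the snippet runs without a network connection.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for a network stream that delivers short reads."""

    def __init__(self, payload: bytes, max_chunk: int) -> None:
        self._payload = payload
        self._pos = 0
        self._max_chunk = max_chunk

    async def read(self, n: int = -1) -> bytes:
        # Hand back at most max_chunk bytes per call, however large n is.
        size = self._max_chunk if n < 0 else min(n, self._max_chunk)
        chunk = self._payload[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


async def read_at_most(stream, n: int) -> bytes:
    """Collect up to n bytes, stopping early only at EOF (no exception raised)."""
    blocks = []
    remaining = n
    while remaining > 0:
        block = await stream.read(remaining)
        if not block:  # an empty result signals EOF
            break
        blocks.append(block)
        remaining -= len(block)
    return b"".join(blocks)


async def demo():
    payload = b"x" * 10_000
    # A single read returns only what the stream hands over in one call.
    single = await FakeStream(payload, max_chunk=4096).read(1_000_000)
    # The loop keeps reading until n bytes are collected or EOF is reached.
    looped = await read_at_most(FakeStream(payload, max_chunk=4096), 1_000_000)
    return len(single), len(looped)


single_len, looped_len = asyncio.run(demo())
print(single_len, looped_len)
```

With aiohttp, response.content would presumably be passed as stream, since it exposes the same awaitable read(n) used in the snippets above. Collecting blocks in a list and joining once avoids repeated bytes concatenation on large payloads.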

unazed avatar Jun 16 '22 18:06 unazed

I ran into the same issue. I think this should work as a workaround:

async def main(url, n):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
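            html = bytes()
            while len(html) < n:
                # Ask only for the bytes still missing so the total never exceeds n.
                chunk = await response.content.read(n - len(html))
                if not chunk:  # empty chunk means EOF
                    break
                html += chunk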
            print(html)
            print(len(html))

Alekky09 avatar Jul 18 '23 16:07 Alekky09