`StreamReader.read` reads less than it should
Describe the bug
According to the documentation of aiohttp.StreamReader.read, it should read as many bytes as indicated:
"Read up to n bytes." https://docs.aiohttp.org/en/stable/streams.html?highlight=streamread#aiohttp.StreamReader.read
For large payloads it reads less than expected. The number of bytes read even varies between runs, but it is always short of the requested amount.
To Reproduce
```python
import asyncio

import aiohttp


async def main(url, n):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            html = await response.content.read(n)
            print(html)
            print(len(html))


if __name__ == '__main__':
    url = 'https://www.google.com/maps/place/Eiffelturm/@48.85837,2.2899966,17z/data=!3m1!4b1!4m5!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d48.8583701!4d2.2944813'  # noqa
    n = 1_000_000
    asyncio.run(main(url, n))
```
Expected behavior
This call should read the full content.
Logs/tracebacks
-
Python Version
python 3.9.6
aiohttp Version
3.8.1
multidict Version
6.0.2
yarl Version
1.7.2
OS
Ubuntu (Linux)
Related component
Client
Additional context
No response
Code of Conduct
- [X] I agree to follow the aio-libs Code of Conduct
```python
async def readexactly(self, n: int) -> bytes:
    if self._exception is not None:
        raise self._exception

    blocks = []  # type: List[bytes]
    while n > 0:
        block = await self.read(n)
        if not block:
            partial = b"".join(blocks)
            raise asyncio.IncompleteReadError(partial, len(partial) + n)
        blocks.append(block)
        n -= len(block)

    return b"".join(blocks)
```
Alternatively, why not use StreamReader.readexactly in a try/except clause, catching the partial data from the exception? Or, better yet, use the canonical form above in your own implementation to avoid the exception overhead. TCP read implementations tend to have this restriction due to congestion, and readexactly-type semantics would be very counter-productive when (in almost all cases) you don't know the size of the incoming data.
```python
import asyncio

import aiohttp


async def main(url, n):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            try:
                html = await response.content.readexactly(n)
            except asyncio.exceptions.IncompleteReadError as exc:
                print(exc.partial[-32:])


if __name__ == '__main__':
    url = 'https://www.google.com/maps/place/Eiffelturm/@48.85837,2.2899966,17z/data=!3m1!4b1!4m5!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d48.8583701!4d2.2944813'  # noqa
    n = 1_000_000
    asyncio.run(main(url, n))
```
This prints a trailing `</html>` and hits EOF correctly, confirming that the full body was received.
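The canonical form above can also be packaged as a standalone helper that stops at EOF instead of raising. This is only a sketch; `read_up_to` is a hypothetical name, not an aiohttp API, and it works with any reader exposing an async `read(n)`:

```python
import asyncio


async def read_up_to(reader, n: int) -> bytes:
    # Hypothetical helper: read up to n bytes from a reader whose
    # read(n) may return fewer bytes than requested (short reads).
    # Chunks are collected in a list and joined once, avoiding the
    # quadratic cost of repeated bytes concatenation.
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = await reader.read(remaining)
        if not chunk:  # EOF before n bytes arrived
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Unlike readexactly, this returns whatever was available when EOF arrives early, which matches the "you don't know the size of the incoming data" case.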
I ran into the same issue, think this should be a workaround:
```python
import asyncio

import aiohttp


async def main(url, n):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            html = bytes()
            while len(html) < n:
                # read() may return fewer bytes than requested,
                # so keep asking for the remainder
                chunk = await response.content.read(n - len(html))
                if not chunk:  # EOF
                    break
                html += chunk
            print(html)
            print(len(html))
```
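For the common case where the goal is simply the entire body, aiohttp already provides this loop: ClientResponse.read() buffers internally until EOF and returns the complete payload, so no manual chunking is needed. A minimal sketch:

```python
import asyncio

import aiohttp


async def fetch_all(url: str) -> bytes:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # ClientResponse.read() loops internally until EOF,
            # returning the complete body as bytes.
            return await response.read()

# usage: body = asyncio.run(fetch_all(url))
```

StreamReader.read(n) is only needed when you genuinely want to process the body in chunks rather than hold it all in memory.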