azure-functions-python-library
Implement read1 method for InputStream
Description
InputStream inherits from io.BufferedIOBase, which defines a read1 method that returns up to a requested number of bytes instead of everything until EOF. Libraries like pandas provide functions that read from an IO stream, and they seem to call read1 internally on that stream. They throw an UnsupportedOperation error when an InputStream is passed directly.
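The behaviour described above can be reproduced with the standard library alone. This is a minimal sketch: MinimalStream is a hypothetical stand-in for InputStream, not the library's actual class, and it assumes InputStream overrides read() but not read1().

```python
import io


class MinimalStream(io.BufferedIOBase):
    """Hypothetical stand-in for InputStream: overrides read() but not read1()."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


stream = MinimalStream(b"a,b,c,d\n1,2,3,4")

try:
    stream.read1(4)
    read1_supported = True
except io.UnsupportedOperation:
    # The inherited default raises; it does not fall back to read().
    read1_supported = False

print(read1_supported)
```

Any caller that prefers read1 over read would hit the same exception on a stream like this.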
Reproducible Example
import pandas as pd
from azure.functions.blob import InputStream


def error():
    i = InputStream(data=b'a,b,c,d\n1,2,3,4')
    # This throws a read1() UnsupportedOperation exception
    df = pd.read_csv(i, sep=",")


def hack():
    i = InputStream(data=b'a,b,c,d\n1,2,3,4')

    def read1(self, size: int = -1) -> bytes:
        return self.read(size)

    # Patches the class itself, so every InputStream created afterwards has read1
    setattr(InputStream, 'read1', read1)
    # This works because we hacked a read1 method into InputStream
    with pd.read_csv(i, sep=",", chunksize=1) as reader:
        for chunk in reader:
            print(chunk)


if __name__ == "__main__":
    # Note: once hack() has patched the class, error() no longer fails;
    # run error() on its own to see the exception.
    hack()
    error()
versions:
python=3.7
pandas=1.2.4
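Until read1 is implemented, an alternative to patching the InputStream class globally is to wrap individual streams. The following is only a sketch: Read1Adapter and ReadOnlySource are hypothetical names, and ReadOnlySource stands in for InputStream so the example runs with the standard library alone. It uses io.TextIOWrapper as the consumer because CPython's implementation calls read1 on its underlying buffer.

```python
import io


class Read1Adapter(io.BufferedIOBase):
    """Hypothetical wrapper that adds read1() to any object exposing read()."""

    def __init__(self, raw) -> None:
        self._raw = raw

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        return self._raw.read(size)

    def read1(self, size: int = -1) -> bytes:
        # Same delegation as the hack above, but scoped to this wrapper.
        return self._raw.read(size)


class ReadOnlySource:
    """Stand-in for InputStream (assumption): only read() is available."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


wrapped = Read1Adapter(ReadOnlySource(b"a,b,c,d\n1,2,3,4"))
text = io.TextIOWrapper(wrapped, encoding="utf-8")
header = text.readline()
print(header)
```

In the pandas case, the wrapped object would presumably be passed to pd.read_csv in place of the raw InputStream; that combination is untested here.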
Use cases
- Being able to read and process buffered streams of CSV-like files in pandas may be more memory efficient.
- Makes InputStream more compatible with other libraries that read IO streams.
References
Someone implemented read1 in the ABC of the python-worker repo in this PR, but the details seem to have been lost in a force push. (This might be unrelated; I'm not sure.)