azure-functions-python-library

Implement read1 method for InputStream

Open hardikpnsp opened this issue 4 years ago • 0 comments

Description

InputStream inherits from BufferedIOBase, which defines a read1 method that returns up to a requested number of bytes instead of reading everything until EOF. InputStream does not implement read1, so the inherited default raises UnsupportedOperation. Libraries such as pandas provide functions that read from an I/O stream, and they appear to call read1 internally on that stream, so they raise an error when an InputStream is passed to them directly.
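The behaviour described above can be reproduced with the standard library alone. The following is a minimal sketch, not the azure-functions code: `MinimalStream` and `read1_is_supported` are hypothetical names used only for illustration, and `MinimalStream` merely stands in for a stream that overrides read() but not read1().

```python
import io


class MinimalStream(io.BufferedIOBase):
    """Hypothetical stand-in for a stream that implements read() only."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


def read1_is_supported(stream: io.BufferedIOBase) -> bool:
    """Return True if stream.read1() can be called without raising."""
    try:
        # Ask for zero bytes so that a working stream is not consumed.
        stream.read1(0)
    except io.UnsupportedOperation:
        # The default BufferedIOBase.read1 raises this.
        return False
    return True


if __name__ == "__main__":
    print(read1_is_supported(MinimalStream(b"a,b,c,d\n1,2,3,4")))
    print(read1_is_supported(io.BytesIO(b"a,b,c,d\n1,2,3,4")))
```

A caller that relies on read1, as pandas appears to, fails on the first kind of stream and works on the second.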

Reproducible Example

import pandas as pd
from azure.functions.blob import InputStream

def error():
    i = InputStream(data=b'a,b,c,d\n1,2,3,4')
    # This raises io.UnsupportedOperation because read1() is not implemented
    df = pd.read_csv(i, sep=",")

def hack():
    i = InputStream(data=b'a,b,c,d\n1,2,3,4')

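    # Delegate read1() to read(); `self` is the InputStream instance.
    def read1(self, size: int = -1) -> bytes:
        return self.read(size)

    # Note: this patches the InputStream class, so every instance is affected.
    InputStream.read1 = read1

    # This works because read1 has been patched into InputStream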
    with pd.read_csv(i, sep=",", chunksize=1) as reader:
        for chunk in reader:
            print(chunk)
            
if __name__ == "__main__":
    hack()
    error()

Versions: Python 3.7, pandas 1.2.4
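As an alternative to patching the class, the same delegation can be done in a small wrapper object. This is only a sketch under stated assumptions: `Read1Adapter`, `ReadOnlyStream` and `parse_csv_rows` are hypothetical names, `ReadOnlyStream` stands in for InputStream so the example runs without azure-functions installed, and the standard csv module is used in place of pandas.

```python
import csv
import io
from typing import List


class ReadOnlyStream(io.BufferedIOBase):
    """Hypothetical stand-in for InputStream: read() works, read1() does not."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


class Read1Adapter(io.BufferedIOBase):
    """Wrap a stream that has read() and expose read1() by delegation."""

    def __init__(self, raw: io.BufferedIOBase) -> None:
        self._raw = raw

    def readable(self) -> bool:
        return True

    def read(self, size: int = -1) -> bytes:
        return self._raw.read(size)

    def read1(self, size: int = -1) -> bytes:
        # Same delegation as the monkeypatch, without touching the class.
        return self._raw.read(size)


def parse_csv_rows(stream: io.BufferedIOBase) -> List[List[str]]:
    """Decode a binary stream as UTF-8 and parse it as CSV."""
    text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    return list(csv.reader(text))


if __name__ == "__main__":
    wrapped = Read1Adapter(ReadOnlyStream(b"a,b,c,d\n1,2,3,4"))
    print(parse_csv_rows(wrapped))
```

The wrapper leaves the original class untouched, so other code that uses the stream is unaffected. Whether pandas accepts such a wrapper in every code path has not been checked here.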

Use cases

  • Reading CSV-like files from a buffered stream in chunks with pandas may be more memory efficient than loading the whole payload at once.
  • It makes InputStream more compatible with other libraries that read from I/O streams.

References

Someone had implemented read1 in the ABC of the python-worker repo in this PR. It seems the details were lost in a forced push. (This might be unrelated; I am not sure.)

hardikpnsp, Jun 23 '21 06:06