`Pipeline.run_batch()` fails on indexing
Describe the bug
- `Pipeline.run_batch()` fails on indexing pipelines.
- The failure is early and seems to occur in the `run_batch` method itself.
- The failure seems trivial.
Error message
Traceback (most recent call last):
  File "/home/sara/work/haystack/examples/example2.py", line 40, in <module>
    pipe.run_batch(file_paths=next(os.walk('examples'))[2])
  File "/home/sara/work/haystack/haystack/pipelines/base.py", line 612, in run_batch
    documents=flattened_documents,
UnboundLocalError: local variable 'flattened_documents' referenced before assignment
To Reproduce
import os
from haystack import Pipeline
from haystack.nodes import FileTypeClassifier, TextConverter
from haystack.document_stores import InMemoryDocumentStore
pipe = Pipeline()
pipe.add_node(name="classifier", component=FileTypeClassifier(supported_types=["py", "sh", "png", "yml"]), inputs=["File"])
pipe.add_node(name="py-converter", component=TextConverter(), inputs=["classifier.output_1"])
pipe.add_node(name="sh-converter", component=TextConverter(), inputs=["classifier.output_2"])
pipe.add_node(name="yml-converter", component=TextConverter(), inputs=["classifier.output_4"])
pipe.add_node(name="docstore", component=InMemoryDocumentStore(), inputs=["py-converter", "sh-converter", "yml-converter"])
docs_to_index = next(os.walk('examples'))[2] # Substitute the path to reproduce
print("Docs to index:")
for doc in docs_to_index:
    print(f" - {doc}")
pipe.run_batch(file_paths=docs_to_index)
https://github.com/deepset-ai/haystack/blob/c91316e862c3fb751b3e8996ddd5f99b5563ae81/haystack/pipelines/base.py#L558-L616
This part seems dedicated to excluding indexing pipelines from `run_batch` and using the plain `run` instead.
I tried defining `flattened_documents` just before this condition: https://github.com/deepset-ai/haystack/blob/c91316e862c3fb751b3e8996ddd5f99b5563ae81/haystack/pipelines/base.py#L603
With that change, the `UnboundLocalError` is no longer raised.
But now, if the directory contains mixed types of files, I get other errors, probably related to `FileTypeClassifier`. Reported in #2999
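For anyone reading along, here is a minimal standalone sketch of the pattern I think is at play. It does not use Haystack at all: `run_batch_buggy` and `run_batch_fixed` are hypothetical stand-ins, and the shape of the condition is my assumption based on the traceback and the workaround above, not a copy of the code in `base.py`.

```python
from typing import List, Optional


def run_batch_buggy(documents: Optional[List[list]] = None) -> list:
    # Assumed shape of the problem: the variable is only assigned
    # inside the branch, so it does not exist when the branch is skipped.
    if documents and isinstance(documents[0], list):
        flattened_documents = []
        for doc_list in documents:
            flattened_documents.extend(doc_list)
    # Raises UnboundLocalError when `documents` is None or empty.
    return flattened_documents


def run_batch_fixed(documents: Optional[List[list]] = None) -> list:
    # Workaround: define the variable before the condition.
    flattened_documents: list = []
    if documents and isinstance(documents[0], list):
        for doc_list in documents:
            flattened_documents.extend(doc_list)
    return flattened_documents


if __name__ == "__main__":
    try:
        run_batch_buggy()  # no documents, as in an indexing call with only file_paths
    except UnboundLocalError as err:
        print(f"buggy version failed: {err}")
    print(f"fixed version returned: {run_batch_fixed()}")
```

An indexing call passes `file_paths` but no `documents`, which would be exactly the case where the branch is skipped.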
Just... wow. I wasn't aware of this catch in run_batch for indexing. Thank you so much for highlighting it :pray:
The fix you found is probably sufficient by the way: I guess this code path was just left untested :smiling_face_with_tear: Thank you for checking it out, feel free to open a PR for this little change alone.