RayPipeline fails with two Retrievers + Reader pipeline
Describe the bug
The RayPipeline fails with a pipeline made of two Retrievers (e.g. BM25 + Embedding Retriever, or two BM25 Retrievers) feeding a Reader. It seems the RayPipeline was never tested with parallel nodes (e.g. the BM25Retriever and EmbeddingRetriever nodes need to join their results before handing them over to the Reader node).
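To illustrate what "joining" the parallel branches means here, below is a minimal plain-Python sketch. It is not Haystack's implementation: `join_documents`, the dict-shaped documents and the sample ids are hypothetical stand-ins, and it assumes each branch returns a dict with a `documents` list.

```python
from typing import Dict, List


def join_documents(branch_outputs: List[Dict]) -> Dict:
    """Concatenate documents from several branches, keeping the first copy of each id.

    Hypothetical helper for illustration only, not part of Haystack.
    """
    seen: Dict[str, Dict] = {}
    for output in branch_outputs:
        for doc in output["documents"]:
            # setdefault keeps the first document seen for a given id
            seen.setdefault(doc["id"], doc)
    return {"documents": list(seen.values())}


# Stand-ins for the outputs of the two parallel retriever nodes
bm25_output = {"documents": [{"id": "a", "content": "doc a"}, {"id": "b", "content": "doc b"}]}
embedding_output = {"documents": [{"id": "b", "content": "doc b"}, {"id": "c", "content": "doc c"}]}

# The Reader node should receive a single, merged input
joined = join_documents([bm25_output, embedding_output])
print([doc["id"] for doc in joined["documents"]])
```

The point is only that the node after the two retrievers must receive one merged input built from both branch outputs, which is the step that fails in RayPipeline.run.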
Error message
File ~/test-ray-haystack/haystack/pipelines/ray.py:271, in RayPipeline.run(self, query, file_paths, labels, documents, meta, params)
--> 266 output = self.graph.nodes[n_id]["component"].run(**input_dict)
267 inputs_for_join_node["inputs"].append(output)
268 input_dict = inputs_for_join_node
TypeError: 'RayServeSyncHandle' object is not callable
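One possible reading of this traceback, sketched with a stand-in class rather than Ray itself: if the "component" stored on the graph node is a handle rather than the node object, and attribute access on that handle yields another handle instead of a bound method, then calling `.run(...)` directly raises exactly this kind of `TypeError`. `StandInHandle` is hypothetical, and its behaviour is an assumption about how such a handle works, not a copy of Ray Serve's code.

```python
class StandInHandle:
    """Hypothetical stand-in for a remote handle; it deliberately has no __call__."""

    def __init__(self, method_name: str = "__call__"):
        self.method_name = method_name

    def __getattr__(self, name: str) -> "StandInHandle":
        # Assumption: unknown public attributes resolve to another handle,
        # not to a bound method of the wrapped component.
        if name.startswith("_"):
            raise AttributeError(name)
        return StandInHandle(method_name=name)

    def remote(self, **kwargs):
        # A real handle would dispatch the call to a remote worker;
        # here we only record what would have been called.
        return {"method": self.method_name, "kwargs": kwargs}


component = StandInHandle()

try:
    # Mirrors the failing pattern: component.run(**input_dict)
    component.run(query="q")
except TypeError as err:
    error_message = str(err)

# Going through .remote() works on the stand-in
result = component.run.remote(query="q")
print(error_message)
print(result)
```

If that reading is right, the join branch of RayPipeline.run calls the node as if it were a local component, whereas the rest of the code would go through the handle's remote call.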
Expected behavior
The RayPipeline should work with any and all Haystack Pipelines, including parallel nodes.
Additional context
I have discussed this with @ZanSara on Slack. We agreed that, as part of a refactoring, I will remove RayPipeline.run and make Pipeline.run usable from RayPipeline. This should fix this issue, along with any other potential issues hidden by using the custom RayPipeline.run instead of Pipeline.run.
To Reproduce
Run a RayPipeline with a standard BM25 Retriever + Embedding Retriever + Reader pipeline.
FAQ Check
- [x] Have you had a look at our new FAQ page?
System:
- OS: Linux Mint 20.2
- GPU/CPU:
- Haystack version (commit or version number): 1.6.0
- DocumentStore: Weaviate
- Reader: deepset/roberta-base-squad2
- Retriever: BM25 and EmbeddingRetriever (sentence-transformers/multi-qa-mpnet-base-dot-v1)
As agreed with @ZanSara, I will provide a PR for this.
Thanks for flagging this and volunteering a PR @zoltan-fedor 💪🏽