haystack-core-integrations
Improve init arguments of `OpenSearchDocumentStore`
Is your feature request related to a problem? Please describe.
I think the current implementation of `OpenSearchDocumentStore` should be more specific about which arguments its `__init__` method accepts.
- Some specific keys extracted from `kwargs` (e.g. `"embedding_dim"`, `"method"`, `"settings"`) are not mentioned in the docstring. The docstring says that the `kwargs` are the ones that the `OpenSearch` client takes. AFAIK (please correct me if I'm wrong), the extracted keys are not common `kwargs` required by the client; they are used, for example, when creating an index (`client.indices.create()`).
- It would also be nice to have a `max_chunk_bytes` argument to pass to the internal `bulk` function. The default of 100 MB may be too big for certain OpenSearch instances. In these cases, the current implementation raises an error.
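
To illustrate why a smaller `max_chunk_bytes` helps, here is a minimal sketch of size-limited chunking. It is not the client library's implementation: `chunk_by_bytes` is a hypothetical helper, and measuring each document as its JSON encoding plus one newline is an assumption made for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_by_bytes(
    docs: Iterable[Dict[str, Any]], max_chunk_bytes: int
) -> Iterator[List[Dict[str, Any]]]:
    """Group documents so each group's serialized size stays under the limit.

    Hypothetical helper: a single document larger than the limit is still
    emitted, alone in its own chunk.
    """
    chunk: List[Dict[str, Any]] = []
    size = 0
    for doc in docs:
        # Assumed size model: JSON-encoded bytes plus a trailing newline.
        doc_size = len(json.dumps(doc).encode("utf-8")) + 1
        if chunk and size + doc_size > max_chunk_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(doc)
        size += doc_size
    if chunk:
        yield chunk


docs = [{"id": i, "content": "x" * 10} for i in range(5)]
chunks = list(chunk_by_bytes(docs, max_chunk_bytes=80))
```

With a lower limit, the same documents are sent as several smaller requests instead of one large one, which is what an instance with a small request size limit needs.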
Describe the solution you'd like
I suggest adding these arguments explicitly to `__init__` and improving its docstring.
e.g.
def __init__(
    self,
    *,
    hosts: Optional[Hosts] = None,
    index: str = "default",
    embedding_dim: int = ...,
    method: ...,
    max_chunk_bytes: ...,
    **kwargs,
):
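
A rough sketch of how the explicit arguments could be kept separate from the client `kwargs` is shown below. Everything here is an assumption for illustration: `SketchDocumentStore`, `index_body`, `bulk_kwargs`, the `Hosts` alias, the `embedding` field name and the `embedding_dim` default of 768 are hypothetical, and no real client is created.

```python
from typing import Any, Dict, List, Optional, Union

# Assumed alias; the real `Hosts` type in the integration may differ.
Hosts = Union[str, List[Union[str, Dict[str, Any]]]]

# The 100 MB default mentioned in the issue.
DEFAULT_MAX_CHUNK_BYTES = 100 * 1024 * 1024


class SketchDocumentStore:
    """Hypothetical stand-in for OpenSearchDocumentStore (no client involved)."""

    def __init__(
        self,
        *,
        hosts: Optional[Hosts] = None,
        index: str = "default",
        embedding_dim: int = 768,  # assumed default; elided in the proposal
        method: Optional[Dict[str, Any]] = None,
        max_chunk_bytes: int = DEFAULT_MAX_CHUNK_BYTES,
        **kwargs: Any,
    ) -> None:
        self.hosts = hosts
        self.index = index
        self.embedding_dim = embedding_dim
        self.method = method
        self.max_chunk_bytes = max_chunk_bytes
        # Only what is left over would be forwarded to the OpenSearch client.
        self.client_kwargs = kwargs

    def index_body(self) -> Dict[str, Any]:
        """Build the body that would go to client.indices.create()."""
        embedding_field: Dict[str, Any] = {
            "type": "knn_vector",
            "dimension": self.embedding_dim,
        }
        if self.method is not None:
            embedding_field["method"] = self.method
        return {
            "settings": {"index.knn": True},
            "mappings": {"properties": {"embedding": embedding_field}},
        }

    def bulk_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments that would be passed to the internal bulk call."""
        return {"max_chunk_bytes": self.max_chunk_bytes}


store = SketchDocumentStore(
    hosts="http://localhost:9200",
    embedding_dim=384,
    max_chunk_bytes=10 * 1024 * 1024,
    use_ssl=True,  # ends up in client_kwargs, not in the index settings
)
```

The point of the split is that index-creation options and bulk options become visible in the signature and docstring, while `**kwargs` carries only client options.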
Additional context
Happy to make a PR if you agree on this proposal :)