haystack-core-integrations

Improve init arguments of `OpenSearchDocumentStore`

Status: Open. EdAbati opened this issue 1 year ago (0 comments).

Is your feature request related to a problem? Please describe.

I think the current implementation of `OpenSearchDocumentStore` should be more explicit about which arguments its `__init__` method accepts.

  • Some specific keys extracted from `kwargs` (e.g. `"embedding_dim"`, `"method"`, `"settings"`) are not mentioned in the docstring, which says that `kwargs` are the arguments accepted by the OpenSearch client. AFAIK (please correct me if I'm wrong), these extracted keys are not arguments the client takes; instead, they are used when creating an index (`client.indices.create()`).
  • It would also be nice to have a `max_chunk_bytes` argument that is passed to the internal `bulk` function. Its default of 100 MB may be too large for some OpenSearch instances, and in those cases the current implementation raises an error.
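The `max_chunk_bytes` point is easier to see with a small model of byte-bounded batching. The sketch below is a simplified illustration, not opensearch-py's actual implementation: `chunk_by_bytes` is a hypothetical helper, and it assumes each action is serialized as one newline-terminated JSON line (real bulk requests also carry action/metadata lines). It shows how a byte limit decides how many requests a batch of documents turns into, which is why a limit above what a cluster accepts leads to oversized requests.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_by_bytes(actions: Iterable[Dict[str, Any]], max_chunk_bytes: int) -> Iterator[List[str]]:
    """Group actions into chunks whose serialized size stays within max_chunk_bytes.

    Hypothetical helper: each action is serialized as one newline-terminated
    JSON line. A single action larger than the limit still gets its own chunk.
    """
    chunk: List[str] = []
    size = 0
    for action in actions:
        line = json.dumps(action) + "\n"
        line_size = len(line.encode("utf-8"))
        # Close the current chunk if adding this line would exceed the limit.
        if chunk and size + line_size > max_chunk_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_size
    # Emit whatever is left over as the final chunk.
    if chunk:
        yield chunk
```

With a 100 MB limit, a small batch like ten short documents goes out as a single request; lowering the limit splits the same batch into several smaller requests, which is what a configurable `max_chunk_bytes` would let users tune to their instance.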

Describe the solution you'd like

I suggest adding these arguments explicitly to `__init__` and improving its docstring.

For example:

def __init__(
    self,
    *,
    hosts: Optional[Hosts] = None,
    index: str = "default",
    embedding_dim: int = ...,
    method: ...,
    max_chunk_bytes: ...,
    **kwargs,
):
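One possible way to flesh out the proposed signature is sketched below, under stated assumptions. `OpenSearchDocumentStoreSketch`, the `Hosts` alias, `bulk_kwargs`, and the default `embedding_dim=768` are hypothetical placeholders for illustration, not the integration's real API or defaults; the sketch does not connect to OpenSearch. It only shows the separation the issue asks for: index-creation options (`embedding_dim`, `method`, `settings`) become documented keyword arguments that build the body for `client.indices.create()`, `max_chunk_bytes` is kept for the bulk call, and whatever is left in `**kwargs` is meant for the client alone. The index body uses OpenSearch's k-NN `knn_vector` field type and the `index.knn` setting.

```python
from typing import Any, Dict, List, Optional, Union

# Hypothetical stand-in for the integration's own `Hosts` type.
Hosts = Union[str, List[str]]

# 100 MB, the bulk default the issue refers to.
DEFAULT_MAX_CHUNK_BYTES = 100 * 1024 * 1024


class OpenSearchDocumentStoreSketch:
    """Hypothetical sketch: index-creation options are explicit keyword arguments,
    so **kwargs only carries options meant for the OpenSearch client."""

    def __init__(
        self,
        *,
        hosts: Optional[Hosts] = None,
        index: str = "default",
        embedding_dim: int = 768,  # illustrative placeholder default
        method: Optional[Dict[str, Any]] = None,
        settings: Optional[Dict[str, Any]] = None,
        max_chunk_bytes: int = DEFAULT_MAX_CHUNK_BYTES,
        **kwargs: Any,
    ) -> None:
        self.hosts = hosts
        self.index = index
        self.max_chunk_bytes = max_chunk_bytes
        # Whatever remains in kwargs would be passed to the client constructor only.
        self.client_kwargs = kwargs
        # Body that would be sent to client.indices.create() for a new index.
        self.index_body = self._build_index_body(embedding_dim, method, settings)

    @staticmethod
    def _build_index_body(
        embedding_dim: int,
        method: Optional[Dict[str, Any]],
        settings: Optional[Dict[str, Any]],
    ) -> Dict[str, Any]:
        embedding_field: Dict[str, Any] = {"type": "knn_vector", "dimension": embedding_dim}
        if method is not None:
            embedding_field["method"] = method
        return {
            "mappings": {"properties": {"embedding": embedding_field}},
            "settings": settings if settings is not None else {"index.knn": True},
        }

    def bulk_kwargs(self) -> Dict[str, Any]:
        # Extra options that would be forwarded to the internal bulk call.
        return {"max_chunk_bytes": self.max_chunk_bytes}
```

For example, `OpenSearchDocumentStoreSketch(embedding_dim=384, max_chunk_bytes=10 * 1024 * 1024, http_auth=("user", "pass"))` keeps `http_auth` (an opensearch-py client option) in `client_kwargs`, puts the dimension into the index mapping, and exposes the lower chunk limit for bulk writes. Because the index options no longer hide in `**kwargs`, each one can be described in the docstring.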

Additional context

Happy to make a PR if you agree with this proposal :)

EdAbati, Apr 18 '24 16:04