feat: Add Mixedbread AI Integration

Open huangrpablo opened this issue 1 year ago • 10 comments

Hey guys! We're from Mixedbread AI, where we're building SOTA models and tools to streamline retrieval. So far, we've trained some of the most widely used open-source embedding and reranking models (https://huggingface.co/mixedbread-ai).

We noticed that Unstructured provides embedding integrations. We would love to partner up and contribute to this project. @MthwRobinson

huangrpablo avatar Jul 03 '24 09:07 huangrpablo

@vangheem @potter-potter would love to hear your thoughts on this!

juliuslipp avatar Jul 09 '24 09:07 juliuslipp

Thanks for the contribution @huangrpablo ! We'll review this as soon as we're able.

MthwRobinson avatar Jul 11 '24 14:07 MthwRobinson

CI for this PR is running on #3392

MthwRobinson avatar Jul 12 '24 17:07 MthwRobinson

Looks like there may be some dependency conflicts. Running make pip-compile from a Python 3.9 environment will likely fix that.

  • https://github.com/Unstructured-IO/unstructured/actions/runs/9911978833/job/27385814239?pr=3392

MthwRobinson avatar Jul 12 '24 17:07 MthwRobinson

From the CI on the "clone PR", you may also need to update the version in unstructured/__version__.py (see this job)

MthwRobinson avatar Jul 12 '24 22:07 MthwRobinson

@MthwRobinson just made the fixes. Running make pip-compile also changed the dependencies of other connectors.

huangrpablo avatar Jul 13 '24 10:07 huangrpablo

@MthwRobinson hey, I reverted the dependency changes for the other integrations, so they are untouched again. Could you have a look and run the CI again if it looks fine? Thanks!

huangrpablo avatar Jul 15 '24 10:07 huangrpablo

@MthwRobinson Hey, would love to hear any update on this!

huangrpablo avatar Jul 16 '24 08:07 huangrpablo

@huangrpablo I'll take over for Robinson from here. I'm gonna check it out today.

potter-potter avatar Jul 16 '24 20:07 potter-potter

@potter-potter hey, any update on this?

huangrpablo avatar Jul 19 '24 07:07 huangrpablo

Copying this over to https://github.com/Unstructured-IO/unstructured/pull/3513 so that it's easier to run CI, etc.

potter-potter avatar Aug 12 '24 15:08 potter-potter