Matt Robinson

Results: 28 issues by Matt Robinson

### Summary

Adds an `UnstructuredURLLoader` that supports loading data from a list of URLs.

### Testing

```python
from langchain.document_loaders import UnstructuredURLLoader

urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023"
]
loader = UnstructuredURLLoader(urls=urls)
...
```

### Summary

Implements a basic Unstructured file loader for langchainjs. The loader posts the specified file to the Unstructured REST API. The API is not public yet, but you can...

### Summary

Adds a document loader for handling Markdown files. This document loader requires `unstructured>=0.4.16`.

### Testing

```python
from langchain.document_loaders import UnstructuredMarkdownLoader

# Load README.md and return its contents as a list of documents
loader = UnstructuredMarkdownLoader("README.md")
docs = loader.load()
```

### Summary

Adds a loader for rich text files. Requires `unstructured>=0.5.12`.

### Testing

The following test uses the example RTF file from the [`unstructured` repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders import UnstructuredRTFLoader
...
```

### Summary

Updates the `UnstructuredURLLoader` to include an "elements" mode that retains additional metadata from `unstructured`. This makes `UnstructuredURLLoader` consistent with the other Unstructured loaders, which also support "elements" mode. Patched...
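The difference between "elements" mode and a single-document mode can be sketched roughly as follows. This is a self-contained illustration, not the loader's actual code: `Element`, `Document`, and `elements_to_documents` are hypothetical stand-ins, and the assumptions that the other mode is called "single" and joins element text with blank lines are ours.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """Hypothetical stand-in for one element produced by `unstructured` partitioning."""

    text: str
    category: str


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def elements_to_documents(elements: List[Element], source: str, mode: str = "single") -> List[Document]:
    """Convert partitioned elements into documents using "single" or "elements" mode (hypothetical helper)."""
    if mode == "elements":
        # One document per element; each keeps its own category as metadata.
        return [
            Document(page_content=el.text, metadata={"source": source, "category": el.category})
            for el in elements
        ]
    if mode == "single":
        # All element text is joined into one document; per-element metadata is lost.
        text = "\n\n".join(el.text for el in elements)
        return [Document(page_content=text, metadata={"source": source})]
    raise ValueError(f"unsupported mode: {mode!r}")
```

The point of "elements" mode is visible in the return shape: callers get one document per element and can filter on metadata such as the element category, which a single concatenated document cannot offer.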

### Summary

Adds support for passing in an Unstructured API key to the Unstructured loaders. Currently, the Unstructured API does not require an API key, but it will in the...
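Because the key is not required yet, one natural design is an optional parameter that only adds the key to the request when a caller supplies one, so existing code keeps working unchanged. The sketch below assumes the key travels in a header named `unstructured-api-key`; that header name and the `build_headers` helper are assumptions for illustration, not taken from the PR.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for the Unstructured API (hypothetical helper)."""
    # Headers sent on every request.
    headers = {"Accept": "application/json"}
    # The API does not require a key yet, so only send one when a non-empty key is provided.
    if api_key:
        headers["unstructured-api-key"] = api_key  # assumed header name
    return headers
```

Defaulting `api_key` to `None` means loaders created before the change produce exactly the same requests as before.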

# ODF File Loader

Adds a data loader for handling OpenOffice ODT files. Requires `unstructured>=0.6.3`.

### Testing

The following should work using the `fake.odt` example doc from the [`unstructured`...

### Submit Multiple Files to the Unstructured API

Enables batching multiple files into a single Unstructured API request. Support for requests with multiple files was added to both `UnstructuredAPIFileLoader` and...
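Batching several files into one request typically means a single `multipart/form-data` body with one part per file. The following is a minimal stdlib sketch of that encoding, assuming the API reads every part sent under a repeated form field named `files`; the field name and the `encode_multipart_files` helper are assumptions, not the loaders' actual code.

```python
import uuid
from typing import List, Optional, Tuple


def encode_multipart_files(
    files: List[Tuple[str, bytes]],
    field_name: str = "files",  # assumed form field name
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Encode several (filename, content) pairs into one multipart/form-data body.

    Returns the body and the matching Content-Type header value (hypothetical helper).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for filename, content in files:
        # Each file becomes its own part, all sharing the same form field name.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        parts.append(header + content + b"\r\n")
    # The closing delimiter marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

To send the batch, the body and content type could be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, where `url` is the API endpoint (not specified here). One request then carries every file instead of one round trip per file.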

# Unstructured Excel Loader

Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files. Works with `unstructured>=0.6.7`. A plain text representation of the Excel file will be available under the `page_content`...
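To give a feel for what a plain text representation of a sheet might look like, here is a rough sketch that flattens rows of cell values into text. The tab-separated, one-row-per-line layout and the `rows_to_plain_text` helper are assumptions for illustration only; `unstructured` may format the text it places in `page_content` differently.

```python
from typing import Iterable, List, Sequence


def rows_to_plain_text(rows: Iterable[Sequence[object]]) -> str:
    """Render spreadsheet rows as plain text (hypothetical helper, assumed layout)."""
    lines: List[str] = []
    for row in rows:
        # Empty cells (None) become empty strings so column positions are preserved.
        cells = ["" if cell is None else str(cell) for cell in row]
        # Cells within a row are separated by tabs.
        lines.append("\t".join(cells))
    # Rows are separated by newlines.
    return "\n".join(lines)
```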

### Summary

Gives users the ability to specify a `file_loader_cls` for processing files in Google Drive that are not Google Docs or Google Sheets. Fixes #5791. See also [this Twitter...
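The dispatch this enables can be sketched as: Google Docs and Sheets keep their built-in handling, and every other file type is routed to whatever class the user passed as `file_loader_cls`. The `pick_loader` helper and the `handlers` mapping below are hypothetical; only the two Google Workspace MIME types are standard values.

```python
from typing import Any, Callable, Dict, Optional

# Standard Google Workspace MIME types for Docs and Sheets.
GOOGLE_DOC = "application/vnd.google-apps.document"
GOOGLE_SHEET = "application/vnd.google-apps.spreadsheet"


def pick_loader(
    mime_type: str,
    handlers: Dict[str, Callable[..., Any]],
    file_loader_cls: Optional[Callable[..., Any]] = None,
) -> Optional[Callable[..., Any]]:
    """Choose how to load a Drive file based on its MIME type (hypothetical helper)."""
    if mime_type in handlers:
        # Google Docs and Sheets use their dedicated built-in handlers.
        return handlers[mime_type]
    # Any other file type goes to the user-supplied loader class; None means it is skipped.
    return file_loader_cls
```

Keeping the fallback as a user-supplied class, rather than hard-coding a parser, lets callers choose any loader (for example one of the Unstructured loaders) for PDFs or other uploaded files.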