Add integration with Salesforce
VectorFlow should be able to ingest raw data from Salesforce.
Some open questions to explore prior to implementation:
- Can this be done through the existing API, or does it need a separate ingestion worker?
- How do we share credentials and do this securely?
- What file formats can be expected?
Thanks for raising this feature request. Ingesting data from Salesforce into VectorFlow is definitely something we should explore supporting. Here are some initial thoughts on your questions:
Regarding ingestion - Salesforce's APIs can export data in JSON, XML, and CSV: the REST API returns JSON or XML, and the Bulk API can return query results as CSV. I think the best approach would be to build a separate lightweight ingestion worker specifically for Salesforce data. This worker could authenticate with Salesforce using OAuth, call the API to export data, do any needed parsing and validation, and then pass the transformed data to VectorFlow's main ingestion pipeline.
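The worker flow described above (authenticate with OAuth, query, parse, hand off) could look roughly like the sketch below. It assumes the OAuth 2.0 client credentials flow on a connected app, a pinned REST API version (`v59.0` is a placeholder), and an injected `fetch` callable that performs the authenticated HTTP GET; `token_request_body`, `query_path`, and `iter_records` are hypothetical helper names, not existing VectorFlow code.

```python
import urllib.parse
from typing import Callable, Dict, Iterator

# Placeholder version: pin whichever REST API version the org supports.
API_VERSION = "v59.0"


def token_request_body(client_id: str, client_secret: str) -> bytes:
    """Form body for the OAuth 2.0 client credentials flow.

    Assumption: a connected app with this flow enabled; the body would be
    POSTed to https://<my-domain>.my.salesforce.com/services/oauth2/token.
    """
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()


def query_path(soql: str, api_version: str = API_VERSION) -> str:
    """Relative REST path that runs a SOQL query."""
    return f"/services/data/{api_version}/query?" + urllib.parse.urlencode({"q": soql})


def iter_records(fetch: Callable[[str], Dict], soql: str) -> Iterator[Dict]:
    """Yield query records, following nextRecordsUrl until the result is done.

    `fetch` takes a relative path and returns the decoded JSON body, so the
    HTTP client and bearer token stay outside this function.
    """
    page = fetch(query_path(soql))
    while True:
        for record in page["records"]:
            # Drop the per-record "attributes" metadata (type, url).
            yield {k: v for k, v in record.items() if k != "attributes"}
        if page.get("done", True):
            break
        page = fetch(page["nextRecordsUrl"])
```

Injecting `fetch` keeps the pagination logic testable without network access; in the real worker it would wrap an HTTP client that sends the `Authorization: Bearer <token>` header.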
For security - OAuth should let the ingestion worker authenticate securely with Salesforce. We can encrypt any credentials stored in configuration. Restricting the worker to only the Salesforce data it actually needs to export will also be important.
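A minimal sketch of two of these safeguards, using only the standard library. Note that it swaps in environment variables for encrypted config (encrypting at rest would need a library such as `cryptography`). The variable names `SALESFORCE_CLIENT_ID` / `SALESFORCE_CLIENT_SECRET` and the `ALLOWED_FIELDS` allowlist are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping


@dataclass(frozen=True)
class SalesforceCredentials:
    client_id: str
    # repr=False keeps the secret out of logs that print this object.
    client_secret: str = field(repr=False)


def load_credentials(env: Mapping[str, str] = os.environ) -> SalesforceCredentials:
    """Read credentials from the environment (hypothetical variable names)."""
    return SalesforceCredentials(
        client_id=env["SALESFORCE_CLIENT_ID"],
        client_secret=env["SALESFORCE_CLIENT_SECRET"],
    )


# Hypothetical allowlist: the only objects and fields the worker may export.
ALLOWED_FIELDS = {
    "Account": {"Id", "Name", "Description"},
    "Case": {"Id", "Subject", "Description"},
}


def build_soql(sobject: str, fields: List[str]) -> str:
    """Build SOQL only from allowlisted names, never from raw user input."""
    allowed = ALLOWED_FIELDS.get(sobject)
    if allowed is None:
        raise PermissionError(f"object {sobject!r} is not allowlisted")
    rejected = [f for f in fields if f not in allowed]
    if rejected:
        raise PermissionError(f"fields not allowlisted: {rejected}")
    return f"SELECT {', '.join(fields)} FROM {sobject}"
```

Building the query from an allowlist, rather than accepting arbitrary SOQL, also limits what a leaked token could be used for through this worker, though the connected app's own permissions remain the real boundary.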
Suggested file formats - The Salesforce APIs support JSON, XML, and CSV. CSV may be the easiest format to work with in VectorFlow if we can get full data exports; more targeted exports may require JSON or XML. Either way, the ingestion worker would need to parse the data before passing it to VectorFlow in a supported format like Parquet.
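The parsing step could normalize all three formats into the same list of flat rows before anything format-specific happens downstream. This is a sketch: the JSON branch follows the REST query response shape (`records`, each with an `attributes` entry), while the XML shape (one `<records>` element per row, one child per field) is an assumption to check against a real export. `parse_export` is a hypothetical name.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Dict, List


def _local(tag: str) -> str:
    # Strip an XML namespace prefix such as "{urn:...}Name".
    return tag.rsplit("}", 1)[-1]


def parse_export(data: str, fmt: str) -> List[Dict]:
    """Normalize a Salesforce export into a list of flat dict rows."""
    if fmt == "json":
        # REST query response: {"records": [{"attributes": {...}, ...}, ...]}
        return [
            {k: v for k, v in rec.items() if k != "attributes"}
            for rec in json.loads(data)["records"]
        ]
    if fmt == "csv":
        # The header row supplies the field names.
        return list(csv.DictReader(io.StringIO(data)))
    if fmt == "xml":
        # Assumed shape: each <records> element is one row.
        root = ET.fromstring(data)
        return [
            {_local(child.tag): child.text or "" for child in rec}
            for rec in root.iter()
            if _local(rec.tag) == "records"
        ]
    raise ValueError(f"unsupported format: {fmt}")
```

The resulting rows could then be written to Parquet with a library such as pyarrow; that step is left out to keep the sketch standard-library only.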
Possible next steps:
- Explore Salesforce OAuth authentication flows for the ingestion worker
- Test sample data exports from the Salesforce API in JSON, XML, and CSV
- Prototype a basic ingestion worker that extracts a sample export, parses the data, and writes it to Parquet
- Evaluate how the exported data maps to VectorFlow's expected input schema (important)
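For the schema-mapping step, one option is to turn each exported row into a text document plus metadata, since only text gets embedded. The `{"text", "metadata"}` shape below is hypothetical and not VectorFlow's actual input schema; which fields count as text (e.g. `Subject` and `Description` on `Case`) would be configured per object.

```python
from typing import Dict, List


def record_to_document(record: Dict, sobject: str, text_fields: List[str]) -> Dict:
    """Map one exported row to a hypothetical {text, metadata} document.

    Non-empty text_fields are joined into the text to embed; all remaining
    fields are kept as metadata, plus the Salesforce object name.
    """
    parts = [f"{name}: {record[name]}" for name in text_fields if record.get(name)]
    metadata = {k: v for k, v in record.items() if k not in text_fields}
    metadata["sobject"] = sobject
    return {"text": "\n".join(parts), "metadata": metadata}
```

Keeping the record `Id` in metadata would let re-ingested records overwrite their earlier vectors instead of duplicating them.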
How feasible would it be to use this: https://llamahub.ai/l/tools-salesforce?
I don't think we should have a separate Salesforce worker. An endpoint, /salesforce, in the existing API should do the trick. Can you choose the format (e.g. JSON or CSV) that the data is exported in?
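If the export format can be chosen per request (the REST API returns JSON, while Bulk API 2.0 query results come back as CSV), a /salesforce endpoint could accept it as a parameter. Below is a framework-agnostic sketch of the request validation; the payload fields `sobject`, `fields`, and `format` and the 202 "queued" response are hypothetical.

```python
from typing import Dict, Tuple

SUPPORTED_FORMATS = {"json", "csv"}


def handle_salesforce_request(payload: Dict) -> Tuple[int, Dict]:
    """Validate a hypothetical POST /salesforce body.

    Returns (HTTP status, response body) so it can be wired into whichever
    web framework the existing API uses.
    """
    sobject = payload.get("sobject")
    fields = payload.get("fields")
    fmt = payload.get("format", "json")
    if not isinstance(sobject, str) or not sobject:
        return 400, {"error": "'sobject' must be a non-empty string"}
    if not isinstance(fields, list) or not fields or not all(isinstance(f, str) for f in fields):
        return 400, {"error": "'fields' must be a non-empty list of strings"}
    if fmt not in SUPPORTED_FORMATS:
        return 400, {"error": f"'format' must be one of {sorted(SUPPORTED_FORMATS)}"}
    # A real handler would enqueue the export here and return a job id.
    return 202, {"sobject": sobject, "fields": fields, "format": fmt, "status": "queued"}
```

Returning 202 rather than the data itself reflects that large exports are better processed asynchronously than inside a single HTTP request.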
Small note: I'd look into how Airbyte solves this: https://github.com/airbytehq/airbyte/tree/f54bd550aae9b4bf19220b50af47da0adc3b4ff1/airbyte-integrations/connectors/source-salesforce
We are planning to add an Airbyte connector; maybe we can access the Salesforce data through that.
@asadnhasan are you still planning on working on this?
@david-vectorflow Yes, David, I am still working on it.
@asadnhasan Hey, do you still have an interest in building this out?
Yes, absolutely, I would love to contribute. I will come back with something in 2-3 days.