
Add integration to Salesforce

Open dgarnitz opened this issue 2 years ago • 8 comments

Vectorflow should be able to ingest raw data from Salesforce.

Some open questions to explore prior to implementation:

  • can this be done through the existing API, or does it need a separate ingestion worker
  • how to share credentials / do this securely
  • what file formats can be expected

dgarnitz avatar Aug 15 '23 14:08 dgarnitz

Thanks for raising this feature request. Ingesting data from Salesforce into VectorFlow is definitely something we should explore supporting. Here are some initial thoughts on your questions:

Regarding ingestion - It looks like the Salesforce REST API provides options for exporting data in JSON, XML, and CSV formats. I think the best approach would be to build a separate lightweight ingestion worker specifically for Salesforce data. This worker could handle authentication with Salesforce using OAuth, make API calls to export data, do any needed parsing/validation, and then pass the transformed data to VectorFlow's main ingestion pipeline.
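To make the proposed flow concrete, here is a minimal sketch of such a worker: authenticate, fetch, validate, then hand off in batches. Every name in it (`SalesforceWorker`, `get_token`, `fetch_records`, `submit`) is hypothetical and not taken from VectorFlow or any Salesforce client library. The three steps are passed in as plain callables so the control flow can be tried without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical record type: one Salesforce row as a plain dict.
Record = Dict[str, object]


@dataclass
class SalesforceWorker:
    """Sketch of a standalone ingestion worker (all names are assumptions)."""

    get_token: Callable[[], str]                      # e.g. wraps an OAuth flow
    fetch_records: Callable[[str], Iterable[Record]]  # API calls made with the token
    submit: Callable[[List[Record]], None]            # hand-off to the main pipeline
    batch_size: int = 100

    def run(self) -> int:
        """Fetch, validate, and submit records; return how many were submitted."""
        token = self.get_token()
        batch: List[Record] = []
        total = 0
        for record in self.fetch_records(token):
            if not self._is_valid(record):
                continue  # skip rows that fail the basic check
            batch.append(record)
            if len(batch) >= self.batch_size:
                self.submit(batch)
                total += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            self.submit(batch)
            total += len(batch)
        return total

    @staticmethod
    def _is_valid(record) -> bool:
        # Assumed validation rule: every record must carry a non-empty "Id".
        return isinstance(record, dict) and bool(record.get("Id"))
```

Keeping the steps injectable means the same loop would work whether the records come from a direct API client or from a connector, and `submit` could just as well post to an endpoint in the existing API.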

For security - OAuth should allow secure authentication for the ingestion worker to access Salesforce. We can encrypt any credentials stored in configuration. Restricting the worker to only access the needed Salesforce data exports will also be important.
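One way the credential handling could look, as a sketch only: the secrets are read from environment variables instead of a config file, and the token request is built but not sent. It assumes the OAuth 2.0 client credentials grant is enabled for the connected app and that the org has a My Domain host. The variable names `SF_DOMAIN`, `SF_CLIENT_ID`, and `SF_CLIENT_SECRET` and the function `build_token_request` are made up for illustration.

```python
import os
import urllib.parse
import urllib.request


def build_token_request(env=os.environ) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 token request.

    Assumption: the client credentials grant is used. Other flows, such as
    refresh token or JWT bearer, would need different form fields.
    """
    domain = env["SF_DOMAIN"]  # hypothetical variable, e.g. "example.my.salesforce.com"
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": env["SF_CLIENT_ID"],          # hypothetical variable name
            "client_secret": env["SF_CLIENT_SECRET"],  # hypothetical variable name
        }
    ).encode("ascii")
    return urllib.request.Request(
        f"https://{domain}/services/oauth2/token",
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request with `urllib.request.urlopen` should return a JSON body containing an access token. Limiting what that token can read would be handled on the Salesforce side, through the permissions of the user the connected app runs as.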

Suggested file formats - The Salesforce API supports JSON, XML and CSV. CSV may be the easiest to work with in VectorFlow if we can get full data exports. For more targeted exports, JSON or XML may be required. Some parsing would be needed in the ingestion worker before passing data to VectorFlow in a supported format like Parquet.
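As a rough illustration of the parsing step, the sketch below flattens a Salesforce REST query response (a JSON object with a `records` list, where each record carries an `attributes` metadata entry) into CSV using only the standard library. The function name `records_to_csv` is hypothetical. Writing Parquet instead would need a third-party library, so CSV stands in here.

```python
import csv
import io
import json


def records_to_csv(response_json: str) -> str:
    """Convert a query response body into CSV text.

    Assumption: records are flat. Nested relationship objects are not
    expanded and would be written as their string representation.
    """
    payload = json.loads(response_json)
    records = payload.get("records", [])

    # Drop the per-record "attributes" metadata; keep the actual fields.
    rows = [{k: v for k, v in rec.items() if k != "attributes"} for rec in records]

    # Collect column names in first-seen order so the output is stable.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # fields missing from a record become empty cells
    return buffer.getvalue()
```

Mapping these columns onto VectorFlow's expected input schema would still be a separate step.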

Possible Next Steps, which can be further worked on:

  • Explore Salesforce OAuth authentication flows for the ingestion worker
  • Test sample data exports from the Salesforce API in JSON, XML and CSV
  • Prototype a basic ingestion worker to extract a sample export, parse the data, and write to Parquet
  • Evaluate how exported data maps to VectorFlow's expected input schema (Important)

asadnhasan avatar Oct 11 '23 22:10 asadnhasan

How feasible would it be to use this: https://llamahub.ai/l/tools-salesforce?

I don't think we should have a separate Salesforce worker. A /salesforce endpoint in the existing API should do the trick. Can you choose which format (e.g. JSON or CSV) the data is exported in?

dgarnitz avatar Oct 11 '23 23:10 dgarnitz

Small note: I'd look into how Airbyte solves this: https://github.com/airbytehq/airbyte/tree/f54bd550aae9b4bf19220b50af47da0adc3b4ff1/airbyte-integrations/connectors/source-salesforce

mmabrouk avatar Oct 18 '23 12:10 mmabrouk

We are planning to add an Airbyte connector; maybe we can access the Salesforce data through that.

dgarnitz avatar Oct 20 '23 16:10 dgarnitz

@asadnhasan are you still planning on working on this?

david-vectorflow avatar Nov 08 '23 22:11 david-vectorflow

@david-vectorflow Yes, David, I am still working on it.

asadnhasan avatar Nov 09 '23 06:11 asadnhasan

@asadnhasan hey do you still have an interest in building this out?

dgarnitz avatar Apr 04 '24 18:04 dgarnitz

Yes, absolutely, I would love to contribute. I will come back with something in 2-3 days.

syedzaidi-kiwi avatar Apr 06 '24 19:04 syedzaidi-kiwi