Create new data loaders to different resources

Open Chenglong-MS opened this issue 8 months ago • 14 comments

Hi devs and users,

We have recently extended Data Formulator with the ability to connect directly to external data sources through the ExternalDataLoader class. You can extend this Python data loader class to load data directly from an external data source. The frontend will automatically pick up the external loader classes and provide the ability to complete user queries (for loading data views).

https://github.com/user-attachments/assets/6c56979b-2f6c-4d9b-8f19-22c1fd56368c

Instructions for extending the data loader, along with example implementations for MySQL and Azure Data Explorer, are provided here: https://github.com/microsoft/data-formulator/tree/main/py-src/data_formulator/data_loader.
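As a rough illustration of the "extend a loader class" pattern described above, here is a minimal sketch. It does not reproduce the project's actual ExternalDataLoader interface: the base class `SketchDataLoader`, its method names, and the SQLite-backed subclass are all hypothetical, chosen so the example runs with only the standard library. The linked directory is the reference for the real method signatures.

```python
import sqlite3
from abc import ABC, abstractmethod


class SketchDataLoader(ABC):
    """Hypothetical base class; NOT the project's real ExternalDataLoader API."""

    @staticmethod
    @abstractmethod
    def list_params():
        """Describe the connection parameters a frontend form could render."""

    @abstractmethod
    def list_tables(self):
        """Return the names of the tables available in the source."""

    @abstractmethod
    def fetch_rows(self, table, limit=1000):
        """Return up to `limit` rows from `table`."""


class SQLiteLoader(SketchDataLoader):
    """Example subclass backed by SQLite (assumption: stands in for a real source)."""

    def __init__(self, params):
        self.conn = sqlite3.connect(params["database"])

    @staticmethod
    def list_params():
        return [{"name": "database", "type": "string", "required": True}]

    def list_tables(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]

    def fetch_rows(self, table, limit=1000):
        # Only allow table names that actually exist, since the name is
        # interpolated into the SQL text.
        if table not in self.list_tables():
            raise ValueError(f"unknown table: {table}")
        return self.conn.execute(
            f'SELECT * FROM "{table}" LIMIT ?', (limit,)
        ).fetchall()
```

The idea is that each new source only has to implement the same small set of methods, so a frontend can render the connection form from `list_params()` without knowing anything about the source.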

I would like to use this issue to collect the data sources you would like to see supported. Hopefully some devs will be able to add more data loaders for popular sources (Google BigQuery and Amazon S3, for example).

Chenglong-MS avatar May 13 '25 23:05 Chenglong-MS

Hi, we tried the new release to connect to a MySQL database, and it was successful. One thing we noticed is that getting real-time data from the database is challenging. Do you have any input on this?

karthikadevaraj avatar May 14 '25 17:05 karthikadevaraj

@karthikadevaraj Interesting use case. The most direct way to support it would be to add a button/function that refreshes the dataset and automatically updates the views from the different sources. This would work but may not be very efficient.

A potentially better way (though it requires a bit more development) is to run queries directly against the external data source, as opposed to using DuckDB as the intermediary. This needs a new abstract class design for the external data loader, and changes to some of the querying logic in table-routes.py.
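To make the trade-off between the two options concrete, here is a small sketch. It is not Data Formulator code: both function names are hypothetical, and two in-memory SQLite connections stand in for the external source and for the local store (DuckDB in the real tool).

```python
import sqlite3


def refresh_snapshot(source, local, table):
    """Option 1: re-copy the whole table from the source into the local store."""
    cursor = source.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with local:  # commits on success, rolls back an open transaction on error
        local.execute(f'DROP TABLE IF EXISTS "{table}"')
        local.execute(f'CREATE TABLE "{table}" ({col_list})')
        local.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', rows
        )
    return len(rows)


def query_source_directly(source, sql, params=()):
    """Option 2: skip the local copy and run the query on the source itself."""
    return source.execute(sql, params).fetchall()
```

With the snapshot approach, views only see new rows after the next refresh, and each refresh costs a full copy. The direct approach always sees current data, but every view query then depends on the source's SQL dialect and availability.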

Chenglong-MS avatar May 14 '25 18:05 Chenglong-MS

Hi @Chenglong-MS, thanks for sharing your input.

karthikadevaraj avatar May 15 '25 00:05 karthikadevaraj

It would be great to have it scrape data from papers directly. All tables, lists, graphs, etc. That will save us a lot of time!

Emasoft avatar May 26 '25 09:05 Emasoft

It would be a good idea to add this parameterization to the SQL Server connection in order to support more modern database engines, for example: TrustServerCertificate=yes;Connection Timeout=30;Encrypt=no;
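One way such parameterization could look is a loader that accepts extra key/value options and appends them to its connection string. The sketch below is only string assembly: `build_connection_string` is a hypothetical helper, not part of Data Formulator, and which keywords a given SQL Server driver actually accepts is an assumption that should be checked against that driver's documentation.

```python
def build_connection_string(base, extra=None):
    """Join base settings and optional extra settings into 'key=value;' pairs.

    Hypothetical helper: it does no driver-specific validation or escaping.
    """
    merged = {**base, **(extra or {})}
    for key, value in merged.items():
        # Values containing ';' would need driver-specific quoting,
        # which this sketch does not attempt.
        if ";" in str(value):
            raise ValueError(f"unsupported character in value for {key!r}")
    return "".join(f"{key}={value};" for key, value in merged.items())


# Example values only; server and database names are placeholders.
conn_str = build_connection_string(
    {"SERVER": "localhost", "DATABASE": "demo"},
    {"TrustServerCertificate": "yes", "Connection Timeout": "30", "Encrypt": "no"},
)
```

Keeping the extra options as a free-form mapping means new driver keywords can be passed through without changing the loader each time.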

sromero-rentanacional avatar May 26 '25 22:05 sromero-rentanacional

Please support:

  • loading of local parquet files
  • loading of cloud parquet files from Azure blob via Container Name, SAS-Token, Account Name

KlausGlueckert avatar May 27 '25 10:05 KlausGlueckert

@Chenglong-MS I have added an initial PR. This should work well for S3 connections.

sumanth-dhanya avatar May 28 '25 04:05 sumanth-dhanya

Thanks @slackroo for adding the S3 data loader. I have also added an Azure Blob reader (for parquet, json, and csv files). Code agents have recently made adding a new data loader quite easy!

Chenglong-MS avatar May 30 '25 21:05 Chenglong-MS

Just added a PostgreSQL loader in pull request https://github.com/microsoft/data-formulator/pull/163

jodur avatar Jun 04 '25 17:06 jodur

It would be great if this could work with data in a Power BI dataset, generating DAX queries to do the analysis.

stajones avatar Sep 08 '25 20:09 stajones

It would be great to have MongoDB supported.

Markusdasi avatar Oct 02 '25 12:10 Markusdasi

ClickHouse, please.

aliahmadaziz avatar Oct 20 '25 05:10 aliahmadaziz

We are releasing a new version of Data Formulator soon! We'll add new data connectors once the new release is finished :)

Chenglong-MS avatar Oct 20 '25 16:10 Chenglong-MS

Please add support for Google BigQuery.

hurairahmateen avatar Nov 05 '25 05:11 hurairahmateen