Prepare a survey (or GitHub Discussion) about data sources

Open zaleslaw opened this issue 2 years ago • 12 comments

The draft list of data sources:

  1. SQL Databases based on JDBC
  2. XML
  3. Protobuf
  4. Parquet
  5. ORC
  6. SparkSQL
  7. different files on the FileSystem
  8. NoSQL databases (MongoDB, Cassandra, Ignite)
  9. Queues (Kafka)
  10. Amazon (S3)
  11. Arrow IPC (Feather v2)
  12. Apache Avro

zaleslaw avatar Jun 19 '23 08:06 zaleslaw

We should probably move this to a discussion so people can upvote and suggest others :)

Jolanrensen avatar Jun 19 '23 12:06 Jolanrensen

@Jolanrensen sorry, I'd prefer a Google Form with a few different questions. It's better for analysis.

zaleslaw avatar Jun 19 '23 12:06 zaleslaw

Sure, but it might also be nice for the community to see which types of databases other people are interested in.

Jolanrensen avatar Jun 19 '23 14:06 Jolanrensen

It would be nice to prepare notebooks with the results :)

zaleslaw avatar Jun 19 '23 15:06 zaleslaw

@Jolanrensen will you share something?

zaleslaw avatar Jun 21 '23 11:06 zaleslaw

Maybe we should add Exposed to the list as a data source. It was suggested here first and seems to cover several DB types

Jolanrensen avatar Jun 22 '23 11:06 Jolanrensen

Also, for people wanting to do heavy operations with lots of large columns, we might want to provide interop with Multik as well

Jolanrensen avatar Jun 22 '23 11:06 Jolanrensen

Maybe something like Google Sheets

koperagen avatar Jun 27 '23 13:06 koperagen

Like integration with their API? Could be easy, since we already have Excel support.

Jolanrensen avatar Jun 27 '23 14:06 Jolanrensen

Yes, I think it might be a good step for building data processing pipelines. For example, read some data, transform it with dataframe, and write it to a Google Sheet. Or have a Google Sheet edited by a human and run dataframe processing on it when needed. Since we have Excel support, if this integration proves to bring too little value, we could also consider having only a tutorial. I mostly want to add it not because it's impossible to do now, but to bring attention to possible applications of our library.

koperagen avatar Jun 27 '23 14:06 koperagen

XML would also probably need OpenAPI support, similar to JSON

Jolanrensen avatar Jun 27 '23 17:06 Jolanrensen

I would also add YAML to the list

belovrv avatar Jul 01 '23 17:07 belovrv