
source: datasets: Add common time-series datasets

Open • programmer290399 opened this issue 3 years ago • 5 comments

Pain Point

Currently, DFFML does not ship any commonly used time-series datasets.

Proposed Solution

Write a dataset source (like the existing one for the iris dataset) that adds the following basic datasets:

Univariate Datasets

Multivariate Datasets
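
As a rough illustration of what such a source would have to do for a univariate series, here is a standalone sketch that turns CSV text into one record per timestamp. It does not use DFFML's dataset source API; the function name `univariate_records`, the column names, and the sample rows are all made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator


def univariate_records(
    csv_text: str, time_column: str, value_column: str
) -> Iterator[Dict]:
    """Yield one record per CSV row, keyed by the timestamp column.

    Hypothetical helper: a real dataset source would also download and
    cache the file, which is left out here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {
            "key": row[time_column],
            "features": {value_column: float(row[value_column])},
        }


# Made-up sample rows, not taken from any real dataset
SAMPLE_CSV = "month,value\n2000-01,112\n2000-02,118\n2000-03,132\n"

records = list(univariate_records(SAMPLE_CSV, "month", "value"))
for record in records:
    print(record["key"], record["features"])
```

A multivariate dataset would work the same way, with several value columns collected into `features` instead of one.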

programmer290399 • Mar 10 '22 00:03

Census data? https://registry.opendata.aws/

johnandersen777 • Apr 03 '22 23:04

Hey @pdxjohnny @programmer290399, I'd like to work on this issue. Can we use any dataset, or are there specific requirements (aside from the ones you have written here)? Also, could you tell me what exactly we would be doing with these datasets?

TirelessClock • Apr 05 '22 15:04

Hey @TirelessClock !!

This issue is part of a GSoC project for this year, so I am not sure whether it is up for grabs or how that would work. You may be able to solve it partially, but please clear this with @pdxjohnny before proceeding.

As for which datasets to use: we definitely want the ones listed above, and I have linked to their respective sources. Also take a look at how datasets work in DFFML; see the link to the iris dataset above.

But we are open to any other datasets that are commonly used for benchmarking and research. Before implementing one, please make sure you're on the same page with the community members so that you don't end up doing work that we can't merge into the main branch.

I hope this makes it clear. For any further clarification or queries, join our Gitter channel.

programmer290399 • Apr 06 '22 02:04

Hello @programmer290399, thanks for the reply! Yes, I completely understand that it's part of GSoC and might not be up for grabs. I would still like to work on it and solve it at least partially, and at present I am working my way through the iris dataset. I could use all the help you can give!

TirelessClock • Apr 06 '22 04:04

Ozone Level Detection Data Set https://archive.ics.uci.edu/ml/datasets/Ozone+Level+Detection

mukund2201 • Apr 13 '22 19:04