Python code to Generate Report on Validation and Credibility of Datasets

Open Gladwin001 opened this issue 2 years ago • 9 comments

Description of the Issue

As users download datasets for their projects, we want to give them a clear overview of each dataset in a report format, so that they have a better idea of how to use the dataset effectively in their own project.

Expected Behavior

We expect:

  1. More statistical analysis of the datasets
  2. How their values are distributed, shown in plots
  3. Checks for corrupted and mismatched data
  4. Suggestions on which kinds of project the dataset would suit
  5. Suggestions on preprocessing the dataset for effective use in a project
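
As a rough illustration of items 1 and 3, here is a minimal sketch that uses only the Python standard library. It is not the code in Main.py; the function name `profile_csv`, the report keys, and the "more than half the cells parse as numbers" rule are all assumptions made for this example.

```python
import csv
import io
import statistics


def _to_float(value):
    """Return value as a float, or None if it does not parse as a number."""
    try:
        return float(value)
    except ValueError:
        return None


def profile_csv(text):
    """Build a per-column summary for CSV text that has a header row.

    Hypothetical helper for illustration; not part of DataStorehouse.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells, so normalise to "".
        values = [(row[column] or "").strip() for row in rows]
        present = [v for v in values if v]
        numbers = [n for n in map(_to_float, present) if n is not None]
        # Assumption: a column is numeric when most non-empty cells parse.
        is_numeric = bool(present) and len(numbers) > len(present) / 2
        summary = {
            "count": len(present),
            "missing": len(values) - len(present),
            "numeric": is_numeric,
        }
        if is_numeric:
            summary["mean"] = statistics.mean(numbers)
            summary["min"] = min(numbers)
            summary["max"] = max(numbers)
            # Non-numeric cells in a numeric column are flagged as mismatches.
            summary["mismatched"] = len(present) - len(numbers)
        report[column] = summary
    return report
```

Calling `profile_csv("age,name\n30,Ann\n,Bob\nabc,Cy\n40,Di\n")` would report one missing and one mismatched cell for `age`, and mark `name` as non-numeric. A real implementation would probably also want distribution plots and type-specific checks, which this sketch leaves out.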

The report should be generated according to the dataset's format, such as CSV, JSON, or txt.
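
One possible way to handle several formats is to normalise each file into a list of records before building the report. The sketch below is only an assumption about how that could look; `load_records` is a hypothetical name, and treating each line of a txt file as one record is a simplification chosen for the example.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse dataset text into a list of records based on its format.

    Hypothetical helper for illustration; not part of DataStorehouse.
    """
    fmt = fmt.lower().lstrip(".")
    if fmt == "csv":
        # Each row becomes a dict keyed by the header names.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        data = json.loads(text)
        # Wrap a single top-level value so callers always get a list.
        return data if isinstance(data, list) else [data]
    if fmt == "txt":
        # Assumption: each non-empty line is one record.
        return [{"line": line} for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt!r}")
```

With this shape, the report code only needs to understand a list of records, and support for another format means adding one more branch.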

Current Behavior

In the Validation folder, Main.py implements some of the features listed above. You can also view Report.txt for a sample report we generated.

Contributions

You can implement the features one by one and then make a pull request. We look forward to your valuable contributions and collaboration.

Gladwin001, Oct 02 '23 06:10

OK, from your issue description I understand that you want:

  1. Code that gives the user statistical information about all the features of the dataset, such as mean, count, etc.

  2. Visualization of the features against the target variable in a plot.

  3. Detection of missing values and format issues; basically, feature engineering to improve the dataset.

  4. Based on the features and the target, a judgment of which projects the dataset would be useful for.

So if I've understood your intentions correctly, could you please assign this issue to me? :)

Ayushlion8, Oct 02 '23 06:10

Thank you for volunteering, @Ayushlion8. You can start with any single feature.

Gladwin001, Oct 02 '23 07:10

OK @Gladwin001, do you mean that I should do all the feature engineering and data preprocessing on one independent feature?

So I'll choose one feature from a dataset, write the code for it, and then either add that file to a folder or open a PR for it directly?

Ayushlion8, Oct 02 '23 08:10

Thank you for volunteering, @Ayushlion8. You can start with any single feature.

I would suggest breaking this issue into small issues so it can be handled by 2 or 3 contributors.

I am also interested in contributing to this issue.

VigneshRamanathan101, Oct 02 '23 09:10

@VigneshRamanathan101 and @Ayushlion8 you can break this issue into smaller issues and proceed

neokd, Oct 02 '23 09:10

@Ayushlion8 @VigneshRamanathan101 I started on this before the issue was created. Feel free to build on what I've already done: https://github.com/neokd/DataStorehouse/pull/105

Bchass, Oct 02 '23 11:10

@Ayushlion8 @VigneshRamanathan101 any updates on the issue?

neokd, Oct 04 '23 10:10

@neokd modifications are going on, will update you soon with the PR. Thanks for your patience :)

Ayushlion8, Oct 06 '23 10:10

@Ayushlion8 Yeah sure

neokd, Oct 06 '23 12:10