DataStorehouse Python code to Generate Report on Validation and Credibility of Datasets

Description of the Issue

When users download a dataset for their project, we want to give them a clear overview of it in the form of a report, so that they understand how to use the dataset effectively in their own project.

Expected Behavior

We expect:

More statistical analysis of the datasets
How their values are distributed, shown as plots
Checks for corrupted and mismatched data
Suggestions on which kinds of projects the dataset suits
Suggestions on preprocessing the dataset for effective use in a project
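
As a rough illustration of the statistics and mismatch checks listed above, here is a minimal sketch for CSV input using only the Python standard library. The function name summarize_csv, the report keys, and the "more than half of the values parse as numbers" rule are assumptions made for this example; they are not taken from Main.py.

```python
import csv
import io
import statistics


def _to_float(value):
    """Return value as a float, or None if it does not parse as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize_csv(text):
    """Build a per-column summary for CSV text that has a header row.

    Hypothetical helper: the keys below are illustrative only.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames or []:
        raw = [row[name] for row in rows]
        # Empty cells and cells missing from short rows count as missing.
        present = [v for v in raw if v is not None and v.strip() != ""]
        numbers = [n for n in map(_to_float, present) if n is not None]
        info = {
            "count": len(present),
            "missing": len(raw) - len(present),
            "mismatched": 0,
        }
        # Assumption: a column is numeric when most present values are numbers.
        if present and len(numbers) > len(present) / 2:
            info["mean"] = statistics.mean(numbers)
            info["min"] = min(numbers)
            info["max"] = max(numbers)
            # Non-numeric values in a numeric column are flagged as mismatches.
            info["mismatched"] = len(present) - len(numbers)
        summary[name] = info
    return summary
```

Each entry of the returned dictionary could then be written out as one block of the report. Plots and project suggestions are left out of this sketch.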

The report should be generated according to the dataset's format, such as CSV, JSON, or TXT.
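
One possible way to handle the different formats is to pick a loader from the file extension. The sketch below assumes UTF-8 files and uses only the standard library; load_dataset, LOADERS, and the individual loader functions are hypothetical names for this example, not existing project code.

```python
import csv
import json
from pathlib import Path


def load_csv(path):
    """Read a CSV file with a header row into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_json(path):
    """Read a JSON file into the corresponding Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def load_text(path):
    """Read a plain text file into a list of lines."""
    return Path(path).read_text(encoding="utf-8").splitlines()


# Hypothetical mapping from file extension to loader function.
LOADERS = {".csv": load_csv, ".json": load_json, ".txt": load_text}


def load_dataset(path):
    """Load a dataset with the loader that matches its file extension."""
    suffix = Path(path).suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported dataset format: {suffix or '(none)'}") from None
    return loader(path)
```

Supporting another format would then only need one more loader function and one more entry in the mapping.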

Current Behavior

Main.py in the Validation folder implements some of the features mentioned above. You can also view Report.txt for a sample of the report we generate.

Contributions

You can implement the features one by one and then open a pull request. We look forward to your valuable contributions and collaboration.

Oct 02 '23 06:10 Gladwin001

OK, from your issue description I understand that you want:

Code that gives the user statistical information about all the features of the dataset, such as mean, count, etc.
Visualization of the features against the target variable on a plot.
Detection of missing values and format issues, basically feature engineering to improve the dataset.
Suggestions, based on the features and the target, of the projects for which the dataset would be useful.

If I have understood your intentions correctly, could you please assign this issue to me? :)

Oct 02 '23 06:10 Ayushlion8

Thank you for volunteering, @Ayushlion8. You can start with any single feature.

Oct 02 '23 07:10 Gladwin001

OK @Gladwin001, do you mean that I have to do all the feature engineering and data preprocessing on one independent feature?

So from a dataset I would choose one feature, write the code for it, and then either add that file to a folder or directly create a PR for it?

Oct 02 '23 08:10 Ayushlion8

I would suggest breaking this issue into small issues so it can be handled by 2 or 3 contributors.

I am also interested in contributing to this issue.

Oct 02 '23 09:10 VigneshRamanathan101

@VigneshRamanathan101 and @Ayushlion8 you can break this issue into smaller issues and proceed

Oct 02 '23 09:10 neokd

@Ayushlion8 @VigneshRamanathan101 I started on this before the issue was originally created. Feel free to build on what I've already done: https://github.com/neokd/DataStorehouse/pull/105

Oct 02 '23 11:10 Bchass

@Ayushlion8 @VigneshRamanathan101 any updates on the issue?

Oct 04 '23 10:10 neokd

@neokd the modifications are in progress, and I will update you soon with the PR. Thanks for your patience :)

Oct 06 '23 10:10 Ayushlion8

@Ayushlion8 Yeah sure

Oct 06 '23 12:10 neokd