Problem adding noise to my dataset example
- datafuzz version: 0.1.0a
- Python version: 3.5
- Operating System: Linux Ubuntu
Description
I was trying to add some noise to a dataset. It didn't work: I got an error about pandas DataFrame objects.
What I Did
from datafuzz.generators import DatasetGenerator
from datafuzz import DataSet, NoiseMaker, Duplicator

generator = DatasetGenerator({
    'output': 'pandas',
    'schema': {
        'market': ['lemon', 'orange', 'pineapple', 'banana', 'kiwi', 'papaya', 'passion fruit', 'guava'],
        'Channel': ['Organic/PPC Brand', 'PPC_Generic/email', 'Comparison_site', 'Comparison_site_preapproved', '3rd_Party', 'Rabbit'],
        'name': 'faker.name',
        'created date': range(2005, 2018),
        'city': 'faker.city',
        'Requested_Amount': range(1000, 25000, 1000)
    },
    'num_rows': 1000,
})
generator.generate()
dataset = generator.to_output()

noiser = NoiseMaker(
    dataset,
    noise=['add_nulls', 'random'],
    columns=['market', 'Channel', 'Requested_Amount'],
    percentage=30,
)
noiser.run_strategy()
print(dataset)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/datafuzz/noise.py", line 58, in __init__
    self.columns = self.get_numeric_columns(self.columns)
  File "/usr/local/lib/python3.5/dist-packages/datafuzz/strategy.py", line 71, in get_numeric_columns
    if self.dataset.data_type == 'pandas' and any([isinstance(c, str)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'data_type'
Hi @springcoil,
So excited you are working with datafuzz! Sorry that the documentation is not yet complete. I think this problem is easily solved, and it should be better documented somewhere.
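Can you try making it a DataSet object before you pass it to NoiseMaker? The traceback shows NoiseMaker looking up a data_type attribute, which a plain pandas DataFrame does not have.
You can either change the 'output' setting of the Generator to 'dataset' or run the following: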
dataset_input = DataSet(dataset, output='pandas')
If you need it in pandas form after the noise process, you can run:
dataset_input.to_output()
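To make the cause of the error concrete, here is a minimal standard-library sketch of the wrapper pattern involved. It does not use datafuzz or pandas: `RawTable`, `WrappedDataSet`, and `describe_input` are hypothetical stand-ins invented for illustration. They only mimic the shape of the problem, a strategy reading `data_type` from whatever object it receives.

```python
class RawTable:
    """Hypothetical stand-in for a pandas DataFrame: it holds rows only."""

    def __init__(self, rows):
        self.rows = rows


class WrappedDataSet:
    """Hypothetical stand-in for a DataSet-style wrapper (not datafuzz code)."""

    def __init__(self, data, output="pandas"):
        self.data = data
        self.data_type = "pandas"  # the attribute the strategy looks up
        self.output = output

    def to_output(self):
        # Hand back the underlying object in its original form.
        return self.data


def describe_input(dataset):
    """Mimics the attribute access that failed in the traceback."""
    return dataset.data_type


raw = RawTable([{"market": "lemon", "Requested_Amount": 1000}])

# Passing the raw object fails, because it has no data_type attribute.
try:
    describe_input(raw)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Wrapping it first supplies the attribute the strategy expects.
wrapped = WrappedDataSet(raw, output="pandas")
detected_type = describe_input(wrapped)

# The original object is still available once processing is done.
recovered = wrapped.to_output()
```

The same idea applies to the real objects: NoiseMaker should receive the DataSet wrapper, not the DataFrame, and to_output() gives you the pandas form back afterwards.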
Let me know how it goes, and thanks again for giving it a try!
-kj