AnonymizedFaker fails when using custom Faker provider
Environment details
- RDT version: 1.11.0
- Python version: 3.11.9
- Operating System: Any
Problem description
Passing a custom Faker provider to an AnonymizedFaker transformer results in:
TransformerProcessingError: The 'my_providers.dummy' module does not contain a function named 'dummy'.
Refer to the Faker docs to find the correct function: https://faker.readthedocs.io/en/master/providers.html
What I already tried
I've created a dummy Faker provider using the example here: https://github.com/sdv-dev/SDV/issues/308#issuecomment-773290983
I have also tried swapping transformers as described in: https://github.com/sdv-dev/SDV/issues/1372
Placing this dummy provider directly in the Faker source folder faker/faker/providers/dummy works perfectly.
Sample code
import pandas as pd
from faker import Faker
from faker.config import PROVIDERS
from my_providers.dummy import Provider
fake = Faker()
fake.add_provider(Provider)
PROVIDERS.append("my_providers.dummy")
fake.get_providers()
[<faker.providers.DynamicProvider at 0x12c216190>,
<faker.providers.user_agent.Provider at 0x10f50e810>,
<faker.providers.ssn.en_US.Provider at 0x10b1ee650>,
<faker.providers.sbn.Provider at 0x104083ad0>,
<faker.providers.python.Provider at 0x12c1ba050>,
<faker.providers.profile.Provider at 0x10365d590>,
<faker.providers.phone_number.en_US.Provider at 0x109ae4310>,
<faker.providers.person.en_US.Provider at 0x11c65e390>,
<faker.providers.passport.en_US.Provider at 0x109ae4d50>,
<faker.providers.misc.en_US.Provider at 0x12c19bfd0>,
<faker.providers.lorem.en_US.Provider at 0x10f481710>,
<faker.providers.job.en_US.Provider at 0x10f481990>,
<faker.providers.isbn.Provider at 0x12c19a0d0>,
<faker.providers.internet.en_US.Provider at 0x1099e1190>,
<faker.providers.geo.en_US.Provider at 0x1046a7e50>,
<faker.providers.file.Provider at 0x1046ad610>,
<faker.providers.emoji.Provider at 0x12c161350>,
<faker.providers.dummy_m.Provider at 0x10aaefc10>,
<faker.providers.date_time.en_US.Provider at 0x12c1607d0>,
<faker.providers.currency.en_US.Provider at 0x12c160810>,
<faker.providers.credit_card.en_US.Provider at 0x12c160e90>,
<faker.providers.company.en_US.Provider at 0x12c183d10>,
<faker.providers.color.en_US.Provider at 0x12c1608d0>,
<faker.providers.barcode.en_US.Provider at 0x12c161250>,
<faker.providers.bank.en_GB.Provider at 0x12c161650>,
<faker.providers.automotive.en_US.Provider at 0x12c161690>,
<faker.providers.address.en_US.Provider at 0x104667990>]
fake.dummy()
'bar'
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
# making fake list of words
data = []
for _ in range(5):
    data.append(fake.word())
df = pd.DataFrame(data=data)
df = df.rename(columns={0: "words"}).reset_index(drop=True)
# get metadata from df
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df)
metadata.update_column(column_name="words", sdtype="text")
{
"METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
"columns": {
"words": {
"sdtype": "text"
}
}
}
synthesizer = GaussianCopulaSynthesizer(metadata)
from rdt.transformers.pii import AnonymizedFaker
synthesizer.auto_assign_transformers(df)
synthesizer.update_transformers(
column_name_to_transformer={
"words": AnonymizedFaker(
provider_name="my_providers.dummy", function_name="dummy"
)
}
)
AttributeError: module 'faker.providers' has no attribute 'my_providers'
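The AttributeError above can be reproduced without RDT or Faker. This is a minimal sketch that assumes the failing lookup resolves the dotted provider_name with operator.attrgetter against faker.providers (the workaround's use of attrgetter hints at this, but it is not confirmed here). faker.providers is simulated with types.ModuleType, and the helper name resolve is hypothetical.

```python
import types
from operator import attrgetter


def resolve(namespace, dotted_name):
    """Hypothetical helper: resolve a dotted path one attribute at a time from `namespace`."""
    return attrgetter(dotted_name)(namespace)


# Stand-in for faker.providers (assumption: a bare module object behaves the same
# way for attribute lookup as the real package).
providers = types.ModuleType("faker.providers")
# A provider copied into Faker's own source tree ends up as an attribute like this.
providers.dummy = types.ModuleType("faker.providers.dummy")

print(resolve(providers, "dummy"))  # found: `dummy` is an attribute of the namespace

try:
    # `my_providers` is an ordinary top-level package, not an attribute of
    # faker.providers, so the lookup fails on the first segment.
    resolve(providers, "my_providers.dummy")
except AttributeError as exc:
    print(exc)  # module 'faker.providers' has no attribute 'my_providers'
```

This would also explain why copying the provider into faker/faker/providers/dummy works: the module then really is an attribute of faker.providers.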
What works
Adding my custom provider package as an attribute of the faker.providers module fixes the problem. The issue seems to stem from the check_provider_function check added in this commit: https://github.com/sdv-dev/RDT/commit/5e577fb39a328c70e3fc5fe7960e0d3511a20ab4#diff-c21909dc41931197bebb5afac4f76cd4c014fd9063d3d205ced9c5b2f4612ca6R55
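import faker.providers
import my_providers.dummy
from operator import attrgetter
# expose the custom package as faker.providers.my_providers
faker.providers.my_providers = my_providers
attrgetter("my_providers")(faker.providers)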
synthesizer.get_transformers()
{'words': AnonymizedFaker(provider_name='my_providers.dummy', function_name='dummy')}
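To show why the workaround helps, here is a hypothetical re-creation of the check_provider_function check mentioned above. It is not RDT's code. It assumes the check resolves provider_name against faker.providers with attrgetter, then looks up function_name on the module's Provider class, and turns any AttributeError into the error message quoted in the problem description. TransformerProcessingError is a local stand-in class, and the simulated my_providers package is an assumption.

```python
import types
from operator import attrgetter


class TransformerProcessingError(Exception):
    """Local stand-in for RDT's exception of the same name (not imported from RDT)."""


def check_provider_function_sketch(namespace, provider_name, function_name):
    """Hypothetical check: resolve the provider module, then the function on its Provider."""
    try:
        module = attrgetter(provider_name)(namespace)
        return getattr(module.Provider, function_name)
    except AttributeError as exc:
        raise TransformerProcessingError(
            f"The '{provider_name}' module does not contain a function named "
            f"'{function_name}'."
        ) from exc


# Simulated custom package my_providers.dummy whose Provider exposes dummy().
class Provider:
    def dummy(self):
        return "bar"


my_providers = types.ModuleType("my_providers")
my_providers.dummy = types.ModuleType("my_providers.dummy")
my_providers.dummy.Provider = Provider

providers = types.ModuleType("faker.providers")  # stand-in for faker.providers

try:
    check_provider_function_sketch(providers, "my_providers.dummy", "dummy")
except TransformerProcessingError as exc:
    print(exc)  # the lookup fails before Provider is ever reached

# The workaround from the report: bind the package as an attribute of the namespace.
providers.my_providers = my_providers
check_provider_function_sketch(providers, "my_providers.dummy", "dummy")  # no error now
```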
Am I doing something silly?