AnonymizedFaker fails when using custom Faker provider

Open rpc5102 opened this issue 1 year ago • 0 comments

Environment details

If you are already running RDT, please indicate the following details about the environment in which you are running it:

  • RDT version: 1.11.0
  • Python version: 3.11.9
  • Operating System: Any

Problem description

Passing a custom Faker provider to the AnonymizedFaker transformer raises:

TransformerProcessingError: The 'my_providers.dummy' module does not contain a function named 'dummy'.
Refer to the Faker docs to find the correct function: https://faker.readthedocs.io/en/master/providers.html

What I already tried

I've created a dummy Faker provider using the example here: https://github.com/sdv-dev/SDV/issues/308#issuecomment-773290983

And have tried swapping transformers as in: https://github.com/sdv-dev/SDV/issues/1372

Placing this dummy provider directly in the Faker source folder faker/faker/providers/dummy works perfectly.

Sample code

import pandas as pd

from faker import Faker
from faker.config import PROVIDERS
from my_providers.dummy import Provider

fake = Faker()

fake.add_provider(Provider)
PROVIDERS.append("my_providers.dummy")

fake.get_providers()
[<my_providers.dummy.Provider at 0x12c216010>,
<faker.providers.DynamicProvider at 0x12c216190>,
<faker.providers.user_agent.Provider at 0x10f50e810>,
<faker.providers.ssn.en_US.Provider at 0x10b1ee650>,
<faker.providers.sbn.Provider at 0x104083ad0>,
<faker.providers.python.Provider at 0x12c1ba050>,
<faker.providers.profile.Provider at 0x10365d590>,
<faker.providers.phone_number.en_US.Provider at 0x109ae4310>,
<faker.providers.person.en_US.Provider at 0x11c65e390>,
<faker.providers.passport.en_US.Provider at 0x109ae4d50>,
<faker.providers.misc.en_US.Provider at 0x12c19bfd0>,
<faker.providers.lorem.en_US.Provider at 0x10f481710>,
<faker.providers.job.en_US.Provider at 0x10f481990>,
<faker.providers.isbn.Provider at 0x12c19a0d0>,
<faker.providers.internet.en_US.Provider at 0x1099e1190>,
<faker.providers.geo.en_US.Provider at 0x1046a7e50>,
<faker.providers.file.Provider at 0x1046ad610>,
<faker.providers.emoji.Provider at 0x12c161350>,
<faker.providers.dummy_m.Provider at 0x10aaefc10>,
<faker.providers.date_time.en_US.Provider at 0x12c1607d0>,
<faker.providers.currency.en_US.Provider at 0x12c160810>,
<faker.providers.credit_card.en_US.Provider at 0x12c160e90>,
<faker.providers.company.en_US.Provider at 0x12c183d10>,
<faker.providers.color.en_US.Provider at 0x12c1608d0>,
<faker.providers.barcode.en_US.Provider at 0x12c161250>,
<faker.providers.bank.en_GB.Provider at 0x12c161650>,
<faker.providers.automotive.en_US.Provider at 0x12c161690>,
<faker.providers.address.en_US.Provider at 0x104667990>]
fake.dummy()

'bar'

from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

# build a small DataFrame of fake words
df = pd.DataFrame({"words": [fake.word() for _ in range(5)]})

# get metadata from df
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df)
metadata.update_column(column_name="words", sdtype="text")
{
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "columns": {
        "words": {
            "sdtype": "text"
        }
    }
}
synthesizer = GaussianCopulaSynthesizer(metadata)

from rdt.transformers.pii import AnonymizedFaker

synthesizer.auto_assign_transformers(df)

synthesizer.update_transformers(
    column_name_to_transformer={
        "words": AnonymizedFaker(
            provider_name="my_providers.dummy", function_name="dummy"
        )
    }
)

AttributeError: module 'faker.providers' has no attribute 'my_providers'

What works

Adding my custom provider package as an attribute of faker.providers fixes the problem. The issue seems to stem from the check_provider_function check added in this commit: https://github.com/sdv-dev/RDT/commit/5e577fb39a328c70e3fc5fe7960e0d3511a20ab4#diff-c21909dc41931197bebb5afac4f76cd4c014fd9063d3d205ced9c5b2f4612ca6R55

faker.providers.my_providers = my_providers
attrgetter("my_providers")(faker.providers)

synthesizer.get_transformers()

{'words': AnonymizedFaker(provider_name='my_providers.dummy', function_name='dummy')}
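
The mechanism behind the workaround can be illustrated without Faker or RDT installed. The sketch below assumes the check resolves provider_name as a dotted attribute path on faker.providers, which is what the AttributeError above suggests. The module objects are stdlib stand-ins, and provider_function_exists is a hypothetical helper, not RDT's actual code.

```python
from operator import attrgetter
from types import ModuleType

# Stand-ins for the real packages (hypothetical, stdlib only).
providers_pkg = ModuleType("faker.providers")    # plays the role of faker.providers
my_providers = ModuleType("my_providers")        # plays the role of the custom package
dummy_module = ModuleType("my_providers.dummy")  # plays the role of my_providers.dummy


class Provider:
    """Minimal provider exposing the function the transformer asks for."""

    def dummy(self):
        return "bar"


dummy_module.Provider = Provider
my_providers.dummy = dummy_module


def provider_function_exists(namespace, provider_name, function_name):
    """Return True if namespace.<provider_name>.Provider has <function_name>."""
    try:
        # attrgetter follows dotted names: "a.b" resolves namespace.a.b
        module = attrgetter(provider_name)(namespace)
        getattr(module.Provider, function_name)
    except AttributeError:
        return False
    return True


# Before the workaround, the namespace has no 'my_providers' attribute.
before = provider_function_exists(providers_pkg, "my_providers.dummy", "dummy")

# The workaround: expose the custom package as an attribute of the namespace.
providers_pkg.my_providers = my_providers

after = provider_function_exists(providers_pkg, "my_providers.dummy", "dummy")
print(before, after)
```

If the assumption holds, this would explain why a provider placed inside faker/faker/providers passes the check while an external package does not: only the former is reachable as an attribute of faker.providers.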

Am I doing something silly?

rpc5102, Apr 11 '24 14:04