factory_boy icon indicating copy to clipboard operation
factory_boy copied to clipboard

Feat/add unique feature arg in faker factory

Open arthurHamon2 opened this issue 5 years ago โ€ข 10 comments

Since Faker 4.9.0 version, they have added support to generate unique values. This is really helpful when dealing with unique constraint on a field generated by factory boy. The test can be flaky if you use faker without the "unique" feature on an ORM field with an unique constraint.

The usage with factory boy is simple as this:

class UserFactory(fatory.Factory):
            class Meta:
                model = User

            arrival = factory.Faker(
                'date_between_dates',
                date_start=datetime.date(2020, 1, 1),
                date_end=datetime.date(2020, 5, 31),
                unique=True  # The generated date is guaranteed to be unique inside the test execution.
            )

The unique keyword can be passed on every faker providers. If true the faker object passes through the Faker Unique Proxy, making sure the generated value has not been already generated before. Not that the default unique keyword value is false.

It's my first PR on this repository, be kind :) Also I have been using factory boy for almost 5 years and it saves me so much boilerplate in my tests, thanks to every creators, maintainers of this wonderful project ๐ŸŽ‰

arthurHamon2 avatar Nov 18 '20 13:11 arthurHamon2

I concur with @francoisfreitag's review: the idea is amazing! However, its relation with factory_boy's factory classes should be clarified.

Basically, as a user:

  • If I specifiy factory.Faker('foo', unique=True), I expect values to be unique within this factory instance
  • If I add a factory.Faker('username', unique=True) to UserFactory, then subclass it as class AdminFactory(UserFactory), is the uniqueness kept across both UserFactory and AdminFactory?

As a side note: for this reason, I find it easier to discuss new features as an issue first, and to focus on the technical implementation once the overall design is clear ๐Ÿ˜‰

rbarrois avatar Nov 23 '20 13:11 rbarrois

Thanks for your reviews. Indeed I did not notice that factory boy is using a global Faker object across all factories. I've addressed all your points @francoisfreitag but the clear feature. As the unique store is related to a factory instance, we want to be able to clear the store on the factory level, right ? @rbarrois You're right, I though it was a minor change and did not fully understand how it impacted factory boy. To make things clearer I would say that :

  • A global Faker object is created for "non" unique use cases
  • A local faker object is created by local type at the factory (instance) level (the uniqueness is then guaranteed across all objects created by the factory)
  • On inheritance, if the child factory does not override the unique faker declaration, the child factory shares the same unique store as the parent declaration. If the declaration is redefined, then it won't use the same unique store.

If you are okay with these assumptions @rbarrois I write them in the documentation.

arthurHamon2 avatar Nov 23 '20 13:11 arthurHamon2

@francoisfreitag I just added a commit that clear the unique store : https://github.com/FactoryBoy/factory_boy/pull/820/commits/0aaefde320d1cdec63ebf34dd9cc2247e2643688 Is it what you had in mind when you talked about Can a hook be added to reset the unique store (fake.unique.clear())? Similar in spirit to reset_sequence. ?

arthurHamon2 avatar Nov 23 '20 17:11 arthurHamon2

@arthurHamon2 Yep, that's a great idea:

  • A global Faker;
  • A per-factory Faker (with unique=True) for factories;
  • For inheritance, I think you could rely on factory._meta._get_counter_reference() (whose name could be changed) to decide whether to reuse a parent factory's unique faker or not;

I've got two points to dig into:

  • Synchronising the random seeds across all Faker setups;
  • Managing the initialisation of Faker instances โ€” setup time, sharing providers.

By the way, this is no small task you've taken up, leading deep into factory_boy's internals, congrats! Let us know where you need help with the code โ€” if you're interested in an in-depth introduction to the library's internals, I could even arrange that over visio in the next few days.

rbarrois avatar Nov 24 '20 09:11 rbarrois

@rbarrois an in-depth introduction over visio would be much appreciated, any time you want, let me know. I guess we can discuss about my last commit, trying to setup a per-factory unique faker store. I'm also working on an old abandoned feature : https://github.com/FactoryBoy/factory_boy/pull/822 We could chat about this too :)

arthurHamon2 avatar Nov 25 '20 14:11 arthurHamon2

@arthurHamon2 Awesome :) It might be easier getting in touch over email, my email address is in the project's setup.cfg ๐Ÿ˜›

rbarrois avatar Dec 03 '20 12:12 rbarrois

I don't know about you, but sounds like a lot (?) of extra complexity: per-factory fakers, syncing random generators (reproducible randomness), sharing providers, documenting it. And on the part of the users as well. They might not consider whether uniqueness gets inherited or not, if handled by factory-boy. They might not even need it. The general solution is apparently much more complex.

Maybe we should rather give user access to the global fakers? A way to iterate over them (to reset the seen values)? Maybe a way to override the faker call (for subclassing)? Or a way to obtain the faker that would be otherwise used (for sequences, if possible, to not bother with the locale)? The way I handle it now.

x-yuri avatar Feb 10 '21 14:02 x-yuri

I needed factoryboy to generate unique values and found this PR, any chance of making progress here?

Reading back the discussion: How important is it it have uniqueness limited to particular factories and/or attributes? If unique values are generated to be globally unique, they will also be unique within a particlar factory and/or attribute.

So then the question is: How important is it that duplicate values can be generated when using the same faker for different factories/attributes? IMHO that is really not essential: If it is important for a test that values are duplicated, then the test should be written to ensure the duplication, not rely on randomly generated duplication?

Or are there compelling performance arguments to want to limit the scope of uniqueness?

matthijskooijman avatar Oct 15 '21 13:10 matthijskooijman

I needed factoryboy to generate unique values and found this PR, any chance of making progress here?

I'm usign the following solution:

import factory


class UniqueFaker(factory.Faker):

    @classmethod
    def _get_faker(cls, locale=None):
        return super()._get_faker(locale=locale).unique

usage

class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = UniqueFaker('name')

kwist-sgr avatar Jan 04 '22 13:01 kwist-sgr

Alternative in-class solution (in case the _get_faker API changes)

active = factory.LazyFunction(lambda: fake.unique.company())

Adding my 2ยข on the per-factory uniqueness option: it really doesn't seem necessary, as these are the possible cases:

  1. Need all values unique, across all factories
  2. Need all values unique within each factory, don't care about cross-factory uniqueness
  3. Don't care about uniqueness
  4. Need duplicate values between factories
  5. Need duplicate values within factories

The only out of the box solution currently supported is case 3. Adding a simple kwarg factory.Faker(..., unique=True) (as I believe this PR covers) immediately works for the first three cases. Additionally, it's easy.

Cases 4 and 5 are not currently supported - and adding per-factory uniqueness would not cover them. If testing with duplicate values is required, that's important to make explicit. Enforcing duplication is much more complex and variable than enforcing uniqueness, so it's best left to per-implementation LazyFunction or factory-calling functions.

Per-factory uniqueness only enables the case of "unique within factories, but occasionally and not repeatably allow duplicates across factories" which doesn't seem very useful to me (though I would welcome counterexamples).

So, what would be required at this point to get this PR ready to go?

@arthurHamon2 are you still around and interested in this?

tgross35 avatar Jun 13 '22 07:06 tgross35