Add support for automated factory generation from a descriptive model
This issue will be used as the discussion basis for the automated factory generation feature.
The problem
Many libraries (ORMs, API schema languages, dataclasses.dataclass) provide a way to describe the fields of a class and their types.
In those cases, it is cumbersome to add all the declarations manually; it would be great if factory_boy could provide a set of default declarations by introspecting the class.
The typical example would be:
import dataclasses

import factory


@dataclasses.dataclass
class User:
    id: int
    username: str
    fullname: str
    is_admin: bool


# Proposed API: DataclassFactory and auto_declarations are what this issue asks for.
class UserFactory(factory.DataclassFactory):
    class Meta:
        model = User
        auto_declarations = True

>>> UserFactory()
User(id=42, username="john.doe", fullname="Jane Smith", is_admin=False)
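To make the idea of "default declarations from an introspection of the class" more concrete, here is a minimal standard-library sketch. It is not factory_boy code: `DEFAULT_GENERATORS`, `auto_declarations` and `build` are hypothetical names used only for illustration, and the generators are deliberately naive. A real implementation would presumably produce factory_boy declarations (for instance `factory.LazyFunction` or `factory.Faker`) instead of plain callables.

```python
import dataclasses
import itertools
import random
import string

_counter = itertools.count(1)


def _random_text() -> str:
    return "".join(random.choices(string.ascii_lowercase, k=8))


# Hypothetical type-to-generator table (an assumption of this sketch).
DEFAULT_GENERATORS = {
    int: lambda: next(_counter),
    str: _random_text,
    bool: lambda: False,
    float: random.random,
}


def auto_declarations(model):
    """Map each dataclass field with a known type to a zero-argument generator.

    Note: this relies on field.type being a real type, so it does not handle
    string annotations (e.g. under `from __future__ import annotations`).
    """
    declarations = {}
    for field in dataclasses.fields(model):
        generator = DEFAULT_GENERATORS.get(field.type)
        if generator is not None:
            declarations[field.name] = generator
    return declarations


def build(model, **overrides):
    """Instantiate `model`; explicit keyword arguments win over generated values."""
    values = {name: gen() for name, gen in auto_declarations(model).items()}
    values.update(overrides)
    return model(**values)


@dataclasses.dataclass
class User:
    id: int
    username: str
    fullname: str
    is_admin: bool


user = build(User, username="john.doe")
```

Fields whose type is not in the table are simply skipped, so they must either have a model-side default or be passed explicitly.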
Existing work
- A branch was started 5 years ago: https://github.com/FactoryBoy/factory_boy/commit/4046c55710d5d7073018dcc76aa3e8e5a7f803eb
- A pull request was restarted recently: #820
- A simple hack was written in #330
- A discussion on that topic occurred in #347
- Usage with marshmallow is covered in #277
Design constraints
Developer experience
- The provided API must be explicit: reading the code, one must know that some declarations have been automatically generated;
- Any explicit declaration on the factory must take precedence over the automated generation;
- It must be possible to restrict the fields covered by the automated generation, either by including only a subset or by excluding some fields, even if no explicit declaration exists (for instance to reuse a model-side default);
- It must be possible to use this feature with make_factory;
- Ideally, a bridge could be made with factory.Faker to use the field name as a hint (e.g. calling factory.Faker("user_name") for a field called username).
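The precedence and include/exclude constraints above can be sketched as a small merge step. This is only an illustration under assumptions: `resolve_declarations` and its `include` / `exclude` parameters are hypothetical names, and declarations are represented as plain dictionary values rather than real factory_boy declarations.

```python
def resolve_declarations(auto, explicit, include=None, exclude=()):
    """Merge automatic and explicit declarations.

    auto / explicit: dicts mapping field names to declarations.
    include: optional iterable restricting automatic fields to a subset.
    exclude: iterable of field names left out of the automatic set.
    """
    if include is not None and exclude:
        raise ValueError("Use either include or exclude, not both")
    allowed = set(auto) if include is None else set(include)
    allowed -= set(exclude)
    merged = {name: decl for name, decl in auto.items() if name in allowed}
    merged.update(explicit)  # explicit declarations always win
    return merged


auto = {"id": "auto-int", "username": "auto-str", "is_admin": "auto-bool"}
explicit = {"username": "explicit-sequence"}

# is_admin is excluded, so the model-side default (if any) would apply.
resolved = resolve_declarations(auto, explicit, exclude=["is_admin"])
```

Here `resolved` keeps the automatic `id`, uses the explicit `username`, and contains no entry for `is_admin`.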
Library integration
- It should be easy to connect this feature with third-party libraries in a project's code, either through an abstract Factory subclass or through a custom FactoryOptions;
- Introspection can be added on top of an existing abstract factory bridge to a third-party library (e.g. project A provides DjangoModelFactory; project B should be able to leverage it into AutoDjangoModelFactory).
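One possible shape for the layering described above is a mixin that adds introspected defaults on top of an existing factory base class. Everything in this sketch is hypothetical: `Introspector`, `DataclassIntrospector`, `AutoMixin` and `BaseFactory` are stand-ins invented for illustration, and `BaseFactory` only mimics a third-party abstract factory rather than using factory_boy's actual `Factory` or `FactoryOptions` machinery.

```python
import dataclasses


class Introspector:
    """Hypothetical interface: list (name, type) pairs for a model class."""

    def fields(self, model):
        raise NotImplementedError


class DataclassIntrospector(Introspector):
    def fields(self, model):
        return [(f.name, f.type) for f in dataclasses.fields(model)]


class BaseFactory:
    """Stand-in for an existing third-party abstract factory (project A)."""

    model = None

    @classmethod
    def build(cls, **kwargs):
        return cls.model(**kwargs)


class AutoMixin:
    """Adds introspected defaults on top of any factory exposing build() (project B)."""

    introspector = None
    defaults_by_type = {int: 0, str: "", bool: False}

    @classmethod
    def build(cls, **kwargs):
        for name, type_ in cls.introspector.fields(cls.model):
            if name not in kwargs and type_ in cls.defaults_by_type:
                kwargs[name] = cls.defaults_by_type[type_]
        return super().build(**kwargs)


@dataclasses.dataclass
class User:
    id: int
    username: str


class AutoUserFactory(AutoMixin, BaseFactory):
    model = User
    introspector = DataclassIntrospector()


user = AutoUserFactory.build(username="john.doe")
```

The point of the sketch is only that the introspection layer does not need to know how the underlying factory creates objects; it fills in missing keyword arguments and delegates.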
Open questions
- Should we integrate it directly into DjangoModelFactory / SQLAlchemyModelFactory, or provide it as extra classes?
- How should "foreign keys" be handled? Should their factories be autogenerated, or do we require an explicit declaration there?
- How would a developer enrich the field name / faker generator mapping with their own custom providers and field naming conventions?
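As a starting point for the last question, one option would be an ordered list of (pattern, provider name) hints where project-specific entries are checked before the built-in ones. This is a sketch under assumptions: `DEFAULT_HINTS`, `provider_for` and `extra_hints` are hypothetical names, the patterns are arbitrary examples, and the function only returns a provider name that could then be passed to factory.Faker.

```python
import re

# Assumed default hints: field-name pattern -> Faker provider name.
DEFAULT_HINTS = [
    (re.compile(r"^(user_?name|login)$"), "user_name"),
    (re.compile(r"email"), "email"),
    (re.compile(r"^(full_?name|name)$"), "name"),
]


def provider_for(field_name, extra_hints=()):
    """Return the provider name hinted by `field_name`, or None if nothing matches.

    Project-specific hints come first so they can override the defaults.
    """
    for pattern, provider in [*extra_hints, *DEFAULT_HINTS]:
        if pattern.search(field_name):
            return provider
    return None


# A project could register its own naming conventions like this:
project_hints = [(re.compile(r"^email$"), "company_email")]
chosen = provider_for("email", extra_hints=project_hints)
```

A field with no matching hint returns None, which leaves room for a type-based fallback.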
Oh, this feature would be awesome!
If I can add my two cents regarding foreign keys: my team and I are used to manually creating generators for our models (which we want to stop doing XD). What has worked well for us is to only generate the required fields (that applies to FKs too). That means that every optional/nullable field is set to None by default, while required foreign objects are created recursively (until there are no more foreign models). All of that can be overridden at the call site. So, for example, if a foreign object is part of a test of mine and I want to use it in a factory, I can simply pass it in as an argument.
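A rough illustration of that "required fields only" approach, using plain dataclasses instead of an ORM. The names `build_required` and `SCALAR_DEFAULTS` are made up for this sketch, nested dataclasses stand in for foreign keys, and only `typing.Optional[...]` annotations are treated as nullable.

```python
import dataclasses
import typing

# Placeholder values for required scalar fields (an assumption of this sketch).
SCALAR_DEFAULTS = {int: 0, str: "", bool: False, float: 0.0}


def _is_optional(tp):
    return typing.get_origin(tp) is typing.Union and type(None) in typing.get_args(tp)


def build_required(model, **overrides):
    """Build `model`, generating only required fields.

    Optional fields become None, nested dataclasses are built recursively,
    and anything passed in `overrides` is used as-is.
    """
    hints = typing.get_type_hints(model)
    values = {}
    for field in dataclasses.fields(model):
        if field.name in overrides:
            values[field.name] = overrides[field.name]
            continue
        tp = hints[field.name]
        if _is_optional(tp):
            values[field.name] = None
        elif dataclasses.is_dataclass(tp):
            values[field.name] = build_required(tp)  # required "foreign" object
        elif tp in SCALAR_DEFAULTS:
            values[field.name] = SCALAR_DEFAULTS[tp]
        # Any other type is left to the model's own default, if it has one.
    return model(**values)


@dataclasses.dataclass
class Team:
    name: str


@dataclasses.dataclass
class Member:
    username: str
    team: Team
    nickname: typing.Optional[str]


member = build_required(Member)
custom = build_required(Member, team=Team(name="core"))
```

In the second call the `Team` instance created by the test is passed in directly, so no related object is generated for it.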
About the integration with the ORM factories, I would make it integrated, possibly with a Meta attribute to disable that.
Edit: the library below does something like this. It could be looked at as inspiration, an example, or previous experience in implementing this.
https://github.com/klen/mixer
Hi, I would like to share a very alpha and simple PoC to generate a factory for dataclasses.
https://gist.github.com/mgaitan/dcbe08bf44a5af696f2af752624ac11b
It respects defaults and supports built-in types, basic relationships, lists/tuples/sets, enums, and email as a special case based on the attribute name.
I'm brand new to Factory Boy, and I just want to share that as a Django user with dozens of models and hundreds of fields, I'm quite surprised that Factory Boy doesn't have a way of inspecting a model/dataclass/etc. and generating reasonable fakers for each field.
I've been reading docs for an hour or two, and it's only after I began working with the code that I realized I'm going to have to define Faker attributes for dozens of fields. I am pretty sad. Definitely not the turnkey kind of thing I was hoping for. Ah well.
Onwards with my explorations, but kinda bummed.
Don't have too much experience with it myself yet, but at first glance the Pydantic ecosystem seems to be good for this kind of stuff. Might need a little up front investment. Check out https://github.com/Goldziher/pydantic-factories