POC for faker splitting
This is a kind of POC for the issue mentioned at https://github.com/datafaker-net/datafaker/issues/157.
The PR attempts to introduce faker specific interfaces like BaseFaker, MovieFaker, SportFaker, VideoGameFaker.
It allows to have for a specific faker only a limited set of providers. At the same time for user there is almost nothing changed(old api still works)
e.g.
BaseFaker baseFaker = new Faker();
SportFaker sportFaker = new Faker();
VideoGameFaker videogameFaker = new Faker();
MoviewFaker moviewFaker = new Faker();
Faker fullFaker = new Faker();
IDE will continue show hints for available methods however for sport faker it will show only sport and base methods, for movie it will show only movie and base related methods, for fullFaker it will show all and etc. In case there is an entity which is potentially could be in several groups then with interface it is not an issue.
@bodiam @jpeffer @eliasnogueira since you were involved in https://github.com/datafaker-net/datafaker/issues/157 WDYT?
Codecov Report
Merging #340 (6927fc5) into main (65a5152) will decrease coverage by
57.06%. The diff coverage is71.23%.
@@ Coverage Diff @@
## main #340 +/- ##
=============================================
- Coverage 94.93% 37.86% -57.07%
+ Complexity 1938 468 -1470
=============================================
Files 203 86 -117
Lines 3866 1875 -1991
Branches 383 238 -145
=============================================
- Hits 3670 710 -2960
- Misses 100 1150 +1050
+ Partials 96 15 -81
| Impacted Files | Coverage Δ | |
|---|---|---|
| faker/src/main/java/net/datafaker/Bool.java | 0.00% <ø> (ø) |
|
| faker/src/main/java/net/datafaker/DateAndTime.java | 85.29% <ø> (ø) |
|
| ...er/src/main/java/net/datafaker/FakeCollection.java | 0.00% <ø> (ø) |
|
| ...aker/src/main/java/net/datafaker/FakeDuration.java | 0.00% <ø> (ø) |
|
| faker/src/main/java/net/datafaker/Faker.java | 30.66% <0.00%> (ø) |
|
| faker/src/main/java/net/datafaker/Number.java | 100.00% <ø> (ø) |
|
| faker/src/main/java/net/datafaker/Options.java | 100.00% <ø> (ø) |
|
| faker/src/main/java/net/datafaker/Time.java | 87.50% <ø> (ø) |
|
| ...r/src/main/java/net/datafaker/fileformats/Csv.java | 0.00% <ø> (ø) |
|
| ...rc/main/java/net/datafaker/fileformats/Format.java | 0.00% <ø> (ø) |
|
| ... and 233 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
If I understand this correctly, this code only hides the methods? Ideally I think we'd like to split the code into multiple modules, would this allow that?
As a side note, how about getting rid of those instance methods? I know it will break the API, but I see no benefits of having them over just calling the constructor. I'm thinking of deprecating them in 1.7.0, and remove them in 1.8.0. Thoughts?
Yes, it hides methods. In case of real code splitting it could be done with classes instead of interfaces which also could be an option however it will lead to breaking of api.
About instance what if in 1.7.0 we mark it as deprecated and in 1.8.0 remove them?
@snuyanzin this is in between about what I have in mind. Thanks for sharing it.
This would be basically an architectural decision. If we go with modules your approach is excellent.
I also agree about the instance methods.
One note from my side: we still can get the faker and getFaker from the AbstractProvider during the usage of the sub-specialized/group class.
Sub-specialized class: MovieFaker
Hierarchy now: BaseFaker -> MovieFaker
Example:
MovieFaker movieFaker = new Faker();
movieFaker.avatar().getFaker();
movieFaker.avatar().faker;
The class Avatar has the AbstractProvider as a superclass, but MovieFaker should be the superclass, based on a modular project. The hierarchy would be BaseFaker -> MovieFaker -> Avatar
This is the only change I can see to group the classes.
thanks for the feedback
I've now made a real split into 4 modules (just because of POC) some providers just skipped for the same reason.
At the same time it is possible to see sportfaker, basefaker, moviefaker and videogamesfaker. Each of them is compilable and passing tests. Others could be splitted in a similar way.
Now
movieFaker.avatar().getFaker();
gives MovieFaker
The class Avatar has the AbstractProvider as a superclass, but MovieFaker should be the superclass, based on a modular project. The hierarchy would be BaseFaker -> MovieFaker -> Avatar This is the only change I can see to group the classes.
Didn't get your point here
BaseFaker/Faker -> MovieFaker agree. However Avatar is a provider and is not in Fakers hierarchy.
If I understand this correctly, this code only hides the methods? Ideally I think we'd like to split the code into multiple modules, would this allow that?
As a side note, how about getting rid of those
instancemethods? I know it will break the API, but I see no benefits of having them over just calling the constructor. I'm thinking of deprecating them in 1.7.0, and remove them in 1.8.0. Thoughts?
@bodiam This is largely what I attempted to do with PR #221. I never received any feedback on it in the PR or under https://github.com/datafaker-net/datafaker/issues/188. Still happy to contribute if there is interest in adopting the approach, even if other changes are needed.
I never received any feedback on it in the PR
@jpeffer thanks for contribution however your PR (#221 ) is in draft. Normally in draft PRs are considered as not ready for feedback and not reviewed unless it is not requested explicitly by author
If I understand this correctly, this code only hides the methods? Ideally I think we'd like to split the code into multiple modules, would this allow that? As a side note, how about getting rid of those
instancemethods? I know it will break the API, but I see no benefits of having them over just calling the constructor. I'm thinking of deprecating them in 1.7.0, and remove them in 1.8.0. Thoughts?@bodiam This is largely what I attempted to do with PR #221. I never received any feedback on it in the PR or under #188. Still happy to contribute if there is interest in adopting the approach, even if other changes are needed.
My apologies, I thought it was a draft PR, which to my understanding is a work in progress. I wasn't aware you were looking for feedback, nor do I see any direct questions, so sorry for the misunderstanding. Happy to continue the conversation though!
thanks for the feedback
I've now made a real split into 4 modules (just because of POC) some providers just skipped for the same reason. At the same time it is possible to see
sportfaker,basefaker,moviefakerandvideogamesfaker. Each of them is compilable and passing tests. Others could be splitted in a similar way.Now
movieFaker.avatar().getFaker();gives
MovieFakerThe class Avatar has the AbstractProvider as a superclass, but MovieFaker should be the superclass, based on a modular project. The hierarchy would be BaseFaker -> MovieFaker -> Avatar This is the only change I can see to group the classes.
Didn't get your point here
BaseFaker/Faker->MovieFakeragree. HoweverAvataris a provider and is not in Fakers hierarchy.
@snuyanzin I didn't express myself correctly, sorry... I mean that I'm in a favour of a concrete provider instead the abstract one, and you did it when you split into modules.
@bodiam did you like this approach using different modules? I believe we are getting closer to a concrete example, thanks to @snuyanzin.
I would say that the next step would be the correlation between a group vs generators.
I tried to read up as much as possible through the comments on the PR and related issues. I've seen several approaches, but I believe I didn't see this pass by yet (correct me if I'm wrong):
What if we took an approach that both allows for backwards compatibility with the current setup and at the same time allows for more radical changes in the API (and the architecture)?
I think it's important to make (keep) it easy to switch from java-faker to Datafaker as that's a more well-known project still. We could support that by "preserving" the current API in a datafaker-classic module. This could then depend on the new setup with several modules (a bit like the datafaker-all module in the image @bodiam shared in #157). It would primarily be a delegate to the modularized fakers.
The benefit, besides an easy migration path from java-faker, would be that a new API for the modular setup can be a fairly drastic change from the current setup. It would for example open the door to using a new interface (DataFaker?), which is implemented by all specialized fakers, which could contain default methods for the low-level methods that are currently in Faker, like regexify(), nummerify() etc.
And a side-note about packaging. Since this is a PoC I can imagine that wasn't taken into account specifically. Currently all modules share the same base package. That would cause split package issues when the library is used in a application that uses Java modules. So I would suggest to add an extra level to the packages (e.g. classes in datafaker-movie would then use net.datafaker.movie as their base package).
The thing we can do is make Faker as deprecated class and use it as faker for everything to ease migration from java-faker.
At the same time under the hood it will be a combination of splitted fakers however from user's point of view it will be the same faker as in previous versions and this could be done in a separate mvn module.
The light version could be called LightFaker.
All default methods like numerify, templatify and others are already inherited from BaseFaker/LightFaker so it should not be an issue. Package renaming makes sense once everyone is ok with the general approach.
@jaapcoomans @snuyanzin I feel like both of you are describing things I largely attempted to accomplish in PR https://github.com/datafaker-net/datafaker/pull/221. I was trying to lay down a foundation that would more easily enable segregation of faker domains, while also addressing some concerns from the original java-faker design.
I did deprecate the original Faker, but I do like the idea of even moving it into a separate dependency as a bridge.
I'm not a fan of deprecating the Faker class, I'm more a fan of making the Faker class delegate all the calls to the correct module, making Faker the uber project, and the rest of the modules can be split.
I think it's important to make (keep) it easy to switch from java-faker to Datafaker as that's a more well-known project still. We could support that by "preserving" the current API in a
datafaker-classicmodule. This could then depend on the new setup with several modules (a bit like thedatafaker-allmodule in the image @bodiam shared in #157). It would primarily be a delegate to the modularized fakers.
So this I support indeed. We can call it classic, uber, -all, I don't mind, but something which doesn't break the API (or at least, not too much) and makes it easy to transition from Javafaker to Datafaker. Datafaker 2 can break the API in a much more extreme way if we need to, but that's a decision for later I think.
I did deprecate the original Faker, but I do like the idea of even moving it into a separate dependency as a bridge.
This is already done in this POC for non-uber
We can call it classic, uber, -all, I don't mind, but something which doesn't break the API (or at least, not too much) and makes it easy to transition from Javafaker to Datafaker. Datafaker 2 can break the API in a much more extreme way if we need to, but that's a decision for later I think.
The API will not be broken, that's why I mentioned it for uber-jar like
At the same time under the hood it will be a combination of splitted fakers however from user's point of view it will be the same faker as in previous versions and this could be done in a separate mvn module.
The only concern about making of Faker deprecated is to make people using more specific fakers... For the solution it is not critical, so it's ok to not mark it deprecated as well
This is already done in this POC for non-uber
Yes, but it does not address some of the API concerns that are resolved in mine. I think having multiple PRs all attempting to roughly address similar areas of concern are going to be problematic for determining a path forward. It would be good to have feedback on my own PR from a few months ago to understand why you felt a similar, but ultimately different approach was necessary. That might help us figure out how to blend the pros we see in the different approaches.
I'd prefer the idea of a classic / legacy dependency over the idea of an uber / all one. Classic / Legacy provides a clear break and works as an adapter between the old API and the new. It does not need to have new faker capabilities added to it. New functionality can simply be adopted through the use of the new, segregated modules. An uber / all dependency would unnecessarily increase maintenance. I think it also becomes less usable as the class becomes overly polluted with too many methods. Blob / god class antipatterns are probably something best avoided.
Could you please clarify what concerns are not resolved in this POC and resolved in another from user's point of view?
Mine addressed the needs outlined in https://github.com/datafaker-net/datafaker/issues/188, resolved a defect in the FakeValuesService, and laid a foundation for splitting out the fakers into modules, while also retaining full backwards compatibility with the old API.
That's why I was wondering what the motivation was for not building upon PR #221?
For me it is not clear how #221 allows to solve task when there are several modules e.g. basefaker, moviefaker, videogamefaker and now we want to create all faker to get functionality of everyone in one class... May be I miss something however currently from my point of view it does not simplify the task for that... Or may be could you clarify this point?
I don't think Datafaker should solve every user case. #188 is something which can be solved very easily by creating a small wrapper object around Faker which stores the seed. That's a very simple solution for a such use case, and I think it would be better if solved on the consumer's end (ie the user of the faker library), then jn the library itself.
Datafaker doesn't need the functionality of exposing the seed, therefor I think this functionality is best kept outside the library. We can't facilitate every scenario out there.
@bodiam I understand your position that DataFaker cannot and should not try to cover every use case. In this particular instance though, the goal is to support repeatable data generation, not simply provide access to the seed. I think that goal gets somewhat lost if we focus on seed access, rather than the functional goal of repeatability. Data generation is needed in the context of testing almost exclusively and repeatable data generation is a common need in testing, which is why you see this as core functionality in similar data generating libraries out there.
Ultimately, I thought contributions were welcome, so I attempted to address a few gaps in the original libraries design, that would help to support modularity and repeatable data generation. Being able to reuse generators, within generators, while supporting repeatability is another important goal that promotes reusability within the library. If you have other ideas on how to achieve these goals, beyond placing the burden on the consumer, that would satisfy my goal. That said, the contribution is there if you choose to use it, which meets this need. I believe they are valuable additions, but I respect whatever decision is made to move forward.
We absolutely appreciate all contributions, but my limited time doesn't allow me to dive into every discussion and PR unfortunately, there's a limit on how much time I can spend on this.
Regarding the repeatability, in my understanding, proving a seed to the faker will give you perfect repeated data generation, or at least, it should. Where are we lacking in that, cause of course I'm happy to address that.
Thanks everyone for the feedback Really appreciate it I made several steps into direction of splitting trying to make them more or less atomic and with much smaller change than in this PR. Currently it is splitted within the same maven artifact more examples about that could be found at https://github.com/datafaker-net/datafaker/issues/157#issuecomment-1256914943 Moreover now similar extraction of other providers groups could be done in simpler way (just look how it was done e.g. for Foor or Sport providers)
So I close this issue since now it is already a bit outdated