datafaker icon indicating copy to clipboard operation
datafaker copied to clipboard

POC for faker splitting

Open snuyanzin opened this issue 3 years ago • 21 comments

This is a kind of POC for the issue mentioned at https://github.com/datafaker-net/datafaker/issues/157. The PR attempts to introduce faker specific interfaces like BaseFaker, MovieFaker, SportFaker, VideoGameFaker. It allows to have for a specific faker only a limited set of providers. At the same time for user there is almost nothing changed(old api still works) e.g.

BaseFaker baseFaker = new Faker();
SportFaker sportFaker = new Faker();
VideoGameFaker videogameFaker = new Faker();
MoviewFaker moviewFaker = new Faker(); 
Faker fullFaker = new Faker();

IDE will continue show hints for available methods however for sport faker it will show only sport and base methods, for movie it will show only movie and base related methods, for fullFaker it will show all and etc. In case there is an entity which is potentially could be in several groups then with interface it is not an issue.

@bodiam @jpeffer @eliasnogueira since you were involved in https://github.com/datafaker-net/datafaker/issues/157 WDYT?

snuyanzin avatar Sep 11 '22 09:09 snuyanzin

Codecov Report

Merging #340 (6927fc5) into main (65a5152) will decrease coverage by 57.06%. The diff coverage is 71.23%.

@@              Coverage Diff              @@
##               main     #340       +/-   ##
=============================================
- Coverage     94.93%   37.86%   -57.07%     
+ Complexity     1938      468     -1470     
=============================================
  Files           203       86      -117     
  Lines          3866     1875     -1991     
  Branches        383      238      -145     
=============================================
- Hits           3670      710     -2960     
- Misses          100     1150     +1050     
+ Partials         96       15       -81     
Impacted Files Coverage Δ
faker/src/main/java/net/datafaker/Bool.java 0.00% <ø> (ø)
faker/src/main/java/net/datafaker/DateAndTime.java 85.29% <ø> (ø)
...er/src/main/java/net/datafaker/FakeCollection.java 0.00% <ø> (ø)
...aker/src/main/java/net/datafaker/FakeDuration.java 0.00% <ø> (ø)
faker/src/main/java/net/datafaker/Faker.java 30.66% <0.00%> (ø)
faker/src/main/java/net/datafaker/Number.java 100.00% <ø> (ø)
faker/src/main/java/net/datafaker/Options.java 100.00% <ø> (ø)
faker/src/main/java/net/datafaker/Time.java 87.50% <ø> (ø)
...r/src/main/java/net/datafaker/fileformats/Csv.java 0.00% <ø> (ø)
...rc/main/java/net/datafaker/fileformats/Format.java 0.00% <ø> (ø)
... and 233 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Sep 11 '22 09:09 codecov-commenter

If I understand this correctly, this code only hides the methods? Ideally I think we'd like to split the code into multiple modules, would this allow that?

As a side note, how about getting rid of those instance methods? I know it will break the API, but I see no benefits of having them over just calling the constructor. I'm thinking of deprecating them in 1.7.0, and remove them in 1.8.0. Thoughts?

bodiam avatar Sep 11 '22 12:09 bodiam

Yes, it hides methods. In case of real code splitting it could be done with classes instead of interfaces which also could be an option however it will lead to breaking of api.

About instance what if in 1.7.0 we mark it as deprecated and in 1.8.0 remove them?

snuyanzin avatar Sep 11 '22 13:09 snuyanzin

@snuyanzin this is in between about what I have in mind. Thanks for sharing it.

This would be basically an architectural decision. If we go with modules your approach is excellent. I also agree about the instance methods.

One note from my side: we still can get the faker and getFaker from the AbstractProvider during the usage of the sub-specialized/group class.

Sub-specialized class: MovieFaker Hierarchy now: BaseFaker -> MovieFaker

Example:

MovieFaker movieFaker = new Faker();

movieFaker.avatar().getFaker();
movieFaker.avatar().faker;

The class Avatar has the AbstractProvider as a superclass, but MovieFaker should be the superclass, based on a modular project. The hierarchy would be BaseFaker -> MovieFaker -> Avatar This is the only change I can see to group the classes.

eliasnogueira avatar Sep 11 '22 14:09 eliasnogueira

thanks for the feedback

I've now made a real split into 4 modules (just because of POC) some providers just skipped for the same reason. At the same time it is possible to see sportfaker, basefaker, moviefaker and videogamesfaker. Each of them is compilable and passing tests. Others could be splitted in a similar way.

Now

movieFaker.avatar().getFaker();

gives MovieFaker

The class Avatar has the AbstractProvider as a superclass, but MovieFaker should be the superclass, based on a modular project. The hierarchy would be BaseFaker -> MovieFaker -> Avatar This is the only change I can see to group the classes.

Didn't get your point here BaseFaker/Faker -> MovieFaker agree. However Avatar is a provider and is not in Fakers hierarchy.

snuyanzin avatar Sep 11 '22 23:09 snuyanzin

If I understand this correctly, this code only hides the methods? Ideally I think we'd like to split the code into multiple modules, would this allow that?

As a side note, how about getting rid of those instance methods? I know it will break the API, but I see no benefits of having them over just calling the constructor. I'm thinking of deprecating them in 1.7.0, and remove them in 1.8.0. Thoughts?

@bodiam This is largely what I attempted to do with PR #221. I never received any feedback on it in the PR or under https://github.com/datafaker-net/datafaker/issues/188. Still happy to contribute if there is interest in adopting the approach, even if other changes are needed.

jpeffer avatar Sep 11 '22 23:09 jpeffer

I never received any feedback on it in the PR

@jpeffer thanks for contribution however your PR (#221 ) is in draft. Normally in draft PRs are considered as not ready for feedback and not reviewed unless it is not requested explicitly by author

snuyanzin avatar Sep 12 '22 05:09 snuyanzin

If I understand this correctly, this code only hides the methods? Ideally I think we'd like to split the code into multiple modules, would this allow that? As a side note, how about getting rid of those instance methods? I know it will break the API, but I see no benefits of having them over just calling the constructor. I'm thinking of deprecating them in 1.7.0, and remove them in 1.8.0. Thoughts?

@bodiam This is largely what I attempted to do with PR #221. I never received any feedback on it in the PR or under #188. Still happy to contribute if there is interest in adopting the approach, even if other changes are needed.

My apologies, I thought it was a draft PR, which to my understanding is a work in progress. I wasn't aware you were looking for feedback, nor do I see any direct questions, so sorry for the misunderstanding. Happy to continue the conversation though!

bodiam avatar Sep 12 '22 11:09 bodiam

thanks for the feedback

I've now made a real split into 4 modules (just because of POC) some providers just skipped for the same reason. At the same time it is possible to see sportfaker, basefaker, moviefaker and videogamesfaker. Each of them is compilable and passing tests. Others could be splitted in a similar way.

Now

movieFaker.avatar().getFaker();

gives MovieFaker

The class Avatar has the AbstractProvider as a superclass, but MovieFaker should be the superclass, based on a modular project. The hierarchy would be BaseFaker -> MovieFaker -> Avatar This is the only change I can see to group the classes.

Didn't get your point here BaseFaker/Faker -> MovieFaker agree. However Avatar is a provider and is not in Fakers hierarchy.

@snuyanzin I didn't express myself correctly, sorry... I mean that I'm in a favour of a concrete provider instead the abstract one, and you did it when you split into modules.

@bodiam did you like this approach using different modules? I believe we are getting closer to a concrete example, thanks to @snuyanzin.

I would say that the next step would be the correlation between a group vs generators.

eliasnogueira avatar Sep 13 '22 19:09 eliasnogueira

I tried to read up as much as possible through the comments on the PR and related issues. I've seen several approaches, but I believe I didn't see this pass by yet (correct me if I'm wrong):

What if we took an approach that both allows for backwards compatibility with the current setup and at the same time allows for more radical changes in the API (and the architecture)? I think it's important to make (keep) it easy to switch from java-faker to Datafaker as that's a more well-known project still. We could support that by "preserving" the current API in a datafaker-classic module. This could then depend on the new setup with several modules (a bit like the datafaker-all module in the image @bodiam shared in #157). It would primarily be a delegate to the modularized fakers.

The benefit, besides an easy migration path from java-faker, would be that a new API for the modular setup can be a fairly drastic change from the current setup. It would for example open the door to using a new interface (DataFaker?), which is implemented by all specialized fakers, which could contain default methods for the low-level methods that are currently in Faker, like regexify(), nummerify() etc.

And a side-note about packaging. Since this is a PoC I can imagine that wasn't taken into account specifically. Currently all modules share the same base package. That would cause split package issues when the library is used in a application that uses Java modules. So I would suggest to add an extra level to the packages (e.g. classes in datafaker-movie would then use net.datafaker.movie as their base package).

jaapcoomans avatar Sep 15 '22 21:09 jaapcoomans

The thing we can do is make Faker as deprecated class and use it as faker for everything to ease migration from java-faker. At the same time under the hood it will be a combination of splitted fakers however from user's point of view it will be the same faker as in previous versions and this could be done in a separate mvn module. The light version could be called LightFaker.

All default methods like numerify, templatify and others are already inherited from BaseFaker/LightFaker so it should not be an issue. Package renaming makes sense once everyone is ok with the general approach.

snuyanzin avatar Sep 15 '22 22:09 snuyanzin

@jaapcoomans @snuyanzin I feel like both of you are describing things I largely attempted to accomplish in PR https://github.com/datafaker-net/datafaker/pull/221. I was trying to lay down a foundation that would more easily enable segregation of faker domains, while also addressing some concerns from the original java-faker design.

I did deprecate the original Faker, but I do like the idea of even moving it into a separate dependency as a bridge.

jpeffer avatar Sep 16 '22 01:09 jpeffer

I'm not a fan of deprecating the Faker class, I'm more a fan of making the Faker class delegate all the calls to the correct module, making Faker the uber project, and the rest of the modules can be split.

I think it's important to make (keep) it easy to switch from java-faker to Datafaker as that's a more well-known project still. We could support that by "preserving" the current API in a datafaker-classic module. This could then depend on the new setup with several modules (a bit like the datafaker-all module in the image @bodiam shared in #157). It would primarily be a delegate to the modularized fakers.

So this I support indeed. We can call it classic, uber, -all, I don't mind, but something which doesn't break the API (or at least, not too much) and makes it easy to transition from Javafaker to Datafaker. Datafaker 2 can break the API in a much more extreme way if we need to, but that's a decision for later I think.

bodiam avatar Sep 16 '22 05:09 bodiam

I did deprecate the original Faker, but I do like the idea of even moving it into a separate dependency as a bridge.

This is already done in this POC for non-uber

We can call it classic, uber, -all, I don't mind, but something which doesn't break the API (or at least, not too much) and makes it easy to transition from Javafaker to Datafaker. Datafaker 2 can break the API in a much more extreme way if we need to, but that's a decision for later I think.

The API will not be broken, that's why I mentioned it for uber-jar like

At the same time under the hood it will be a combination of splitted fakers however from user's point of view it will be the same faker as in previous versions and this could be done in a separate mvn module.

The only concern about making of Faker deprecated is to make people using more specific fakers... For the solution it is not critical, so it's ok to not mark it deprecated as well

snuyanzin avatar Sep 16 '22 05:09 snuyanzin

This is already done in this POC for non-uber

Yes, but it does not address some of the API concerns that are resolved in mine. I think having multiple PRs all attempting to roughly address similar areas of concern are going to be problematic for determining a path forward. It would be good to have feedback on my own PR from a few months ago to understand why you felt a similar, but ultimately different approach was necessary. That might help us figure out how to blend the pros we see in the different approaches.

I'd prefer the idea of a classic / legacy dependency over the idea of an uber / all one. Classic / Legacy provides a clear break and works as an adapter between the old API and the new. It does not need to have new faker capabilities added to it. New functionality can simply be adopted through the use of the new, segregated modules. An uber / all dependency would unnecessarily increase maintenance. I think it also becomes less usable as the class becomes overly polluted with too many methods. Blob / god class antipatterns are probably something best avoided.

jpeffer avatar Sep 16 '22 17:09 jpeffer

Could you please clarify what concerns are not resolved in this POC and resolved in another from user's point of view?

snuyanzin avatar Sep 16 '22 17:09 snuyanzin

Mine addressed the needs outlined in https://github.com/datafaker-net/datafaker/issues/188, resolved a defect in the FakeValuesService, and laid a foundation for splitting out the fakers into modules, while also retaining full backwards compatibility with the old API.

That's why I was wondering what the motivation was for not building upon PR #221?

jpeffer avatar Sep 16 '22 19:09 jpeffer

For me it is not clear how #221 allows to solve task when there are several modules e.g. basefaker, moviefaker, videogamefaker and now we want to create all faker to get functionality of everyone in one class... May be I miss something however currently from my point of view it does not simplify the task for that... Or may be could you clarify this point?

snuyanzin avatar Sep 16 '22 19:09 snuyanzin

I don't think Datafaker should solve every user case. #188 is something which can be solved very easily by creating a small wrapper object around Faker which stores the seed. That's a very simple solution for a such use case, and I think it would be better if solved on the consumer's end (ie the user of the faker library), then jn the library itself.

Datafaker doesn't need the functionality of exposing the seed, therefor I think this functionality is best kept outside the library. We can't facilitate every scenario out there.

bodiam avatar Sep 16 '22 21:09 bodiam

@bodiam I understand your position that DataFaker cannot and should not try to cover every use case. In this particular instance though, the goal is to support repeatable data generation, not simply provide access to the seed. I think that goal gets somewhat lost if we focus on seed access, rather than the functional goal of repeatability. Data generation is needed in the context of testing almost exclusively and repeatable data generation is a common need in testing, which is why you see this as core functionality in similar data generating libraries out there.

Ultimately, I thought contributions were welcome, so I attempted to address a few gaps in the original libraries design, that would help to support modularity and repeatable data generation. Being able to reuse generators, within generators, while supporting repeatability is another important goal that promotes reusability within the library. If you have other ideas on how to achieve these goals, beyond placing the burden on the consumer, that would satisfy my goal. That said, the contribution is there if you choose to use it, which meets this need. I believe they are valuable additions, but I respect whatever decision is made to move forward.

jpeffer avatar Sep 17 '22 02:09 jpeffer

We absolutely appreciate all contributions, but my limited time doesn't allow me to dive into every discussion and PR unfortunately, there's a limit on how much time I can spend on this.

Regarding the repeatability, in my understanding, proving a seed to the faker will give you perfect repeated data generation, or at least, it should. Where are we lacking in that, cause of course I'm happy to address that.

bodiam avatar Sep 17 '22 02:09 bodiam

Thanks everyone for the feedback Really appreciate it I made several steps into direction of splitting trying to make them more or less atomic and with much smaller change than in this PR. Currently it is splitted within the same maven artifact more examples about that could be found at https://github.com/datafaker-net/datafaker/issues/157#issuecomment-1256914943 Moreover now similar extraction of other providers groups could be done in simpler way (just look how it was done e.g. for Foor or Sport providers)

So I close this issue since now it is already a bit outdated

snuyanzin avatar Sep 24 '22 09:09 snuyanzin