Creating 100 million objects takes a long time
Describe the bug
When I create 100 million objects, it takes about 10 hours.
To Reproduce
for (int i = 0; i < 100000000; i++) {
    People person = new People();
    person.setLevel(j); // note: j is not defined anywhere in this snippet
    person.setName(faker.name().fullName());
    person.setCompany(faker.company().industry() + faker.company().buzzword());
    person.setNation(faker.nation().nationality());
    person.setPlace(faker.address().fullAddress());
    person.setUniversity(faker.university().name());
    person.setBlood(faker.name().bloodGroup());
    person.setJob(faker.job().title());
    person.setPhoneNum(faker.phoneNumber().cellPhone());
    person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime()));
}
Expected behavior
Generating a large data set should be faster than it is now.
Versions:
- OS: Linux 64GB 1T
- JDK 1.8
- Faker Version 1.0.2
Option-1
You might want to run a profiler to see where the bottleneck is (e.g. VisualVM, YourKit, JProfiler).
The problem could be DateUtils.get8DateString. If so, this ticket is in the wrong project, because that method is not part of Faker. Date formatting is not always fast.
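If attaching a profiler is not convenient, a rough first check is to accumulate wall-clock time per setter. The sketch below is only an illustration: StageTimer is a hypothetical helper, and the SimpleDateFormat call is a stand-in for DateUtils.get8DateString, whose real implementation is not shown in this thread.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock time per named stage. Not a real profiler. */
class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the stage total, and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded so far, keyed by stage name. */
    Map<String, Long> totals() {
        return nanosByStage;
    }

    void report() {
        nanosByStage.forEach((stage, nanos) ->
            System.out.printf("%-10s %10.1f ms%n", stage, nanos / 1_000_000.0));
    }
}

class StageTimerDemo {
    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int i = 0; i < 100_000; i++) {
            long millis = i * 86_400_000L;
            // Stand-in for faker.name().fullName()
            timer.time("name", () -> "Person-" + millis);
            // Assumed stand-in for DateUtils.get8DateString: a new formatter on every call
            timer.time("birthDay",
                () -> new SimpleDateFormat("yyyyMMdd").format(new Date(millis)));
        }
        timer.report();
    }
}
```

In the original loop the same wrapper would go around each call, e.g. timer.time("name", () -> faker.name().fullName()). The per-call overhead of System.nanoTime() distorts very cheap stages, so treat the numbers as a hint about which stage dominates, not as a measurement.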
Option-2
You can generate in parallel with something like this:
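ExecutorService executor = Executors
        .newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
java.util.function.Consumer<People> personSink = ...;
for (int i = 0; i < 100_000_000; i++) {
    // generateOnePerson() builds one People object, as in the loop body of the report
    executor.submit(() -> personSink.accept(generateOnePerson()));
}
// signal we're done submitting jobs
executor.shutdown();
// bounded waiting for all jobs; awaitTermination returns false if the timeout
// elapses first, so 30 seconds is only a placeholder for a run of this size
executor.awaitTermination(30, TimeUnit.SECONDS);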
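One variation on the idea above is to submit one task per batch instead of one per object, which keeps the executor's queue small and lets the caller wait on each batch rather than on a fixed timeout. This is a minimal sketch, not a tested fix for this issue: BatchedGenerator and its parameters are hypothetical names, the string supplier stands in for generateOnePerson(), and the sink must be thread-safe. Whether one faker instance can safely be shared across threads is not established in this thread, so check that before plugging it in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: generates items on a fixed pool, one task per batch. */
class BatchedGenerator {
    static <T> void generate(long total, int batchSize,
                             Supplier<T> factory, Consumer<T> sink) {
        ExecutorService executor =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (long start = 0; start < total; start += batchSize) {
                long count = Math.min(batchSize, total - start);
                // Each task produces `count` items and hands them to the sink
                pending.add(executor.submit(() -> {
                    for (long i = 0; i < count; i++) {
                        sink.accept(factory.get());
                    }
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // blocks until this batch is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            executor.shutdown();
        }
    }
}

class BatchedGeneratorDemo {
    public static void main(String[] args) {
        LongAdder seen = new LongAdder(); // thread-safe counter used as the sink
        BatchedGenerator.<String>generate(1_000_000, 10_000,
            () -> "person-" + ThreadLocalRandom.current().nextInt(1000),
            p -> seen.increment());
        System.out.println("generated: " + seen.sum());
    }
}
```

With 100 million objects and a batch size of 10,000 this creates 10,000 tasks instead of 100 million. The speedup is bounded by the number of cores and by how fast the sink can accept objects.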
I would like to contribute to this, for academic purposes.
Hi @icytek, this may be a bit late, but there is a port of java-faker to JDK 8 with many improvements, including performance: https://github.com/datafaker-net/datafaker
I've just checked the timing for your code (except person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime()));, since I do not have DateUtils#get8DateString).
No code changes are required except the imports.
For me it took about 40 minutes to generate 100M objects (Linux 64-bit, JDK 1.8).
It could also be parallelized, as proposed by @wcarmon.
Just for fun, I added this as a benchmark to https://github.com/datafaker-net/datafaker. After a number of optimizations in datafaker (versions 1.2.0-1.7.0),
it takes less than 10 minutes to generate these 100 million objects in one thread (JDK 17.0.5),
again without person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime()));, since I do not have DateUtils#get8DateString.
However, datafaker has its own setBirthday, which is included in the benchmark.
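To put the three timings in this thread side by side, they can be converted to an average cost per object. The snippet below is plain arithmetic on the reported figures (about 10 hours, about 40 minutes, under 10 minutes); the class and method names are made up for illustration, and the runs used different machines and JDKs, so the comparison is only indicative.

```java
/** Illustrative arithmetic only: average time per object for the runs reported above. */
class PerObjectCost {
    /** Average microseconds per object for a run of `minutes` that produced `objects` items. */
    static double microsPerObject(double minutes, long objects) {
        return minutes * 60.0 * 1_000_000.0 / objects;
    }

    public static void main(String[] args) {
        long objects = 100_000_000L;
        // About 10 hours (600 minutes), as in the original report
        System.out.println("10 h   : " + microsPerObject(600, objects) + " us/object");
        // About 40 minutes after switching to datafaker
        System.out.println("40 min : " + microsPerObject(40, objects) + " us/object");
        // Upper bound of 10 minutes for the optimized benchmark
        System.out.println("10 min : " + microsPerObject(10, objects) + " us/object");
    }
}
```

That works out to roughly 360, 24, and at most 6 microseconds per object respectively.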