java-faker

Creating 100 million objects takes a lot of time

Open moderafasas opened this issue 4 years ago • 4 comments

Describe the bug: Creating 100 million objects takes about 10 hours.

To Reproduce

    for (int i = 0; i < 100000000; i++) {
      People person = new People();
      person.setLevel(j);
      person.setName(faker.name().fullName());
      person.setCompany(faker.company().industry() + faker.company().buzzword());
      person.setNation(faker.nation().nationality());
      person.setPlace(faker.address().fullAddress());
      person.setUniversity(faker.university().name());
      person.setBlood(faker.name().bloodGroup());
      person.setJob(faker.job().title());
      person.setPhoneNum(faker.phoneNumber().cellPhone());
      person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime()));
    }

Expected behavior: Generating data at this scale should be faster than it is now.

Versions:

  • OS: Linux 64GB 1T
  • JDK 1.8
  • Faker Version 1.0.2

moderafasas, Oct 08 '21 10:10

Option-1

You might want to run a profiler to see where the bottleneck is (e.g. VisualVM, YourKit, JProfiler).

The problem could be DateUtils.get8DateString. If so, this ticket is in the wrong project, since that method is not part of java-faker. Date formatting is not always fast.
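Before reaching for a full profiler, a crude timing of one step in isolation can already hint at whether date formatting matters. The sketch below is only an illustration: `StepTimer`, `slowDate` and `fastDate` are hypothetical names, and since the source of DateUtils.get8DateString is not shown in this thread, `slowDate` merely assumes it builds a new `SimpleDateFormat("yyyyMMdd")` on every call. Timings from a loop like this are rough (JIT warm-up, GC), so treat them as a hint and use a profiler or JMH for real numbers.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.function.Supplier;

// Hypothetical harness; none of these names come from java-faker.
class StepTimer {

    // Average nanoseconds per call of `step` over `iterations` calls.
    static long nanosPerCall(Supplier<?> step, int iterations) {
        long sink = 0; // consume results so the JIT cannot drop the calls
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += step.get().hashCode();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.print(""); // keep `sink` observable
        return elapsed / iterations;
    }

    // ASSUMPTION: stand-in for DateUtils.get8DateString, creating a new formatter per call.
    static String slowDate(long millis) {
        return new SimpleDateFormat("yyyyMMdd").format(new Date(millis));
    }

    // One shared, immutable, thread-safe formatter.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneId.systemDefault());

    static String fastDate(long millis) {
        return FMT.format(Instant.ofEpochMilli(millis));
    }

    public static void main(String[] args) {
        int n = 200_000;
        long now = System.currentTimeMillis();
        nanosPerCall(() -> slowDate(now), n); // warm-up runs, results discarded
        nanosPerCall(() -> fastDate(now), n);
        System.out.println("new SimpleDateFormat per call: "
                + nanosPerCall(() -> slowDate(now), n) + " ns");
        System.out.println("shared DateTimeFormatter:      "
                + nanosPerCall(() -> fastDate(now), n) + " ns");
    }
}
```

The same `nanosPerCall` helper can wrap any single step of the loop, for example `() -> faker.address().fullAddress()`, to compare steps against each other.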

Option-2

You can generate in parallel with something like this:

    ExecutorService executor = Executors
        .newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());

    java.util.function.Consumer<Person> personSink = ...;

    for (int i = 0; i < 100_000_000; i++) {
      executor.submit(() -> personSink.accept(generateOnePerson()));
    }

    // signal we're done submitting jobs
    executor.shutdown();

    // bounded waiting for all jobs; returns false if the timeout
    // elapses first, so pick a timeout that fits the workload
    executor.awaitTermination(30, TimeUnit.SECONDS);
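One possible variation on the snippet above, offered as a sketch rather than a recommendation from this thread: submitting one task per object queues 100 million tasks, so the version below submits one task per chunk and waits on the returned futures instead of a fixed timeout. `ChunkedGenerator` and its `generateOnePerson()` are hypothetical stand-ins that use `ThreadLocalRandom` instead of a faker. Whether a single `Faker` instance can be shared safely across threads is not established here, so a real version might create one per task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch; not part of java-faker.
class ChunkedGenerator {

    // Stand-in for a faker-backed generateOnePerson().
    static String generateOnePerson() {
        return "person-" + ThreadLocalRandom.current().nextInt(1_000_000);
    }

    // Generates `total` items in tasks of up to `chunkSize` items each.
    // `sink` is called from several threads, so it must be thread-safe.
    static void generate(long total, int chunkSize, Consumer<String> sink) {
        ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (long start = 0; start < total; start += chunkSize) {
                long count = Math.min(chunkSize, total - start);
                pending.add(executor.submit(() -> {
                    for (long i = 0; i < count; i++) {
                        sink.accept(generateOnePerson());
                    }
                }));
            }
            // Wait for every chunk; a failure inside a task surfaces here.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong();
        generate(1_000_000, 10_000, p -> counter.incrementAndGet());
        System.out.println(counter.get()); // prints 1000000
    }
}
```

With 100 million objects and a chunk size of 10,000, this keeps only 10,000 futures in memory rather than 100 million queued tasks.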

wcarmon, Oct 08 '21 23:10

I would like to contribute, for academic purposes.

mssoni2, Oct 24 '21 23:10

Hi @icytek, this may be a bit late, but there is a port of java-faker to JDK 8 with many improvements, including performance: https://github.com/datafaker-net/datafaker

I've just checked the timing for your code (except person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime())); since I do not have DateUtils#get8DateString). No code changes are required apart from the imports.

For me it took about 40 min to generate 100M objects (Linux 64-bit, JDK 1.8).

It could also be parallelized, as proposed by @wcarmon.

snuyanzin, Apr 22 '22 07:04

Just for fun, I added this as a benchmark to https://github.com/datafaker-net/datafaker.

After a number of optimizations in datafaker (versions 1.2.0-1.7.0), it takes less than 10 min to generate these 100 million objects (JDK 17.0.5) in one thread. As before, this excludes person.setBirthDay(DateUtils.get8DateString(faker.date().birthday().getTime())); since I do not have DateUtils#get8DateString. However, datafaker has its own setBirthday, which is included in the benchmark.
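To put the three timings reported in this thread side by side (about 10 hours, about 40 min, under 10 min), the small sketch below converts each into objects per second and microseconds per object. `Throughput` is a hypothetical helper. The runs used different JDKs and machines and did not all include the same steps, so the ratios are only indicative, and the 10 min figure is an upper bound.

```java
// Hypothetical helper for back-of-the-envelope throughput numbers.
class Throughput {
    static final long OBJECTS = 100_000_000L;

    // Whole objects generated per second for a run of `minutes` minutes.
    static long objectsPerSecond(long minutes) {
        return OBJECTS / (minutes * 60);
    }

    // Average microseconds spent per object for a run of `minutes` minutes.
    static double microsPerObject(long minutes) {
        return minutes * 60 * 1_000_000.0 / OBJECTS;
    }

    public static void main(String[] args) {
        long[] runsInMinutes = {600, 40, 10}; // 10 hours, 40 min, 10 min
        for (long m : runsInMinutes) {
            System.out.println(m + " min: " + objectsPerSecond(m) + " objects/s, "
                    + microsPerObject(m) + " us/object");
        }
    }
}
```

By this arithmetic, the original report corresponds to roughly 360 µs per object, the 40 min run to 24 µs, and the 10 min run to at most 6 µs.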

snuyanzin, Nov 28 '22 08:11