Performance Problem with random
While diagnosing why some of our tests are so slow, I discovered the culprit for us is with random(). We have a function:
public static function randomApiUser(): ApiUser
{
$user = self::random()->object();
return new ApiUser($user);
}
And it was making tests take five times as long as when we didn't use it.
After poking around, I noticed that rather than getting X random records, it's getting ALL of them and then getting random ones from that.
public function randomRange(int $min, int $max, array $attributes = []): array
{
if ($min < 0) {
throw new \InvalidArgumentException(\sprintf('$min must be positive (%d given).', $min));
}
if ($max < $min) {
throw new \InvalidArgumentException(\sprintf('$max (%d) cannot be less than $min (%d).', $max, $min));
}
$all = \array_values($this->findBy($attributes));
\shuffle($all);
if (\count($all) < $max) {
throw new \RuntimeException(\sprintf('At least %d "%s" object(s) must have been persisted (%d persisted).', $max, $this->getClassName(), \count($all)));
}
return \array_slice($all, 0, \random_int($min, $max)); // @phpstan-ignore-line
}
I think it would be more performant if the code used something like this to just get random records from the database (with the appropriate attribute filters):
SELECT *
FROM table_name
ORDER BY RAND()
LIMIT 1;
I am willing to do a PR for this if you would like.
Hi @amyAMeQ
thanks for the report, you're totally right about this, it could be improved.
Problem is: I think it is not the same syntax between mysql (ORDER BY RAND()) and postgresql (ORDER BY RANDOM())
We're about to release Foundry 2.0, certainly the next week. I'd rather this to be fixed in 2.0 than in 1.x
@nikophil - OK. I will pull the 2.0 branch and play locally. Hoping I can find a way to do it in a database agnostic way.
you'll find an upgrade guide here: https://github.com/zenstruck/foundry/blob/1.x/UPGRADE-2.0.md (not 100% up to date)
BTW don't hesitate to report any problems you'd find in 2.x branch :sweat_smile:
Thanks for this report @amyAMeQ, I actually remember when writing that code that this could be a problem but decided to wait until someone reported. Here we are ;)
Hoping I can find a way to do it in a database agnostic way
sadly that's not possible. Doctrine does not ship any way to proceed a random. You'll have to add some new DQL function to do this, but we don't want to do this is in Foundry.
then we need to do it with raw SQL, but ordering by rand is slightly different whether if you use MySQL or Postgres.... or Mongo which we actually support! IMO, this would introduce too much complexity....
The first solution that comes in my mind, which is a trade-off, may be to use the query builder (maybe we'll need to split orm and odm behaviors), and select all ids, and then only take few randomly. This is not ideal, but it would prevent some unneeded complexity.
Aside from ordering by RAND(), I think a different (and possibly more database agnostic) approach would be to first COUNT() the number of rows, then pick a random number between 1 and that number and then finally using LIMIT <random number>,1 to get that specific row.
At least for MySQL the first argument of LIMIT is the offset, the second is the number of rows to return.
This would potentially be much faster than a heavy ORDER operation.
But again, I have not looked into MongoDB.
--
Ok sadly for Postgres the syntax is similar but different:
MySQL: LIMIT 10,1
Postgres: LIMIT 1 OFFSET 10
But maybe there is a clean way we can implement this. The thing is that the functionality is working as is, just doesn't perform very well. So maybe we can add a configuration option that enables this performance optimization, but with the side note that this requires some user setup (i.e. configuring Doctrine). Then it would be backwards compatible but people that really need this, can decide to benefit from the performance optimization.
hey @mdeboer
this sounds like a good idea, do you think it is doable with doctrine/persistence only? we're abstracting away the DBMS (Mongo included) thanks to doctrine/persistence, this would be awesome that we could do this in a db agnostic way
@nikophil I will have a look this weekend! 👍🏻
I just saw your edit: isn't doctrine/persistence + dbal helps with this? here is the prototype of ObjectRepositroy::findBy():
public function findBy(
array $criteria,
?array $orderBy = null,
?int $limit = null,
?int $offset = null
);
it actually has limit + offset parameters, I think your approach is ok for any DBs!
Ah yes, I had a suspicion it was in there as it is such a trivial task, but wasn't sure. That will make things a lot easier.
But I will spend some time on it over the weekend, to see what I can come up with. Probably do a benchmark as well.
A benchmark which can be reused from time to time to ensure there is not a performance issue before a release would be definitely very useful!
I mean, it could be a lot of queries but this could dramatically reduce memory usage.
I mean, it could be a lot of queries
how I understand it, it would be one extra query per "random" call, but I think this is worth it!
Oh right. I forgot we only return a single entity here.
hum it could be challenging for randomRange() maybe 🤔
Another thing to consider is that for example we use this in our tests and we usually seed the RNG to get random yet consistent results. This would not be possible with ORDER BY RAND() as far as I know.
I'm pulling up the code now to have a look at random() and randomRange() :)
Alright, so fixing this for random() (one random entity) is easy and yields a nice improvement. But I haven't spend much time working on randomSet() and randomRange(), they seem a bit harder to implement at first glance.
Most time went to setting up phpbench but I got it working. Writing benchmarks is pretty similar to writing either unit or integration tests now.
Anyway, here are the results for random() I got with ORM so far. Comparing the original implementation in the 2.x repository with my implementation I explained before (using the random offset):
PHPBench (1.4.1) running benchmarks... #standwithukraine
with configuration file: /home/maarten/Projects/Forks/foundry/phpbench.json
with PHP version 8.4.5, xdebug ❌, opcache ❌
comparing [actual vs. before]
\Zenstruck\Foundry\Tests\Benchmark\ORM\GenericEntityFactoryBench
bench_random # 1........................I9 - [Mo2.111ms vs. Mo2.020ms] +4.48% (±3.89%)
bench_random # 50.......................I9 - [Mo2.464ms vs. Mo3.504ms] -29.68% (±6.08%)
bench_random # 500......................I9 - [Mo2.738ms vs. Mo11.773ms] -76.75% (±9.23%)
bench_random # 1000.....................I9 - [Mo3.515ms vs. Mo35.876ms] -90.20% (±12.87%)
Subjects: 1, Assertions: 0, Failures: 0, Errors: 0
+---------------------------+--------------+------+------+-----+------------------+-----------------+-----------------+
| benchmark | subject | set | revs | its | mem_peak | mode | rstdev |
+---------------------------+--------------+------+------+-----+------------------+-----------------+-----------------+
| GenericEntityFactoryBench | bench_random | 1 | 1 | 10 | 36.310mb +0.00% | 2.111ms +4.48% | ±3.89% -2.19% |
| GenericEntityFactoryBench | bench_random | 50 | 1 | 10 | 37.108mb +0.00% | 2.464ms -29.68% | ±6.08% -59.30% |
| GenericEntityFactoryBench | bench_random | 500 | 1 | 10 | 45.293mb -1.22% | 2.738ms -76.75% | ±9.23% -74.94% |
| GenericEntityFactoryBench | bench_random | 1000 | 1 | 10 | 59.239mb -10.32% | 3.515ms -90.20% | ±12.87% -47.94% |
+---------------------------+--------------+------+------+-----+------------------+-----------------+-----------------+
Depending on the number of items in the repository, performance is increased by 30 to 90%! 😍 Also the stability in speed is improved considerable (50-75%).
You can find my changes here: https://github.com/mdeboer/foundry/tree/feat/speedup-random
If you like I can make a PR 👍🏻
Hi @mdeboer
this is very cool! We were actually talking with @kbond the we would like to add a CI check about performances, using PHPBench, maybe based on this example
Would you mind to create two PRs? one which just fixes the random stuff, and another one with your work on PHPBench?
If you're willing to introduce the GitHub Action, this is super cool, but we can also handle this ourselves, based on your work
Alright, so fixing this for random() (one random entity) is easy and yields a nice improvement. But I haven't spend much time working on randomSet() and randomRange(), they seem a bit harder to implement at first glance.
We can work on randomRange() another time.
Another thing to consider is that for example we use this in our tests and we usually seed the RNG to get random yet consistent results. This would not be possible with ORDER BY RAND() as far as I know
makes totally sense.
thanks!
about randomRange(): I'd really like that we keep only relying on doctrine/persistence.
Based on this constraint, I can only see half solutions, both being trade-offs:
- we can use what you've done with
random()and perform only one query, and change the limit... This would be kinda fake random, but I think this is ok-ish for testing purpose - we can perform as many queries as requested objects, at the price of performance (but still be better than the current implementation), but this would give real randomness
Hey @nikophil that sounds great! I will create twos separate PRs in that case.
Will probably leave the GitHub action to you - I have experience with CI but not with GH and not too much time unfortunately.
yeah, this is perfectly understandable, what you've already done will really help, and have a perfect timing :)
Alright, got some time tonight to do it 👍🏻
There we go, I created two PRs. I see that at least one of them has some psalm issues but one of the reasons being that phpbench is not installed there. But alright, maybe best to discuss that there and not pollute this issue.
@amyAMeQ just curious, is this still an issue for you?
I think randomRange() is still problematic:
https://github.com/zenstruck/foundry/blob/2.x/src%2FPersistence%2FRepositoryDecorator.php#L219
See https://github.com/zenstruck/foundry/issues/612#issuecomment-2781526431