`contain()`ing an association with strategy=`join` with joins in the query builder affects outer query
Description
Hi,
I feel like this may have been already talked about / be a known query builder "gotcha", but I can't seem to find issues or mentions about it in the docs. Using matching/innerJoinWith inside a contain is limiting the results of the base query, if the contained association uses a join strategy (the default for hasOne / belongsTo). The provided examples use hasOne.
What I did
A hasOne B B belongsTo C
$this->A->find()->contain(['B' => fn($q) => $q->innerJoinWith('C')])
What I got
Only items from A that have one associated B
What I expected
Every row from A, regardless of whether or not they have associated Bs
Workarounds
Changing strategy to select, lazy loading after query with loadInto, rewriting into a subquery, etc.
$this->A->find()->contain(['B' => [
'queryBuilder' => fn($q) => $q->innerJoinWith('C'),
'strategy' => 'select',
])
None of the workarounds seem ideal to me as they require you to be actively aware of the implementation when containing the association (in the examples the innerJoinWith is evident, but in real life scenarios you're likely to use finders).
Solutions
- if the difference in behavior between strategies is intended, maybe we could mention it in the documentation. I feel like it's easy to stumble upon this bug as you're composing queries with just the default association settings. Also, I'd like to know if I missed any better workarounds.
- if it's not intended, a possible solution may be having the EagerLoader detect whether or not the query builder adds any join clauses, and automatically change the strategy of the contain to
select.
Test/reproduce
git-apply the following patch https://gist.github.com/andres-ml/988d4202d607a63556a842eb0ef79869 into a fresh cakephp/app, then run ./vendor/bin/phpunit tests/TestCase/Model/
CakePHP Version
4.4.15
PHP Version
No response
This is likely due to differences in how the SQL is generated. Containing an association does behave differently because of how the semantics of the ORM have worked for a long time.
The SQL generated for the 'wrong' results is roughly
SELECT ...
FROM users
LEFT JOIN subscriptions
INNER JOIN products
Which ends up implicitly making both joins inner. I think this might be the source of your wrongness.