graphql-php icon indicating copy to clipboard operation
graphql-php copied to clipboard

Optimize `Deferred` execution

Open spawnia opened this issue 2 months ago • 9 comments

Resolves https://github.com/webonyx/graphql-php/issues/1803

Follow up to https://github.com/webonyx/graphql-php/pull/1790

spawnia avatar Dec 05 '25 14:12 spawnia

Summary

This PR optimizes memory usage for deferred execution while preserving the DataLoader pattern that was broken in #1790.

Problem

PR #1790 attempted to optimize memory but broke DataLoader batching - batch loads executed incrementally before all IDs were collected, causing null values with 600+ items.

Solution

Reduces memory usage by replacing closure-based queue storage with direct object/array storage:

  1. Store executor on promise instance - $executor property instead of closure context
  2. Queue promise objects directly - enqueue($this) instead of enqueue(function() {...})
  3. Use arrays for waiting handlers - [$promise, $onFulfilled, $onRejected, $state, $result] instead of closures
  4. Explicit cleanup - $task->executor = null and unset($task) enable GC during long queues

Tests Added

  • testDataLoaderPatternWithManyDeferredObjects() - 600 items, verifies single batch load
  • testNestedDataLoaderPattern() - 750 items, reproduces #1803 scenario

Verification

  • [x] Tests pass with optimization
  • [x] Tests fail when #1790 changes are cherry-picked (confirms they catch the regression)
  • [x] DataLoader pattern preserved - all IDs collected before first batch executes

spawnia avatar Dec 05 '25 14:12 spawnia

@shadowhand @zimzat @joshbmarshall Can you give this a review?

spawnia avatar Dec 09 '25 15:12 spawnia

I loaded it up on the system that was having problems with v15.27.0

It still has issues with v15.27.0 tag, but this is working for me at 1954f81fe437d910c0283307d8f17d125b4a3a11

For my query, the memory usage is identical for this pull request as for master branch. But I think it's because I have nested resolvers:

event date > event > event dates

I think it's waiting for all the event dates to be queued before resolving, which is how I'd expect it to be. I don't think that can be managed!

joshbmarshall avatar Dec 10 '25 00:12 joshbmarshall

@spawnia This looks promising; I don't see any issues with load order or breaking batches.

Below are some extra thoughts based on my own trial-error:

The different signatures in the queue feels concerning. I had started experimenting with the idea of switching out the anonymous functions with invokable classes. Your implementation of $this->executor wins out over the cost of instantiating a new class, while a new class for the waiting queue item as a replacement for the array is a little more efficient there, so the two patterns combined reduces peak usage slightly further.

Using SplFixedArray for $this->waiting[] = SplFixedArray would also get half way to the same peak memory reduction.

https://gist.github.com/zimzat/310e9121225ef18640fad2f4ab675ad4

zimzat avatar Dec 11 '25 04:12 zimzat

@zimzat I did follow up on your suggestion to use SplFixedArray, seems like a useful optimization.

Can you elaborate on what specifically you find concerning? The methods in SyncPromise are not marked with @api, so they are not supposed to be part of the public API.

spawnia avatar Dec 11 '25 16:12 spawnia

Can you elaborate on what specifically you find concerning?

The implicit array shape versus an explicit type check of a named and structured class. It's not a big deal, especially here and internally to the class (like you've said, this isn't technically a public API). The if (x) elseif (y) elseif (z) felt verbose compared to the simplicity of the queue always being callable(). ¯\_(ツ)_/¯

zimzat avatar Dec 14 '25 20:12 zimzat

@zimzat I did another round which should hopefully alleviate your concern. Benchmarks look good too, what do you think?

spawnia avatar Dec 15 '25 10:12 spawnia

@spawnia That looks nice; a simple signature, self-contained handling, and still a massive drop in memory usage in the simple many deferred (no then) scenario. Thanks!

zimzat avatar Dec 15 '25 11:12 zimzat

Memory Benchmark Results

Ran memory benchmarks comparing master against the optimized implementation using a benchmark script adapted from @zimzat's test.

Results

Metric Master Optimized Change
Single Deferred 888 bytes 144 bytes -84%
Nested Deferred (with then) 1,272 bytes 528 bytes -58%
1,000 Deferreds 896 KB 144 KB -84%
5,000 nested Deferreds 6.5 MB 2.8 MB -57%
Peak memory 13.9 MB 9.3 MB -33%
Final memory 517 KB 263 KB -49%

Key Optimization: Garbage Collection

During benchmarking, we found that RootSyncPromise::runQueuedTask() must null out $executor after invocation to allow immediate garbage collection of the closure and its captured variables:

public function runQueuedTask(): void
{
    // Clear reference to allow immediate garbage collection
    $executor = $this->executor;
    $this->executor = null;
    // ...
}

Without this, closures stay alive until the promise object is freed, causing significant memory accumulation when many deferred operations are queued. This pattern is now documented in the code with an explanatory comment.

spawnia avatar Dec 15 '25 16:12 spawnia

Benchmark Results Summary

Ran benchmarks comparing this branch against master on PHP 8.3.6.

Key Changes

  • Promise state changed from strings ("pending", "fulfilled", "rejected") to integers (0, 1, 2)
  • Extracted SyncPromiseQueue to a dedicated class
  • Moved $executor handling from SyncPromise constructor to Deferred
  • Added comprehensive DeferredBench for tracking deferred execution performance

Memory Improvements

Benchmark Master Branch Change
benchManyNestedDeferreds 19.39 MB 17.67 MB -8.9%
bench1000Chains 10.08 MB 9.72 MB -3.6%
benchManyDeferreds 6.45 MB 6.09 MB -5.6%

Execution Time

Benchmark Master Branch Change
benchManyNestedDeferreds 17,801 μs 16,651 μs -6.5%
bench1000Chains 4,042 μs 3,919 μs -3.0%
benchManyDeferreds 673 μs 922 μs +37%
benchChain100 97 μs 145 μs +50%
benchChain5 6.0 μs 10.1 μs +68%
benchSingleDeferred 0.9 μs 1.2 μs +38%

Summary

The branch achieves significant memory savings (up to 9%) for heavy deferred workloads while also improving execution time for the most demanding benchmarks (benchManyNestedDeferreds, bench1000Chains).

The microbenchmarks with simple promise chains show some CPU overhead, likely due to structural changes that benefit memory at a small cost to simple cases. For real-world GraphQL queries with many nested deferreds, the branch performs better overall.

spawnia avatar Dec 17 '25 15:12 spawnia

Next steps for me:

  • try this in my real-world projects
  • try this in nuwave/lighthouse

spawnia avatar Dec 17 '25 15:12 spawnia

Lookin' good!

shmax avatar Dec 17 '25 16:12 shmax

I tested this in a private project with a couple of thousand GraphQL tests, no issues. Can't say anything about real life impact though.

mfn avatar Dec 17 '25 19:12 mfn

Lighthouse and a large project of mine work.

spawnia avatar Dec 18 '25 07:12 spawnia

Released with https://github.com/webonyx/graphql-php/releases/tag/v15.29.0.

spawnia avatar Dec 18 '25 16:12 spawnia

very early at the moment, but a number of my sites have gone down because of this update. I've reverted back to v15.28.0 for now which gets them going. I'll have to get you details but this is causing issues

joshbmarshall avatar Dec 18 '25 21:12 joshbmarshall

My sentry is reporting:

Call to undefined method GraphQL\Deferred::runQueue()

vendor/overblog/dataloader-php/lib/promise-adapter/src/Adapter/WebonyxGraphQLSyncPromiseAdapter.php in Overblog\PromiseAdapter\Adapter\WebonyxGraphQLSyncPromiseAdapter::await at line 115

    public function await($promise = null, $unwrap = false)
    {
        if (null === $promise) {
            Deferred::runQueue();
            SyncPromise::runQueue();
            $this->cancellers = [];
            return null;
        }
        $promiseAdapter = $this->getWebonyxPromiseAdapter();

joshbmarshall avatar Dec 18 '25 21:12 joshbmarshall