Significantly higher latency under load when HTTP keep-alive is enabled
Environment:
- Dell Latitude-5420
- Ubuntu 25.04
- kernel: 6.14.0-15-generic
- PHP 8.4.5 (cli) (built: Mar 17 2025 20:35:32) (NTS)
- amphp/http-server: 3.4.3
Benchmark tools:
- wrk debian/4.1.0-4build2 [epoll] Copyright (C) 2012 Will Glozer
- hey
Description: When running load tests against a minimal “hello world” PHP script served by amphp/http-server, enabling HTTP keep-alive causes per-request latency to jump to ~40 ms, whereas disabling keep-alive (i.e. forcing Connection: close) yields ~150–200 µs response times. Under keep-alive the PHP worker’s CPU usage is effectively zero.
Context: I’m currently developing a Symfony Runtime integration for amphp, and during performance testing I observed these elevated response times. At first I assumed the issue was in my own implementation, but switching to the official hello-world.php example reproduced the same behavior, indicating the problem lies within amphp/http-server itself.
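For reference, the server under test is essentially the stock hello-world example; a rough, minimal sketch of an equivalent script (assuming the amphp/http-server v3 API, not the exact example code) looks like this:
<?php

require __DIR__ . '/vendor/autoload.php';

use Amp\Http\HttpStatus;
use Amp\Http\Server\DefaultErrorHandler;
use Amp\Http\Server\Request;
use Amp\Http\Server\RequestHandler\ClosureRequestHandler;
use Amp\Http\Server\Response;
use Amp\Http\Server\SocketHttpServer;
use Psr\Log\NullLogger;
use function Amp\trapSignal;

// Accept connections directly (no reverse proxy in front of the server).
$server = SocketHttpServer::createForDirectAccess(new NullLogger());

$server->expose('0.0.0.0:1337');

// Every request receives the same tiny plain-text response body.
$server->start(
    new ClosureRequestHandler(fn (Request $request) => new Response(
        HttpStatus::OK,
        ['content-type' => 'text/plain; charset=utf-8'],
        'Hello, World!',
    )),
    new DefaultErrorHandler(),
);

// Run until the process receives SIGINT or SIGTERM, then shut down cleanly.
trapSignal([SIGINT, SIGTERM]);

$server->stop();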
Steps to reproduce:
- Run the example server:
php examples/hello-world.php
- Run wrk with keep-alive (default):
wrk --latency -t 1 -c 1 http://localhost:1337/
Running 10s test @ http://localhost:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 41.12ms 1.78ms 43.20ms 98.35%
Req/Sec 24.30 4.98 30.00 57.00%
Latency Distribution
50% 41.03ms
75% 41.17ms
90% 42.00ms
99% 42.98ms
243 requests in 10.01s, 42.40KB read
Requests/sec: 24.27
Transfer/sec: 4.24KB
- Run wrk with keep-alive disabled:
wrk --header "Connection: Close" --latency -t 1 -c 1 http://localhost:1337/
Running 10s test @ http://localhost:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 150.09us 28.03us 0.87ms 92.33%
Req/Sec 5.52k 434.76 5.90k 90.10%
Latency Distribution
50% 143.00us
75% 151.00us
90% 169.00us
99% 277.00us
55473 requests in 10.10s, 7.88MB read
Requests/sec: 5492.54
Transfer/sec: 799.22KB
- Run hey with keep-alive (default):
hey -c 1 -z 10s http://127.0.0.1:1337/
Summary:
Total: 10.0128 secs
Slowest: 0.0430 secs
Fastest: 0.0013 secs
Average: 0.0412 secs
Requests/sec: 24.2690
Total data: 3159 bytes
Size/request: 13 bytes
Response time histogram:
0.001 [1] |
0.005 [0] |
0.010 [0] |
0.014 [0] |
0.018 [0] |
0.022 [0] |
0.026 [0] |
0.030 [0] |
0.035 [0] |
0.039 [0] |
0.043 [242] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Latency distribution:
10% in 0.0409 secs
25% in 0.0410 secs
50% in 0.0411 secs
75% in 0.0419 secs
90% in 0.0420 secs
95% in 0.0421 secs
99% in 0.0429 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0013 secs, 0.0430 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0001 secs
resp wait: 0.0006 secs, 0.0002 secs, 0.0014 secs
resp read: 0.0405 secs, 0.0002 secs, 0.0421 secs
Status code distribution:
[200] 243 responses
- Run hey with keep-alive disabled:
hey -c 1 -z 10s -disable-keepalive http://127.0.0.1:1337/
Summary:
Total: 10.0008 secs
Slowest: 0.0025 secs
Fastest: 0.0002 secs
Average: 0.0003 secs
Requests/sec: 3447.0364
Total data: 448149 bytes
Size/request: 13 bytes
Response time histogram:
0.000 [1] |
0.000 [32562] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.001 [841] |■
0.001 [272] |
0.001 [696] |■
0.001 [88] |
0.002 [9] |
0.002 [2] |
0.002 [1] |
0.002 [0] |
0.003 [1] |
Latency distribution:
10% in 0.0002 secs
25% in 0.0002 secs
50% in 0.0003 secs
75% in 0.0003 secs
90% in 0.0003 secs
95% in 0.0004 secs
99% in 0.0010 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0001 secs, 0.0002 secs, 0.0025 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0009 secs
resp wait: 0.0001 secs, 0.0001 secs, 0.0012 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0010 secs
Status code distribution:
[200] 34473 responses
Actual behavior: With HTTP keep-alive:
- Latency per request ~40 ms
- PHP worker idle (≈0% CPU)
Without HTTP keep-alive:
- Latency per request ~150–200 µs
- PHP worker fully utilized
Expected behavior: Enabling HTTP keep-alive should either reduce or have minimal impact on per-request latency and maintain high throughput; overall request handling performance should be similar whether the connection is reused or not.
Hi @s2x!
Thanks for the detailed report. I just wanted to note this is on my radar and I plan to look further this coming weekend.
Testing on my system (macOS, M4 Max) running PHP 8.4.9-dev (ZTS DEBUG), I'm not able to reproduce the latency you're seeing.
Is there any further information you could provide?
Test server command: php -dzend.assertions=-1 examples/hello-world.php
wrk with keep-alive: wrk --latency -t 1 -c 1 http://localhost:1337/
Running 10s test @ http://localhost:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 101.23us 12.54us 472.00us 79.06%
Req/Sec 9.84k 399.72 10.30k 79.21%
Latency Distribution
50% 100.00us
75% 104.00us
90% 113.00us
99% 148.00us
98927 requests in 10.10s, 18.49MB read
Requests/sec: 9794.85
Transfer/sec: 1.83MB
wrk without keep-alive: wrk --header "Connection: Close" --latency -t 1 -c 1 http://localhost:1337/
Running 10s test @ http://localhost:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 210.81us 120.84us 3.90ms 95.25%
Req/Sec 1.04k 837.83 3.03k 66.67%
Latency Distribution
50% 184.00us
75% 204.00us
90% 248.00us
99% 668.00us
4817 requests in 10.05s, 606.83KB read
Socket errors: connect 0, read 4817, write 0, timeout 0
Requests/sec: 479.38
Transfer/sec: 60.39KB
hey with keep-alive: hey -c 1 -z 10s http://127.0.0.1:1337/
Summary:
Total: 10.0004 secs
Slowest: 0.0018 secs
Fastest: 0.0001 secs
Average: 0.0001 secs
Requests/sec: 8238.5621
Response time histogram:
0.000 [1] |
0.000 [82310] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.000 [70] |
0.001 [6] |
0.001 [1] |
0.001 [0] |
0.001 [0] |
0.001 [0] |
0.001 [0] |
0.002 [0] |
0.002 [1] |
Latency distribution:
10% in 0.0001 secs
25% in 0.0001 secs
50% in 0.0001 secs
75% in 0.0001 secs
90% in 0.0001 secs
95% in 0.0001 secs
99% in 0.0002 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0001 secs, 0.0018 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0003 secs
resp wait: 0.0001 secs, 0.0001 secs, 0.0007 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0002 secs
Status code distribution:
[200] 82389 responses
hey without keep-alive: hey -c 1 -z 10s -disable-keepalive http://127.0.0.1:1337/
Summary:
Total: 10.0002 secs
Slowest: 0.0083 secs
Fastest: 0.0002 secs
Average: 0.0004 secs
Requests/sec: 2319.9600
Response time histogram:
0.000 [1] |
0.001 [22986] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.002 [207] |
0.003 [1] |
0.003 [1] |
0.004 [1] |
0.005 [1] |
0.006 [1] |
0.007 [0] |
0.008 [0] |
0.008 [1] |
Latency distribution:
10% in 0.0002 secs
25% in 0.0003 secs
50% in 0.0004 secs
75% in 0.0005 secs
90% in 0.0007 secs
95% in 0.0008 secs
99% in 0.0010 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0002 secs, 0.0002 secs, 0.0083 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0000 secs, 0.0000 secs, 0.0003 secs
resp wait: 0.0002 secs, 0.0001 secs, 0.0007 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 23200 responses
Hey, thanks for your quick interest in the topic.
I ran a few more tests with different versions of PHP and amphp and always get the same result: slower with keep-alive. I don't have any event-loop extensions installed, such as ext-uv or ext-event.
The only difference in my environment is that I'm running the tests on an NTS build of PHP, while from what I can see you're on ZTS. It shouldn't make much of a difference, though; tomorrow I'll try compiling PHP from source in ZTS mode and see if that changes anything.
It seems to be some kind of Linux-related problem. To rule out an issue with my own machine, I ran a simple test on a GCP e2-micro instance. The numbers are slightly different, but they show the same problem.
wrk -t 1 -c 1 http://127.0.0.1:1337/
Running 10s test @ http://127.0.0.1:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.99ms 167.60us 8.62ms 97.25%
Req/Sec 200.96 2.50 212.00 95.00%
2001 requests in 10.00s, 383.00KB read
Requests/sec: 200.05
Transfer/sec: 38.29KB
wrk -t 1 -c 1 -H "connection: close" http://127.0.0.1:1337/
Running 10s test @ http://127.0.0.1:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 401.45us 77.97us 1.52ms 90.81%
Req/Sec 1.79k 76.96 1.91k 70.37%
4819 requests in 10.02s, 607.08KB read
Socket errors: connect 0, read 4819, write 0, timeout 0
Requests/sec: 481.01
Transfer/sec: 60.60KB
As you can see, on GCP I get ~5 ms instead of the ~40 ms I see locally, but disabling keep-alive is still faster.
Unfortunately, I don't have access to a Mac to test this.
Hey, thanks for your reply; it helped me realize I should look into the differences between the macOS and Linux environments. After a quick search online, I figured out what was causing the issue and got much better response times, so this issue can be considered resolved.
For anyone facing a similar problem, here are the details:
It turns out that the TCP stack behaves differently on macOS than it does on Linux. The strange delays weren't caused by bad code, but by Nagle's algorithm: with small responses on a reused connection, Nagle's algorithm interacts with Linux's delayed-ACK timer (roughly 40 ms), which matches the latency I was seeing. To prevent these delays, you need to enable the TCP_NODELAY option on the socket.
You can do that easily by passing a BindContext to the expose() call:
$server = new SocketHttpServer(...);
$server->expose("0.0.0.0:1337", (new Socket\BindContext())->withTcpNoDelay());
Result with TCP_NODELAY enabled:
Running 10s test @ http://127.0.0.1:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 43.05us 15.01us 1.06ms 97.45%
Req/Sec 23.32k 1.10k 24.43k 88.12%
Latency Distribution
50% 41.00us
75% 43.00us
90% 51.00us
99% 71.00us
234375 requests in 10.10s, 43.81MB read
Requests/sec: 23205.99
Transfer/sec: 4.34MB
Result with default settings:
Running 10s test @ http://127.0.0.1:1337/
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 40.92ms 2.15ms 43.66ms 99.18%
Req/Sec 24.44 5.04 30.00 51.00%
Latency Distribution
50% 41.01ms
75% 41.10ms
90% 41.19ms
99% 42.11ms
245 requests in 10.03s, 47.06KB read
Requests/sec: 24.42
Transfer/sec: 4.69KB
The only thing I'd add is perhaps a small note in the documentation that this situation can occur.
Nagle has reared its head and ruined some benchmarks before. Real-world applications won't be as affected, I think; they might even benefit.
Perhaps the best thing to do would be to enable TCP_NODELAY in the examples, with a comment explaining why, and to add a note about it in the README.
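For instance, something along these lines in the examples (the comment wording is only a suggestion, and the address/port are placeholders):
use Amp\Socket;

// Disable Nagle's algorithm via TCP_NODELAY: without it, small keep-alive
// responses can sit in the kernel's send buffer waiting for the previous
// segment to be ACKed, which shows up as ~40 ms per-request latency on Linux.
$server->expose('0.0.0.0:1337', (new Socket\BindContext())->withTcpNoDelay());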