fix!: better superglobals handling in worker mode
Closes #767.
@withinboredom @TimWolla would you mind reviewing the proposed fix?
I need to do more testing to ensure that the behavior is what is expected, but at least this fixes the segfault. We'll have to double-check if this doesn't introduce a memory leak or some weird issues.
Valgrind does not appear to work well here.
I does point out some conditional jumps that depend on uninitialized in non-go code values, though. But also quite a few invalid reads in go code, which might or might not be expected.
The following test script also appears to log the incorrect information:
The cause probably is that you can't naively memcpy the zval, because the reference counting of strings is then incorrect, allowing the code to perform in-place modification instead of doing copy-on-write. I didn't test, because I find the DX of the development container to be pretty unwieldy.
Thanks for your detailed review @TimWolla. Regarding the Docker container, it is entirely optional. When you have a local installation of PHP, you can just run this command to compile the binary: https://frankenphp.dev/docs/compile/#compile-the-go-app
I pushed a new version that uses ZVAL_COPY as suggested by @TimWolla. This should fix the leak (to be double-checked).
This behaves almost as expected but I identified two issues:
- [x]
$_SERVER['FRANKENPHP_WORKER']and probably other custom environment variables passed to the worker script are lost inside the closure. We should probably merge the variables from the worker script in the closure to prevent this issue. - [X]
$_SERVER['FRANKENPHP_WORKER']is NULL terminated, that's an existing, unrelated bug. ~~I'll fix it.~~ Fixed in https://github.com/dunglas/frankenphp/pull/809
The behavior is now the expected one. If you don't mind, cloud you please review again?
Actually, merging the $_SERVER superglobals of the worker and the request wasn't expected, we even have a test (the failing one) for that.
As showcased by the test, it's entirely possible to do that userland if needed.
Should we keep the existing behavior of not merging (after a second thought, I think that it's more flexible to not merge)?
I'm still seeing a leak:
b57e622520ad# ../caddy/frankenphp/frankenphp php-server --worker worker-getopt.php
2024/05/22 13:44:51.696 WARN admin admin endpoint disabled
2024/05/22 13:44:51.696 WARN http.auto_https server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server {"server_name": "php", "http_port": 80}
2024/05/22 13:44:51.696 INFO tls.cache.maintenance started background certificate maintenance {"cache": "0xc000345a80"}
2024/05/22 13:44:51.713 INFO FrankenPHP started 🐘 {"php_version": "8.3.8-dev"}
2024/05/22 13:44:51.714 INFO http.log server running {"name": "php", "protocols": ["h1", "h2", "h3"]}
2024/05/22 13:44:51.714 INFO Caddy serving PHP app on :80
2024/05/22 13:44:51.723 WARN tls storage cleaning happened too recently; skipping for now {"storage": "FileStorage:/root/.local/share/caddy", "instance": "00d35d99-1901-4bf0-aa0b-e1f662db2acd", "try_again": "2024/05/23 13:44:51.723", "try_again_in": 86399.999999265}
2024/05/22 13:44:51.723 INFO tls finished cleaning storage units
[Wed May 22 13:44:53 2024] Script: '-'
/usr/local/src/php/Zend/zend_string.h(174) : Freeing 0x00007fe08c2021e0 (40 bytes), script=-
Last leak repeated 12 times
[Wed May 22 13:44:53 2024] Script: '-'
/usr/local/src/php/Zend/zend_string.h(174) : Freeing 0x00007fe08c202230 (40 bytes), script=-
Last leak repeated 40 times
[Wed May 22 13:44:53 2024] Script: '-'
/usr/local/src/php/Zend/zend_string.h(174) : Freeing 0x00007fe08c2026e0 (48 bytes), script=-
Last leak repeated 12 times
[Wed May 22 13:44:53 2024] Script: '-'
/usr/local/src/php/Zend/zend_hash.c(291) : Freeing 0x00007fe08c25f000 (56 bytes), script=-
Last leak repeated 4 times
[Wed May 22 13:44:53 2024] Script: '-'
/usr/local/src/php/Zend/zend_hash.c(1311) : Freeing 0x00007fe08c260c00 (2560 bytes), script=-
=== Total 73 memory leaks detected ===
2024/05/22 13:44:53.542 ERROR unexpected termination, restarting {"worker": "/go/src/app/testdata/worker-getopt.php", "exit_status": 2}
and then calling curl -v http://localhost/worker-getopt.php?some_query_parameters.
I also might've done something wrong while starting FrankenPHP, because of the exit(2) being triggered.
And the test script with the file_put_contents from yesterday’s review still outputs the wrong data for the after:.
I've an idea for the after:. I'll take a look. The crash and the leak are weird, I'm not sure of why it behaves like this.
Maybe related: https://github.com/dunglas/frankenphp/pull/81
@TimWolla @withinboredom would you mind testing/reviewing this patch one more time? The current implementation looks ok-ish to me.
@dunglas For the test script in https://github.com/dunglas/frankenphp/pull/796#pullrequestreview-2067722841 I'm still seeing the wrong data being written to the log file:
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
before: worker-getopt.php
inside: /worker-getopt.php?some_query_parameters 127.0.0.1
after : /worker-getopt.php?some_query_parameters 127.0.0.1
before: /worker-getopt.php?some_query_parameters 127.0.0.1
In the last commit, I fixed the crash without changing the existing behavior (the docs change is now irrelevant, I'll revert that). I think it's good enough for now.
In the last commit, I fixed the crash without changing the existing behavior (the docs change is now irrelevant, I'll revert that). I think it's good enough for now.
I can confirm the crash is fixed, but internal functions see an empty $_SERVER now. That's certainly better than a crash, but would still be inconsistent with userland’s PoV.
I cleaned the code and updated the docs. I think I've found a way to keep the userland and internal structures in sync but I didn't create a test case. @TimWolla would you mind checking if this now works with the Tideways extension?
I think I've found a way to keep the userland and internal structures in sync but I didn't create a test case.
I can confirm that internal functions called after frankenphp_handle_request() see the same $_SERVER as userland initially saw.
However what is not kept in sync is manually writing to $_SERVER either within the callback or after frankenphp_handle_request() finishes and before calling the internal function.
$_SERVER['dummy'] = 'value';
Dumping var_dump(count($_SERVER)) returns 42 in my test case, but an internal function
ZEND_FUNCTION(native_debug)
{
zend_is_auto_global(ZSTR_KNOWN(ZEND_STR_AUTOGLOBAL_SERVER));
HashTable *server = Z_ARRVAL_P(&PG(http_globals)[TRACK_VARS_SERVER]);
printf("%d\n", zend_hash_num_elements(server));
zval *request_uri = zend_hash_str_find(server, "REQUEST_URI", strlen("REQUEST_URI"));
printf("%s\n", Z_STRVAL_P(request_uri));
}
prints 41 (because dummy is missing). Likewise modifying an entry using:
$_SERVER['REQUEST_URI'] = 'dummy';
is not visible to the native function.
I'm not sure if as a user I would prefer a crash or silently incorrect data. Your call.
@TimWolla are the two data structures synced automatically in a normal script? I may be wrong, but I don't think that we do anything special FrankenPHP-side that would cause this desync.
@TimWolla according to my tests, this behaves the same even when not in worker mode, so it looks "expected" to me, at least let's say that it's not related to FrankenPHP.
@dunglas Can confirm with the cli-server. Sorry for the confusion. This looks good to me then!