frankenphp icon indicating copy to clipboard operation
frankenphp copied to clipboard

fix!: better superglobals handling in worker mode

Open dunglas opened this issue 1 year ago • 11 comments

Closes #767.

dunglas avatar May 16 '24 16:05 dunglas

@withinboredom @TimWolla would you mind reviewing the proposed fix?

I need to do more testing to ensure that the behavior is what is expected, but at least this fixes the segfault. We'll have to double-check if this doesn't introduce a memory leak or some weird issues.

dunglas avatar May 17 '24 16:05 dunglas

Valgrind does not appear to work well here.

I does point out some conditional jumps that depend on uninitialized in non-go code values, though. But also quite a few invalid reads in go code, which might or might not be expected.

TimWolla avatar May 21 '24 07:05 TimWolla

The following test script also appears to log the incorrect information:

The cause probably is that you can't naively memcpy the zval, because the reference counting of strings is then incorrect, allowing the code to perform in-place modification instead of doing copy-on-write. I didn't test, because I find the DX of the development container to be pretty unwieldy.

TimWolla avatar May 21 '24 07:05 TimWolla

Thanks for your detailed review @TimWolla. Regarding the Docker container, it is entirely optional. When you have a local installation of PHP, you can just run this command to compile the binary: https://frankenphp.dev/docs/compile/#compile-the-go-app

dunglas avatar May 21 '24 08:05 dunglas

I pushed a new version that uses ZVAL_COPY as suggested by @TimWolla. This should fix the leak (to be double-checked).

This behaves almost as expected but I identified two issues:

  • [x] $_SERVER['FRANKENPHP_WORKER'] and probably other custom environment variables passed to the worker script are lost inside the closure. We should probably merge the variables from the worker script in the closure to prevent this issue.
  • [X] $_SERVER['FRANKENPHP_WORKER'] is NULL terminated, that's an existing, unrelated bug. ~~I'll fix it.~~ Fixed in https://github.com/dunglas/frankenphp/pull/809

dunglas avatar May 21 '24 14:05 dunglas

The behavior is now the expected one. If you don't mind, cloud you please review again?

dunglas avatar May 21 '24 16:05 dunglas

Actually, merging the $_SERVER superglobals of the worker and the request wasn't expected, we even have a test (the failing one) for that.

As showcased by the test, it's entirely possible to do that userland if needed.

Should we keep the existing behavior of not merging (after a second thought, I think that it's more flexible to not merge)?

dunglas avatar May 21 '24 17:05 dunglas

I'm still seeing a leak:

b57e622520ad# ../caddy/frankenphp/frankenphp php-server --worker worker-getopt.php
2024/05/22 13:44:51.696 WARN    admin   admin endpoint disabled
2024/05/22 13:44:51.696 WARN    http.auto_https server is listening only on the HTTP port, so no automatic HTTPS will be applied to this server {"server_name": "php", "http_port": 80}
2024/05/22 13:44:51.696 INFO    tls.cache.maintenance   started background certificate maintenance      {"cache": "0xc000345a80"}
2024/05/22 13:44:51.713 INFO    FrankenPHP started 🐘   {"php_version": "8.3.8-dev"}
2024/05/22 13:44:51.714 INFO    http.log        server running  {"name": "php", "protocols": ["h1", "h2", "h3"]}
2024/05/22 13:44:51.714 INFO    Caddy serving PHP app on :80
2024/05/22 13:44:51.723 WARN    tls     storage cleaning happened too recently; skipping for now        {"storage": "FileStorage:/root/.local/share/caddy", "instance": "00d35d99-1901-4bf0-aa0b-e1f662db2acd", "try_again": "2024/05/23 13:44:51.723", "try_again_in": 86399.999999265}
2024/05/22 13:44:51.723 INFO    tls     finished cleaning storage units
[Wed May 22 13:44:53 2024]  Script:  '-'
/usr/local/src/php/Zend/zend_string.h(174) :  Freeing 0x00007fe08c2021e0 (40 bytes), script=-
Last leak repeated 12 times
[Wed May 22 13:44:53 2024]  Script:  '-'
/usr/local/src/php/Zend/zend_string.h(174) :  Freeing 0x00007fe08c202230 (40 bytes), script=-
Last leak repeated 40 times
[Wed May 22 13:44:53 2024]  Script:  '-'
/usr/local/src/php/Zend/zend_string.h(174) :  Freeing 0x00007fe08c2026e0 (48 bytes), script=-
Last leak repeated 12 times
[Wed May 22 13:44:53 2024]  Script:  '-'
/usr/local/src/php/Zend/zend_hash.c(291) :  Freeing 0x00007fe08c25f000 (56 bytes), script=-
Last leak repeated 4 times
[Wed May 22 13:44:53 2024]  Script:  '-'
/usr/local/src/php/Zend/zend_hash.c(1311) :  Freeing 0x00007fe08c260c00 (2560 bytes), script=-
=== Total 73 memory leaks detected ===
2024/05/22 13:44:53.542 ERROR   unexpected termination, restarting      {"worker": "/go/src/app/testdata/worker-getopt.php", "exit_status": 2}

and then calling curl -v http://localhost/worker-getopt.php?some_query_parameters.

I also might've done something wrong while starting FrankenPHP, because of the exit(2) being triggered.

TimWolla avatar May 22 '24 13:05 TimWolla

And the test script with the file_put_contents from yesterday’s review still outputs the wrong data for the after:.

TimWolla avatar May 22 '24 13:05 TimWolla

I've an idea for the after:. I'll take a look. The crash and the leak are weird, I'm not sure of why it behaves like this.

dunglas avatar May 22 '24 14:05 dunglas

Maybe related: https://github.com/dunglas/frankenphp/pull/81

dunglas avatar May 22 '24 14:05 dunglas

@TimWolla @withinboredom would you mind testing/reviewing this patch one more time? The current implementation looks ok-ish to me.

dunglas avatar May 27 '24 11:05 dunglas

@dunglas For the test script in https://github.com/dunglas/frankenphp/pull/796#pullrequestreview-2067722841 I'm still seeing the wrong data being written to the log file:

before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
before: worker-getopt.php 
inside: /worker-getopt.php?some_query_parameters 127.0.0.1
after : /worker-getopt.php?some_query_parameters 127.0.0.1
before: /worker-getopt.php?some_query_parameters 127.0.0.1

TimWolla avatar May 27 '24 12:05 TimWolla

In the last commit, I fixed the crash without changing the existing behavior (the docs change is now irrelevant, I'll revert that). I think it's good enough for now.

dunglas avatar May 31 '24 23:05 dunglas

In the last commit, I fixed the crash without changing the existing behavior (the docs change is now irrelevant, I'll revert that). I think it's good enough for now.

I can confirm the crash is fixed, but internal functions see an empty $_SERVER now. That's certainly better than a crash, but would still be inconsistent with userland’s PoV.

TimWolla avatar Jun 03 '24 13:06 TimWolla

I cleaned the code and updated the docs. I think I've found a way to keep the userland and internal structures in sync but I didn't create a test case. @TimWolla would you mind checking if this now works with the Tideways extension?

dunglas avatar Jun 04 '24 14:06 dunglas

I think I've found a way to keep the userland and internal structures in sync but I didn't create a test case.

I can confirm that internal functions called after frankenphp_handle_request() see the same $_SERVER as userland initially saw.

However what is not kept in sync is manually writing to $_SERVER either within the callback or after frankenphp_handle_request() finishes and before calling the internal function.

$_SERVER['dummy'] = 'value';

Dumping var_dump(count($_SERVER)) returns 42 in my test case, but an internal function

ZEND_FUNCTION(native_debug)
{
    zend_is_auto_global(ZSTR_KNOWN(ZEND_STR_AUTOGLOBAL_SERVER));
    HashTable *server = Z_ARRVAL_P(&PG(http_globals)[TRACK_VARS_SERVER]);
    printf("%d\n", zend_hash_num_elements(server));
    zval *request_uri = zend_hash_str_find(server, "REQUEST_URI", strlen("REQUEST_URI"));
    printf("%s\n", Z_STRVAL_P(request_uri));
}

prints 41 (because dummy is missing). Likewise modifying an entry using:

$_SERVER['REQUEST_URI'] = 'dummy';

is not visible to the native function.


I'm not sure if as a user I would prefer a crash or silently incorrect data. Your call.

TimWolla avatar Jun 05 '24 08:06 TimWolla

@TimWolla are the two data structures synced automatically in a normal script? I may be wrong, but I don't think that we do anything special FrankenPHP-side that would cause this desync.

dunglas avatar Jun 05 '24 09:06 dunglas

@TimWolla according to my tests, this behaves the same even when not in worker mode, so it looks "expected" to me, at least let's say that it's not related to FrankenPHP.

dunglas avatar Jun 05 '24 11:06 dunglas

@dunglas Can confirm with the cli-server. Sorry for the confusion. This looks good to me then!

TimWolla avatar Jun 05 '24 12:06 TimWolla