unit icon indicating copy to clipboard operation
unit copied to clipboard

feature request: need "prefix" to provide script_name in perl

Open maaarghk opened this issue 2 years ago • 16 comments

for all the same reasons as detailed in #590 for python and #651 for ruby - the ability to set a value in script_name is required.

i also think it should be possible to set this to a blank string, which is allowed by the psgi spec

maaarghk avatar Jan 12 '24 22:01 maaarghk

Hi @maaarghk

Indeed, according to the PSGI spec

The environment MUST include these keys (adopted from PEP 333, Rack and JSGI) except when they would normally be empty.

and as one such key

SCRIPT_NAME: The initial portion of the request URL's path, corresponding to the application. This tells the application its virtual "location". This may be an empty string if the application corresponds to the server's root URI.

All the other required keys seem to be there, so it looks like this just got overlooked.

I'll have a look.

ac000 avatar Jan 13 '24 02:01 ac000

Hi @maaarghk,

In the discussions https://github.com/nginx/unit/issues/590 and https://github.com/nginx/unit/issues/651, real-world use cases emerged that led to issues which couldn't be resolved without these variables. As mentioned here: https://github.com/nginx/unit/issues/590#issuecomment-952008127, every additional variable introduced in the context adds overhead. Therefore, please provide examples of how SCRIPT_NAME is utilized in the Perl.

andrey-zelenkov avatar Jan 13 '24 11:01 andrey-zelenkov

On Sat, 13 Jan 2024 03:28:01 -0800 andrey-zelenkov @.***> wrote:

Hi @maaarghk,

In the discussions https://github.com/nginx/unit/issues/590 and https://github.com/nginx/unit/issues/651, real-world use cases emerged that led to issues which couldn't be resolved without these variables. As mentioned here: https://github.com/nginx/unit/issues/590#issuecomment-952008127, every additional variable introduced in the context adds overhead. Therefore, please provide examples of how SCRIPT_NAME is utilized in the Perl.

Regardless of the above, it is required by the spec...

ac000 avatar Jan 13 '24 19:01 ac000

It makes no sense to implement unused parts of the specification, especially if this leads to general degradation.

andrey-zelenkov avatar Jan 13 '24 20:01 andrey-zelenkov

That's a pretty vague and generalised statement. I suppose you have some facts to back that up?

Given the way Unit works, I am however struggling to see under which circumstances SCRIPT_NAME would be anything other than an empty string...

ac000 avatar Jan 15 '24 11:01 ac000

That's a pretty vague and generalised statement. I suppose you have some facts to back that up?

PSGI variables (similar to WSGI) are generated unconditionally for each request to the Perl application, leaving no option to disable this behavior. Consequently, the module must allocate additional resources for each request when incorporating this variable. Therefore, I am inclined to believe that adding it without obtaining any real benefits, such as addressing an actual user bug, is not advisable. Specifications are not immutable; they may be duplicated, contain errors, be rewritten over time, or be created post facto. Prioritizing the resolution of genuine user issues holds greater significance for me than blindly adhering to documentation requirements.

andrey-zelenkov avatar Jan 15 '24 13:01 andrey-zelenkov

if you were migrating an existing setup like this one which uses nginx - https://github.com/maaarghk/two-placks-one-domain - setting script_name would be required.

the same thing can be done with something simpler like starman using https://metacpan.org/pod/Plack::App::URLMap

i think the same argument as in the python discussion applies where it's totally accurate to say it's pointless to send extra bytes over the wire for every single request for something that is essentially a constant part of the app configuration. ideally if you're starting a new app from scratch you'd take that into account.

But that ship has sailed - if someone wants to migrate an existing app and make minimal changes, or host a third party app where patching the upstream codebase is not desirable, optional support for this has to be present.

https://github.com/plack/Plack/blob/853f25df3e972e28a9822aabbdca87337477deb8/lib/Plack/Middleware/Lint.pm#L36

maaarghk avatar Jan 15 '24 14:01 maaarghk

@maaarghk Thank you! That what I was looking for.

andrey-zelenkov avatar Jan 15 '24 14:01 andrey-zelenkov

On Mon, 15 Jan 2024 06:00:03 -0800 andrey-zelenkov @.***> wrote:

That's a pretty vague and generalised statement. I suppose you have some facts to back that up?

PSGI variables (similar to WSGI) are generated unconditionally for each request to the Perl application, leaving no option to disable this behavior. Consequently, the module must allocate additional resources for each request when incorporating this variable. Therefore, I am inclined to believe that adding it without obtaining any real benefits,

Well, yes, but that's all pretty vague and could be said about anything.

I doubt setting 1 variable would have noticeable consequences, but of course it can be measured to verify that...

such as addressing an actual user bug, is not advisable. Specifications are not immutable; they may be duplicated, contain errors, be rewritten over time, or be created post facto. Prioritizing the resolution of genuine user issues holds greater significance for me than blindly adhering to documentation requirements.

I believe this is a strawman argument...

ac000 avatar Jan 15 '24 14:01 ac000

AFAICT we only need to set SCRIPT_NAME to an empty string, would need to check, but we may be able to do that once at module initialisation...

ac000 avatar Jan 15 '24 15:01 ac000

@ac000 in https://github.com/maaarghk/two-placks-one-domain i give an example of script_name being set (via fastcgi_split_path_info) to a non-blank string, allowing an app which defines its routes as / and /about to be hosted in subdirectory of /app2

maaarghk avatar Jan 15 '24 15:01 maaarghk

Yes, we do something like that for PHP so you can hit /path/to/index.php/some/internal/path and get

SCRIPT_NAME = "/path/to/index.php" PATH_INFO = "/some/internal/path"

I'm not sure that makes sense for Perl under Unit, where you don't hit a .pl file (like you might with CGI). Here SCRIPT_NAME is always the root so SCRIPT_NAME= "".

E.g with config like

{
    "listeners": {
        "[::1]:8080": {
            "pass": "routes"
        }
    },

    "routes": [
        {
            "match": {
                "uri": "/bar/baz/perl*"
            },
            "action": {
                "pass": "applications/perl"
            }
        }
    ],

    "applications": {
        "perl": {
            "type": "perl",
            "script": "/home/andrew/src/perl/1060.pl"
        }
    }
}

The Perl script is loaded at module init time.

$ curl localhost:8080/bar/baz/perl/a/b/c
REQUEST_URI => /bar/baz/perl/a/b/c
PATH_INFO   => /bar/baz/perl/a/b/c

I think those are right and match what the ruby module does for example.

AIUI the perl script here is always the root URI and any path given in the URI is then PATH_INFO

ac000 avatar Jan 15 '24 15:01 ac000

Yeah, perl (well, Plack) does not use SCRIPT_PATH in that way though, it's more taken to mean "url prefix". In the above case if 1060.pl is a Plack app then it would have to define its routes like: /bar/baz/perl/a/b/c. but it might already exist and define its routes as /a/b/c, in which case it would expect:

REQUEST_URI => /bar/baz/perl/a/b/c
PATH_INFO => /a/b/c
SCRIPT_PATH => /bar/baz/perl

Then, the script could know, to generate a URL for its route /page2, the whole required url is /bar/baz/perl/page2.

maaarghk avatar Jan 15 '24 16:01 maaarghk

Hmm, OK, so I don't think we can set this automatically, looks like we will need something like a prefix option like we have in Python (not sue why we don't have such a thing in Ruby), that can default to the empty string if not set...

This also means that PATH_INFO will need to be adjusted accordingly.

ac000 avatar Jan 16 '24 04:01 ac000

@ac000 Actually, I think it is needed in Ruby... we tried setting a SCRIPT_NAME but it never worked properly. We ended up whipping together a middleware to extract a environment variable to get around it, which is non-standard and brittle since it's not standard.

travisbell avatar Jan 21 '24 18:01 travisbell

Hey folks!

Here's an initial implementation of SCRIPT_NAME for Perl. Better late than never!

Here's the patch for interest/testing...

diff --git a/src/nxt_application.h b/src/nxt_application.h
index f5d7a9df..f00a3a4b 100644
--- a/src/nxt_application.h
+++ b/src/nxt_application.h
@@ -66,6 +66,7 @@ typedef struct {
 
 typedef struct {
     char       *script;
+    char       *prefix;
     uint32_t   threads;
     uint32_t   thread_stack_size;
 } nxt_perl_app_conf_t;
diff --git a/src/nxt_conf_validation.c b/src/nxt_conf_validation.c
index 2099f887..8093fb92 100644
--- a/src/nxt_conf_validation.c
+++ b/src/nxt_conf_validation.c
@@ -134,6 +134,8 @@ static nxt_int_t nxt_conf_vldt_python_protocol(nxt_conf_validation_t *vldt,
     nxt_conf_value_t *value, void *data);
 static nxt_int_t nxt_conf_vldt_python_prefix(nxt_conf_validation_t *vldt,
     nxt_conf_value_t *value, void *data);
+static nxt_int_t nxt_conf_vldt_perl_prefix(nxt_conf_validation_t *vldt,
+    nxt_conf_value_t *value, void *data);
 static nxt_int_t nxt_conf_vldt_threads(nxt_conf_validation_t *vldt,
     nxt_conf_value_t *value, void *data);
 static nxt_int_t nxt_conf_vldt_thread_stack_size(nxt_conf_validation_t *vldt,
@@ -988,6 +990,10 @@ static nxt_conf_vldt_object_t  nxt_conf_vldt_perl_members[] = {
         .name       = nxt_string("script"),
         .type       = NXT_CONF_VLDT_STRING,
         .flags      = NXT_CONF_VLDT_REQUIRED,
+    }, {
+        .name       = nxt_string("prefix"),
+        .type       = NXT_CONF_VLDT_STRING,
+        .validator  = nxt_conf_vldt_perl_prefix,
     }, {
         .name       = nxt_string("threads"),
         .type       = NXT_CONF_VLDT_INTEGER,
@@ -3171,6 +3177,34 @@ nxt_conf_vldt_argument(nxt_conf_validation_t *vldt, nxt_conf_value_t *value)
 }
 
 
+static nxt_int_t
+nxt_conf_vldt_perl_prefix(nxt_conf_validation_t *vldt, nxt_conf_value_t *value,
+    void *data)
+{
+    nxt_str_t  str;
+
+    nxt_conf_get_string(value, &str);
+
+    /*
+     * We could allow just '/', though I don't know how useful that is,
+     * but the way the code works that would result in stripping the
+     * leading '/' from PATH_INFO (which is wrong).
+     */
+    if (nxt_str_eq(&str, "/", 1)) {
+        return nxt_conf_vldt_error(vldt,
+                                   "The Perl \"prefix\" must not be '/'");
+    }
+
+    if (*str.start != '/') {
+        return nxt_conf_vldt_error(vldt,
+                                   "The Perl \"prefix\" must start with a "
+                                   "'/'");
+    }
+
+    return NXT_OK;
+}
+
+
 static nxt_int_t
 nxt_conf_vldt_php(nxt_conf_validation_t *vldt, nxt_conf_value_t *value,
     void *data)
diff --git a/src/nxt_main_process.c b/src/nxt_main_process.c
index 060ead41..1693e7c2 100644
--- a/src/nxt_main_process.c
+++ b/src/nxt_main_process.c
@@ -255,6 +255,12 @@ static nxt_conf_map_t  nxt_perl_app_conf[] = {
         offsetof(nxt_common_app_conf_t, u.perl.script),
     },
 
+    {
+        nxt_string("prefix"),
+        NXT_CONF_MAP_CSTRZ,
+        offsetof(nxt_common_app_conf_t, u.perl.prefix),
+    },
+
     {
         nxt_string("threads"),
         NXT_CONF_MAP_INT32,
diff --git a/src/perl/nxt_perl_psgi.c b/src/perl/nxt_perl_psgi.c
index 807d1741..b20bd177 100644
--- a/src/perl/nxt_perl_psgi.c
+++ b/src/perl/nxt_perl_psgi.c
@@ -104,6 +104,8 @@ static CV                   *nxt_perl_psgi_cb;
 static pthread_attr_t       *nxt_perl_psgi_thread_attr;
 static nxt_perl_psgi_ctx_t  *nxt_perl_psgi_ctxs;
 
+static const char           *nxt_perl_script_name;
+
 static uint32_t  nxt_perl_psgi_compat[] = {
     NXT_VERNUM, NXT_DEBUG,
 };
@@ -626,8 +628,25 @@ nxt_perl_psgi_env_create(PerlInterpreter *my_perl,
                               &r->method, r->method_length));
     RC(nxt_perl_psgi_add_sptr(my_perl, hash_env, NL("REQUEST_URI"),
                               &r->target, r->target_length));
-    RC(nxt_perl_psgi_add_sptr(my_perl, hash_env, NL("PATH_INFO"),
-                              &r->path, r->path_length));
+
+    if (nxt_perl_script_name != NULL) {
+        const char  *pinfo;
+
+        if (strlen(nxt_perl_script_name) > r->path_length) {
+            goto fail;
+        }
+
+        RC(nxt_perl_psgi_add_str(my_perl, hash_env, NL("SCRIPT_NAME"),
+                                 nxt_perl_script_name,
+                                 strlen(nxt_perl_script_name)));
+
+        pinfo = nxt_unit_sptr_get(&r->path) + strlen(nxt_perl_script_name);
+        RC(nxt_perl_psgi_add_str(my_perl, hash_env, NL("PATH_INFO"),
+                                 pinfo, strlen(pinfo)));
+    } else {
+        RC(nxt_perl_psgi_add_sptr(my_perl, hash_env, NL("PATH_INFO"),
+                                  &r->path, r->path_length));
+    }
 
     array_version = newAV();
 
@@ -1168,6 +1187,13 @@ nxt_perl_psgi_start(nxt_task_t *task, nxt_process_data_t *data)
     perl_init.data = c;
     perl_init.ctx_data = &pctx;
 
+    if (c->prefix != NULL) {
+        nxt_perl_script_name = strdup(c->prefix);
+        if (nxt_slow_path(nxt_perl_script_name == NULL)) {
+            goto fail;
+        }
+    }
+
     unit_ctx = nxt_unit_init(&perl_init);
     if (nxt_slow_path(unit_ctx == NULL)) {
         goto fail;
@@ -1181,6 +1207,8 @@ nxt_perl_psgi_start(nxt_task_t *task, nxt_process_data_t *data)
 
     nxt_perl_psgi_ctx_free(&pctx);
 
+    free((void *) nxt_perl_script_name);
+
     PERL_SYS_TERM();
 
     exit(rc);
@@ -1193,6 +1221,8 @@ nxt_perl_psgi_start(nxt_task_t *task, nxt_process_data_t *data)
 
     nxt_perl_psgi_ctx_free(&pctx);
 
+    free((void *) nxt_perl_script_name);
+
     PERL_SYS_TERM();
 
     return NXT_ERROR;

Some examples...

Firstly, if you don't specify prefix in the config then behaviour is just as it is now.

However if we do something like

{
    "listeners": {
        "[::1]:8080": {
            "pass": "applications/1060"
        }
    },

    "applications": {
        "1060": {
            "type": "perl",
            "script": "/home/andrew/src/perl/1060.pl",
            "prefix": "/foo.pl"
        }
    }
}
#!/usr/bin/perl

my $app = sub {
        my $env = shift;

        return [
                '200',
                [ 'Content-Type'=>'text/plain' ],
                [ "REQUEST_URI => " . $env->{REQUEST_URI} . "\n" .
                  "SCRIPT_NAME => " . $env->{SCRIPT_NAME} . "\n" .
                  "PATH_INFO   => " . $env->{PATH_INFO} . "\n" ],
        ];
};
$ curl localhost:8080/foo.pl/a/b/c
REQUEST_URI => /foo.pl/a/b/c
SCRIPT_NAME => /foo.pl
PATH_INFO   => /a/b/c

If I've understood things right, then I believe that is the correct behaviour.

SCRIPT_NAME has been stripped from PATH_INFO.

If you forget to specify the SCRIPT_NAME part, we will get some different behaviour depending on the length of the REQUEST_URI, e.g

$ curl localhost:8080/a/b/c
<!DOCTYPE html><title>Error 503</title><p>Error 503.

That is due to this check

+        if (strlen(nxt_perl_script_name) > r->path_length) {
+            goto fail;
+        }

That check is really to stop bad things happening code wise, for example overrunning a buffer.

However if we do

$ curl localhost:8080/a/b/c/d
REQUEST_URI => /a/b/c/d
SCRIPT_NAME => /foo.pl
PATH_INFO   => d

Which is all kinds of wrong.

I could put a more aggressive check in that makes sure that REQUEST_URI starts with SCRIPT_NAME (if set).

But would then need to decide what to do if it doesn't, return a 503? or act like SCRIPT_NAME wasn't set?

ac000 avatar Apr 03 '24 14:04 ac000