feature request: need "prefix" to provide script_name in perl
for all the same reasons as detailed in #590 for python and #651 for ruby - the ability to set a value in script_name is required.
i also think it should be possible to set this to a blank string, which is allowed by the psgi spec
Hi @maaarghk
Indeed, according to the PSGI spec
The environment MUST include these keys (adopted from PEP 333, Rack and JSGI) except when they would normally be empty.
and as one such key
SCRIPT_NAME: The initial portion of the request URL's path, corresponding to the application. This tells the application its virtual "location". This may be an empty string if the application corresponds to the server's root URI.
All the other required keys seem to be there, so it looks like this just got overlooked.
I'll have a look.
Hi @maaarghk,
In the discussions https://github.com/nginx/unit/issues/590 and https://github.com/nginx/unit/issues/651, real-world use cases emerged that led to issues which couldn't be resolved without these variables. As mentioned here: https://github.com/nginx/unit/issues/590#issuecomment-952008127, every additional variable introduced in the context adds overhead. Therefore, please provide examples of how SCRIPT_NAME is utilized in the Perl.
On Sat, 13 Jan 2024 03:28:01 -0800 andrey-zelenkov @.***> wrote:
Hi @maaarghk,
In the discussions https://github.com/nginx/unit/issues/590 and https://github.com/nginx/unit/issues/651, real-world use cases emerged that led to issues which couldn't be resolved without these variables. As mentioned here: https://github.com/nginx/unit/issues/590#issuecomment-952008127, every additional variable introduced in the context adds overhead. Therefore, please provide examples of how
SCRIPT_NAMEis utilized in the Perl.
Regardless of the above, it is required by the spec...
It makes no sense to implement unused parts of the specification, especially if this leads to general degradation.
That's a pretty vague and generalised statement. I suppose you have some facts to back that up?
Given the way Unit works, I am however struggling to see under which circumstances SCRIPT_NAME would be anything other than an empty string...
That's a pretty vague and generalised statement. I suppose you have some facts to back that up?
PSGI variables (similar to WSGI) are generated unconditionally for each request to the Perl application, leaving no option to disable this behavior. Consequently, the module must allocate additional resources for each request when incorporating this variable. Therefore, I am inclined to believe that adding it without obtaining any real benefits, such as addressing an actual user bug, is not advisable. Specifications are not immutable; they may be duplicated, contain errors, be rewritten over time, or be created post facto. Prioritizing the resolution of genuine user issues holds greater significance for me than blindly adhering to documentation requirements.
if you were migrating an existing setup like this one which uses nginx - https://github.com/maaarghk/two-placks-one-domain - setting script_name would be required.
the same thing can be done with something simpler like starman using https://metacpan.org/pod/Plack::App::URLMap
i think the same argument as in the python discussion applies where it's totally accurate to say it's pointless to send extra bytes over the wire for every single request for something that is essentially a constant part of the app configuration. ideally if you're starting a new app from scratch you'd take that into account.
But that ship has sailed - if someone wants to migrate an existing app and make minimal changes, or host a third party app where patching the upstream codebase is not desirable, optional support for this has to be present.
https://github.com/plack/Plack/blob/853f25df3e972e28a9822aabbdca87337477deb8/lib/Plack/Middleware/Lint.pm#L36
@maaarghk Thank you! That what I was looking for.
On Mon, 15 Jan 2024 06:00:03 -0800 andrey-zelenkov @.***> wrote:
That's a pretty vague and generalised statement. I suppose you have some facts to back that up?
PSGI variables (similar to WSGI) are generated unconditionally for each request to the Perl application, leaving no option to disable this behavior. Consequently, the module must allocate additional resources for each request when incorporating this variable. Therefore, I am inclined to believe that adding it without obtaining any real benefits,
Well, yes, but that's all pretty vague and could be said about anything.
I doubt setting 1 variable would have noticeable consequences, but of course it can be measured to verify that...
such as addressing an actual user bug, is not advisable. Specifications are not immutable; they may be duplicated, contain errors, be rewritten over time, or be created post facto. Prioritizing the resolution of genuine user issues holds greater significance for me than blindly adhering to documentation requirements.
I believe this is a strawman argument...
AFAICT we only need to set SCRIPT_NAME to an empty string, would need to check, but we may be able to do that once at module initialisation...
@ac000 in https://github.com/maaarghk/two-placks-one-domain i give an example of script_name being set (via fastcgi_split_path_info) to a non-blank string, allowing an app which defines its routes as / and /about to be hosted in subdirectory of /app2
Yes, we do something like that for PHP so you can hit /path/to/index.php/some/internal/path and get
SCRIPT_NAME = "/path/to/index.php" PATH_INFO = "/some/internal/path"
I'm not sure that makes sense for Perl under Unit, where you don't hit a .pl file (like you might with CGI). Here SCRIPT_NAME is always the root so SCRIPT_NAME= "".
E.g with config like
{
"listeners": {
"[::1]:8080": {
"pass": "routes"
}
},
"routes": [
{
"match": {
"uri": "/bar/baz/perl*"
},
"action": {
"pass": "applications/perl"
}
}
],
"applications": {
"perl": {
"type": "perl",
"script": "/home/andrew/src/perl/1060.pl"
}
}
}
The Perl script is loaded at module init time.
$ curl localhost:8080/bar/baz/perl/a/b/c
REQUEST_URI => /bar/baz/perl/a/b/c
PATH_INFO => /bar/baz/perl/a/b/c
I think those are right and match what the ruby module does for example.
AIUI the perl script here is always the root URI and any path given in the URI is then PATH_INFO
Yeah, perl (well, Plack) does not use SCRIPT_PATH in that way though, it's more taken to mean "url prefix". In the above case if 1060.pl is a Plack app then it would have to define its routes like: /bar/baz/perl/a/b/c. but it might already exist and define its routes as /a/b/c, in which case it would expect:
REQUEST_URI => /bar/baz/perl/a/b/c
PATH_INFO => /a/b/c
SCRIPT_PATH => /bar/baz/perl
Then, the script could know, to generate a URL for its route /page2, the whole required url is /bar/baz/perl/page2.
Hmm, OK, so I don't think we can set this automatically, looks like we will need something like a prefix option like we have in Python (not sue why we don't have such a thing in Ruby), that can default to the empty string if not set...
This also means that PATH_INFO will need to be adjusted accordingly.
@ac000 Actually, I think it is needed in Ruby... we tried setting a SCRIPT_NAME but it never worked properly. We ended up whipping together a middleware to extract a environment variable to get around it, which is non-standard and brittle since it's not standard.
Hey folks!
Here's an initial implementation of SCRIPT_NAME for Perl. Better late
than never!
Here's the patch for interest/testing...
diff --git a/src/nxt_application.h b/src/nxt_application.h
index f5d7a9df..f00a3a4b 100644
--- a/src/nxt_application.h
+++ b/src/nxt_application.h
@@ -66,6 +66,7 @@ typedef struct {
typedef struct {
char *script;
+ char *prefix;
uint32_t threads;
uint32_t thread_stack_size;
} nxt_perl_app_conf_t;
diff --git a/src/nxt_conf_validation.c b/src/nxt_conf_validation.c
index 2099f887..8093fb92 100644
--- a/src/nxt_conf_validation.c
+++ b/src/nxt_conf_validation.c
@@ -134,6 +134,8 @@ static nxt_int_t nxt_conf_vldt_python_protocol(nxt_conf_validation_t *vldt,
nxt_conf_value_t *value, void *data);
static nxt_int_t nxt_conf_vldt_python_prefix(nxt_conf_validation_t *vldt,
nxt_conf_value_t *value, void *data);
+static nxt_int_t nxt_conf_vldt_perl_prefix(nxt_conf_validation_t *vldt,
+ nxt_conf_value_t *value, void *data);
static nxt_int_t nxt_conf_vldt_threads(nxt_conf_validation_t *vldt,
nxt_conf_value_t *value, void *data);
static nxt_int_t nxt_conf_vldt_thread_stack_size(nxt_conf_validation_t *vldt,
@@ -988,6 +990,10 @@ static nxt_conf_vldt_object_t nxt_conf_vldt_perl_members[] = {
.name = nxt_string("script"),
.type = NXT_CONF_VLDT_STRING,
.flags = NXT_CONF_VLDT_REQUIRED,
+ }, {
+ .name = nxt_string("prefix"),
+ .type = NXT_CONF_VLDT_STRING,
+ .validator = nxt_conf_vldt_perl_prefix,
}, {
.name = nxt_string("threads"),
.type = NXT_CONF_VLDT_INTEGER,
@@ -3171,6 +3177,34 @@ nxt_conf_vldt_argument(nxt_conf_validation_t *vldt, nxt_conf_value_t *value)
}
+static nxt_int_t
+nxt_conf_vldt_perl_prefix(nxt_conf_validation_t *vldt, nxt_conf_value_t *value,
+ void *data)
+{
+ nxt_str_t str;
+
+ nxt_conf_get_string(value, &str);
+
+ /*
+ * We could allow just '/', though I don't know how useful that is,
+ * but the way the code works that would result in stripping the
+ * leading '/' from PATH_INFO (which is wrong).
+ */
+ if (nxt_str_eq(&str, "/", 1)) {
+ return nxt_conf_vldt_error(vldt,
+ "The Perl \"prefix\" must not be '/'");
+ }
+
+ if (*str.start != '/') {
+ return nxt_conf_vldt_error(vldt,
+ "The Perl \"prefix\" must start with a "
+ "'/'");
+ }
+
+ return NXT_OK;
+}
+
+
static nxt_int_t
nxt_conf_vldt_php(nxt_conf_validation_t *vldt, nxt_conf_value_t *value,
void *data)
diff --git a/src/nxt_main_process.c b/src/nxt_main_process.c
index 060ead41..1693e7c2 100644
--- a/src/nxt_main_process.c
+++ b/src/nxt_main_process.c
@@ -255,6 +255,12 @@ static nxt_conf_map_t nxt_perl_app_conf[] = {
offsetof(nxt_common_app_conf_t, u.perl.script),
},
+ {
+ nxt_string("prefix"),
+ NXT_CONF_MAP_CSTRZ,
+ offsetof(nxt_common_app_conf_t, u.perl.prefix),
+ },
+
{
nxt_string("threads"),
NXT_CONF_MAP_INT32,
diff --git a/src/perl/nxt_perl_psgi.c b/src/perl/nxt_perl_psgi.c
index 807d1741..b20bd177 100644
--- a/src/perl/nxt_perl_psgi.c
+++ b/src/perl/nxt_perl_psgi.c
@@ -104,6 +104,8 @@ static CV *nxt_perl_psgi_cb;
static pthread_attr_t *nxt_perl_psgi_thread_attr;
static nxt_perl_psgi_ctx_t *nxt_perl_psgi_ctxs;
+static const char *nxt_perl_script_name;
+
static uint32_t nxt_perl_psgi_compat[] = {
NXT_VERNUM, NXT_DEBUG,
};
@@ -626,8 +628,25 @@ nxt_perl_psgi_env_create(PerlInterpreter *my_perl,
&r->method, r->method_length));
RC(nxt_perl_psgi_add_sptr(my_perl, hash_env, NL("REQUEST_URI"),
&r->target, r->target_length));
- RC(nxt_perl_psgi_add_sptr(my_perl, hash_env, NL("PATH_INFO"),
- &r->path, r->path_length));
+
+ if (nxt_perl_script_name != NULL) {
+ const char *pinfo;
+
+ if (strlen(nxt_perl_script_name) > r->path_length) {
+ goto fail;
+ }
+
+ RC(nxt_perl_psgi_add_str(my_perl, hash_env, NL("SCRIPT_NAME"),
+ nxt_perl_script_name,
+ strlen(nxt_perl_script_name)));
+
+ pinfo = nxt_unit_sptr_get(&r->path) + strlen(nxt_perl_script_name);
+ RC(nxt_perl_psgi_add_str(my_perl, hash_env, NL("PATH_INFO"),
+ pinfo, strlen(pinfo)));
+ } else {
+ RC(nxt_perl_psgi_add_sptr(my_perl, hash_env, NL("PATH_INFO"),
+ &r->path, r->path_length));
+ }
array_version = newAV();
@@ -1168,6 +1187,13 @@ nxt_perl_psgi_start(nxt_task_t *task, nxt_process_data_t *data)
perl_init.data = c;
perl_init.ctx_data = &pctx;
+ if (c->prefix != NULL) {
+ nxt_perl_script_name = strdup(c->prefix);
+ if (nxt_slow_path(nxt_perl_script_name == NULL)) {
+ goto fail;
+ }
+ }
+
unit_ctx = nxt_unit_init(&perl_init);
if (nxt_slow_path(unit_ctx == NULL)) {
goto fail;
@@ -1181,6 +1207,8 @@ nxt_perl_psgi_start(nxt_task_t *task, nxt_process_data_t *data)
nxt_perl_psgi_ctx_free(&pctx);
+ free((void *) nxt_perl_script_name);
+
PERL_SYS_TERM();
exit(rc);
@@ -1193,6 +1221,8 @@ nxt_perl_psgi_start(nxt_task_t *task, nxt_process_data_t *data)
nxt_perl_psgi_ctx_free(&pctx);
+ free((void *) nxt_perl_script_name);
+
PERL_SYS_TERM();
return NXT_ERROR;
Some examples...
Firstly, if you don't specify prefix in the config then behaviour is
just as it is now.
However if we do something like
{
"listeners": {
"[::1]:8080": {
"pass": "applications/1060"
}
},
"applications": {
"1060": {
"type": "perl",
"script": "/home/andrew/src/perl/1060.pl",
"prefix": "/foo.pl"
}
}
}
#!/usr/bin/perl
my $app = sub {
my $env = shift;
return [
'200',
[ 'Content-Type'=>'text/plain' ],
[ "REQUEST_URI => " . $env->{REQUEST_URI} . "\n" .
"SCRIPT_NAME => " . $env->{SCRIPT_NAME} . "\n" .
"PATH_INFO => " . $env->{PATH_INFO} . "\n" ],
];
};
$ curl localhost:8080/foo.pl/a/b/c
REQUEST_URI => /foo.pl/a/b/c
SCRIPT_NAME => /foo.pl
PATH_INFO => /a/b/c
If I've understood things right, then I believe that is the correct behaviour.
SCRIPT_NAME has been stripped from PATH_INFO.
If you forget to specify the SCRIPT_NAME part, we will get some
different behaviour depending on the length of the REQUEST_URI, e.g
$ curl localhost:8080/a/b/c
<!DOCTYPE html><title>Error 503</title><p>Error 503.
That is due to this check
+ if (strlen(nxt_perl_script_name) > r->path_length) {
+ goto fail;
+ }
That check is really to stop bad things happening code wise, for example overrunning a buffer.
However if we do
$ curl localhost:8080/a/b/c/d
REQUEST_URI => /a/b/c/d
SCRIPT_NAME => /foo.pl
PATH_INFO => d
Which is all kinds of wrong.
I could put a more aggressive check in that makes sure that
REQUEST_URI starts with SCRIPT_NAME (if set).
But would then need to decide what to do if it doesn't, return a 503?
or act like SCRIPT_NAME wasn't set?