Differences in behavior between subprocess and the Unix shell

Open garrett-is-a-swann opened this issue 1 year ago • 1 comment

Hi, cool project! I was excited to use this for a build system rewrite, but I'm running into a few issues. There seems to be a parity issue between running commands through this library and running them in a regular shell.

The first issue I noticed was that if I try to execute gcc from a subprocess::command, I get this error:

```
> gcc -g -o path/to/file.o  -c path/to/file.c ...
gcc: fatal error: cannot execute 'cc1': execvp: No such file or directory
```

The same command runs fine in a shell, outside the C++ executable and the subprocess. This got me investigating other commands:

```cpp
std::string output;  // receives the child's stdout
std::string cmd = "env";
std::cout << "> " << cmd << "\n";
int code = (subprocess::command{cmd} > output).run();
std::cout << "(" << code << ") < " << output << std::endl;
```

```
> env
(0) <
```

env doesn't print anything, but echo $PATH prints the correct path?

As a sanity check to see whether this was specific to the library, I tried a simple exec function built on popen:

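```cpp
#include <array>
#include <cstdio>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

#include "subprocess.hpp"  // header named later in this thread; adjust the include path to your setup

// Run `cmd` through popen(3), which invokes /bin/sh -c, and return its stdout.
std::string exec(const char* cmd) {
    std::array<char, 128> buffer;
    std::string result;
    std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd, "r"), pclose);
    if (!pipe) {
        throw std::runtime_error("popen() failed!");
    }
    while (fgets(buffer.data(), static_cast<int>(buffer.size()), pipe.get()) != nullptr) {
        result += buffer.data();
    }
    return result;
}

int main() {
    std::string cmd = "env";
    std::cout << "exec: " << cmd << std::endl;
    std::cout << exec(cmd.c_str()) << std::endl;

    std::string output;  // receives the subprocess's stdout
    std::cout << "subprocess: " << cmd << "\n";
    (subprocess::command{cmd} > output).run();
    std::cout << "< " << output << std::endl;
}
```

```
exec: env
... env printed ...
subprocess: env
< 
```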

As shown above, the exec function works as expected but the subprocess command does not. The same is true of the gcc command I mentioned earlier: the simple exec function works, the subprocess one does not.

Any idea what the issue could be? I tried to check the docs, but the site doesn't seem to be working at the moment, so I'm reaching out here.

Thanks for any 👀/ help!

garrett-is-a-swann, Dec 30 '24 22:12

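I managed to fix the issue. I was caught off guard because echo $PATH seemed to work as expected whereas env didn't. The reason is that the library calls wordexp before spawning the child process, and wordexp substitutes environment variables such as $PATH while it is still running in the parent, which has access to them. The child itself, however, was being spawned without the parent's environment, so env had nothing to print.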
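A minimal sketch of why echo $PATH and env behave differently, assuming the command is split with POSIX wordexp(3) in the parent as described above. The helper name expand_in_parent is hypothetical and this is not the library's code: it only shows that a $VAR reference is already replaced by its value in argv before any child exists, whereas env has to read the child's own environment.

```cpp
#include <string>
#include <vector>
#include <wordexp.h>

// Hypothetical helper: expand `cmd` with POSIX wordexp(3) and return the words.
// Any $VAR reference is resolved here, in the calling (parent) process, using
// the parent's environment, before any child process is spawned.
std::vector<std::string> expand_in_parent(const std::string& cmd) {
    std::vector<std::string> words;
    wordexp_t we;
    int rc = wordexp(cmd.c_str(), &we, 0);
    if (rc != 0) {
        if (rc == WRDE_NOSPACE) wordfree(&we);  // a partial result may be allocated
        return words;                           // empty on any expansion error
    }
    words.assign(we.we_wordv, we.we_wordv + we.we_wordc);
    wordfree(&we);
    return words;
}
```

For example, with HOME set in the parent, expand_in_parent("echo $HOME") yields two words, the second being the parent's HOME value, so the spawned echo prints it even if its own environment is empty.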

To get the subprocess library working the way I wanted above, you can edit the posix_spawnp call in subprocess.hpp to pass the environ global as the child's environment:

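```cpp
  // environ is the POSIX global holding the parent's environment; declare it
  // with `extern char** environ;` if no included header does.
  if (int err{::posix_spawnp(&pid, sh.argv()[0], action.get(), nullptr, sh.argv(), environ)}; err != 0)
  // ...
```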

This could probably be offered as an opt-in feature of the library, perhaps set at command initialization, e.g.:

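```cpp
// Proposed (hypothetical) usage; designated initializers need C++20:
(subprocess::command{.cmd = "env", .keep_env = true}).run();

// ...and the corresponding change in the spawn call:
  if (int err{::posix_spawnp(&pid, sh.argv()[0], action.get(), nullptr, sh.argv(), keep_env_ ? environ : nullptr)}; err != 0)
  // ...
```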
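A self-contained sketch of the proposed opt-in, written outside the library. The names spawn_and_wait and child_sees_parent_var, and the variable KEEP_ENV_DEMO, are hypothetical and exist only for illustration. One detail differs from the patch above: POSIX does not define what a null envp means, so the sketch passes an explicitly empty environment instead of nullptr when the environment is not kept.

```cpp
#include <cstdlib>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char** environ;  // POSIX global: the calling process's environment

// Hypothetical helper mirroring the proposed keep_env opt-in. It spawns argv[0]
// (searched for on PATH by posix_spawnp) and waits for it. The child either
// inherits the parent's environment or starts with an explicitly empty one.
// Returns the child's exit status, or -1 if spawning or waiting failed.
int spawn_and_wait(char* const argv[], bool keep_env) {
    static char* empty_env[] = {nullptr};  // an empty, null-terminated envp
    char* const* envp = empty_env;
    if (keep_env) envp = environ;

    pid_t pid;
    if (posix_spawnp(&pid, argv[0], nullptr, nullptr, argv, envp) != 0) {
        return -1;
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}

// Usage: set a variable in the parent, then ask a child shell whether it sees it.
// Exit status 0 means the variable was visible, 1 means it was not.
int child_sees_parent_var(bool keep_env) {
    setenv("KEEP_ENV_DEMO", "1", 1);
    char sh[] = "sh", flag[] = "-c", script[] = "test -n \"$KEEP_ENV_DEMO\"";
    char* argv[] = {sh, flag, script, nullptr};
    return spawn_and_wait(argv, keep_env);
}
```

With keep_env set to true the child shell sees KEEP_ENV_DEMO and exits with 0; with false it exits with 1, the same situation as the empty env output reported earlier in this thread.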

One additional question: why is wordexp used? Is it just to break the command up into argv? If so, I found that wordexp has some fairly nasty side effects that, in my opinion, make it a poor fit for this use case, especially since the input rules wordexp imposes don't seem to be documented in the README.

Specifically, the wordexp man page says of the string argument:

...the same as the expansion by the shell (see sh(1)) of the parameters to a command, the string s must not contain characters that would be illegal in shell command parameters. In particular, there must not be any unescaped newline or |, &, ;, <, >, (, ), {, } characters outside a command substitution or parameter substitution context.

This, in my opinion, is a rather big gotcha. For example, I would expect ("echo Hello world | grep Hello"_exp).run() and ("echo Hello world"_exp | "grep Hello").run() to be equivalent. However, the former is two commands joined by a pipe character, so wordexp fails to expand it and sh.argv() returns an empty word list; the library then segfaults when it calls posix_spawnp with that empty list. The same thing happens with a command like printf Hello World\n (with an actual newline character in the string), which is how I found the issue.
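One way to at least turn the crash into a clear error is to check wordexp's return value before building argv. This is a minimal sketch, assuming a POSIX wordexp(3); checked_wordexp is a hypothetical name, not part of the library.

```cpp
#include <string>
#include <utility>
#include <vector>
#include <wordexp.h>

// Hypothetical guard around wordexp(3): instead of handing an empty word list
// to posix_spawnp, report why the split failed. Returns 0 and fills argv_out on
// success; otherwise returns the wordexp error code (WRDE_BADCHAR for an
// unescaped newline, |, &, ;, <, >, (, ), { or }), or -1 if the command
// expanded to no words at all. argv_out is left untouched on failure.
int checked_wordexp(const std::string& cmd, std::vector<std::string>& argv_out) {
    wordexp_t we;
    int rc = wordexp(cmd.c_str(), &we, 0);
    if (rc != 0) {
        if (rc == WRDE_NOSPACE) wordfree(&we);  // a partial result may be allocated
        return rc;
    }
    std::vector<std::string> words(we.we_wordv, we.we_wordv + we.we_wordc);
    wordfree(&we);
    if (words.empty()) return -1;
    argv_out = std::move(words);
    return 0;
}
```

A caller could then report WRDE_BADCHAR as something like "pipes are not supported inside a single command; use | between commands" instead of crashing.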

A better approach might be to just split the words manually? Another argument is that we're technically performing word expansion twice: once with wordexp, and again in the subprocess when the shell actually parses the command it's given (if the command is itself run through a shell).
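If only argv splitting is needed, a hand-written splitter is small. The following is a minimal sketch, not the library's code; split_words is a hypothetical name, and its quoting rules are a simplified subset of sh(1). It deliberately performs no $VAR, ~, glob or $(...) expansion, which is the trade-off being discussed.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wordexp replacement for plain argv splitting. Splits on unquoted
// spaces, tabs and newlines; honours '...' and "..." quoting; a backslash outside
// single quotes escapes the next character (simplified: inside double quotes the
// real shell only treats \ specially before $, `, ", \ and newline).
std::vector<std::string> split_words(const std::string& cmd) {
    std::vector<std::string> words;
    std::string current;
    bool in_word = false;
    char quote = '\0';  // '\'' or '"' while inside quotes
    for (std::size_t i = 0; i < cmd.size(); ++i) {
        const char c = cmd[i];
        if (quote == '\'') {
            if (c == '\'') quote = '\0'; else current += c;
        } else if (c == '\\' && i + 1 < cmd.size()) {
            current += cmd[++i];
            in_word = true;
        } else if (quote == '"') {
            if (c == '"') quote = '\0'; else current += c;
        } else if (c == '\'' || c == '"') {
            quote = c;
            in_word = true;  // so that '' still produces an empty word
        } else if (c == ' ' || c == '\t' || c == '\n') {
            if (in_word) {
                words.push_back(current);
                current.clear();
                in_word = false;
            }
        } else {
            current += c;
            in_word = true;
        }
    }
    if (quote != '\0') throw std::invalid_argument("unterminated quote");
    if (in_word) words.push_back(current);
    return words;
}
```

Unlike wordexp, this accepts newlines (as word separators), but a | comes back as an ordinary word, so pipelines still need separate handling.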


... Never mind, I played around with this. It turns out this won't work seamlessly, since posix_spawnp executes a single program, so specifying several commands joined by pipes is not possible (the entire argv, pipe included, is treated as the arguments of that single command).

To get this to work, we would need to parse commands manually for nested commands (echo $(env)) and chained commands (env | grep PATH, etc.), and treat those as subcommands too...
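As a rough idea of the chained-command part, a command line can first be cut into pipeline stages at unquoted | characters, and each stage then split and run as its own process. This is a minimal sketch under that assumption; split_pipeline and trim are hypothetical names, and ||, &&, $(...) and other shell syntax are not handled.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Trim leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical first step toward running "env | grep PATH" without wordexp:
// cut a command line into pipeline stages at each '|' that is not quoted or
// escaped. Quotes and backslashes are kept so a word splitter can process each
// stage afterwards. Empty stages and unterminated quotes are rejected.
std::vector<std::string> split_pipeline(const std::string& cmdline) {
    std::vector<std::string> stages;
    std::string current;
    char quote = '\0';  // active quote character, or '\0' when unquoted
    auto finish_stage = [&] {
        std::string stage = trim(current);
        if (stage.empty()) throw std::invalid_argument("empty pipeline stage");
        stages.push_back(stage);
        current.clear();
    };
    for (std::size_t i = 0; i < cmdline.size(); ++i) {
        const char c = cmdline[i];
        if (c == '\\' && quote != '\'' && i + 1 < cmdline.size()) {
            current += c;  // keep the escape for the word splitter
            current += cmdline[++i];
        } else if (quote != '\0') {
            if (c == quote) quote = '\0';
            current += c;
        } else if (c == '\'' || c == '"') {
            quote = c;
            current += c;
        } else if (c == '|') {
            finish_stage();
        } else {
            current += c;
        }
    }
    if (quote != '\0') throw std::invalid_argument("unterminated quote");
    finish_stage();
    return stages;
}
```

Each stage would then be split into argv and connected the same way the library already chains commands, as in ("echo Hello world"_exp | "grep Hello").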

This may still be a "better" approach, since wordexp may run a subprocess to perform the nested-command expansion it supports (glibc, for example, runs /bin/sh to evaluate $(...) substitutions). Any such overhead is unnecessary for simple commands whose words are easy to split (env, echo hello world, etc.).
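If the library keeps wordexp, POSIX offers a flag that sidesteps the command-substitution path: with WRDE_NOCMD, wordexp fails with WRDE_CMDSUB instead of running anything for $(...). A minimal sketch; wordexp_without_cmdsub is a hypothetical name.

```cpp
#include <string>
#include <wordexp.h>

// Hypothetical helper: returns the wordexp(3) result code for `cmd` when command
// substitution is disabled with WRDE_NOCMD. A request for $(...) then fails with
// WRDE_CMDSUB instead of being evaluated by a shell.
int wordexp_without_cmdsub(const std::string& cmd) {
    wordexp_t we;
    int rc = wordexp(cmd.c_str(), &we, WRDE_NOCMD);
    if (rc == 0 || rc == WRDE_NOSPACE) wordfree(&we);  // free any allocated result
    return rc;
}
```

Plain commands like env or echo hello world are still split normally, while echo $(env) is rejected up front.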

That being said, it's probably not the biggest deal, given that the main benefits would be some ergonomic gains, a small performance gain on top of an already fairly costly fork, and being able to include newlines in commands...

garrett-is-a-swann, Jan 01 '25 00:01