node icon indicating copy to clipboard operation
node copied to clipboard

[--env-file] Space between = and " will cause the value to be parsed as an unquoted string

Open panzi opened this issue 1 year ago • 11 comments

Meaning if you have this in your .env file:

FOO= "BAR"

This command:

node --env-file=.env -e 'console.log({ FOO: process.env.FOO })'

Will print this:

{ FOO: '"BAR"' }

I.e. the variable value includes the quotes. (The spaces aren't included in the value, though.) I think this is unexpected.

If this is not intentional there needs to be white space skipping before this line:

https://github.com/nodejs/node/blob/9495a2bf1f20f853e90c8208741ad53448df986a/src/node_dotenv.cc#L153

panzi avatar Jun 15 '24 01:06 panzi

Thanks for the bug report! Feel free to submit a PR with your requested changes, and someone will review it.

avivkeller avatar Jun 15 '24 01:06 avivkeller

Sorry, I'm not interested in contributing C++ code to NodeJS at the moment. I just noticed a lot of quirks with... well basically almost all dotenv parsers out there and am currently comparing them for fun. There are many more quirks in NodeJS's dotenv parser, but the others are for a bit more unusual situations. Having a space before the quote isn't that unusual though, I think. The other quirks that I'm just now writing down (copied form that document) are (in case you're interested):

This dialect supports strings quoited in double quotes ("), single quotes (') and back ticks (`). These strings can be multi-line, but only in double quoted strings \n will be translated to newlines.

If the second quote is missing only the current line is used as the value for the variable. Parsing of more variables continues in the next line!

Quotes start with #. There doesn't need to be a space before the #.

Keys may contain anything except spaces ( ), including tabs and newlines. Meaning this:

FOO#=1
BAR
=2

Is equivalent with this JSON:

{ "FOO#": "1", "BAR\n": "2" }

Lines with syntax errors (i.e. no =) are silently ignored, but they will trip up the parser so that the following correct line is also ignored.

Leading export will be ignored. Yes, the export needs to be followed by a space. If its a tab its used as part of the key.

panzi avatar Jun 15 '24 02:06 panzi

No problem! The parser specification hasn't been done yet, but it's being discussed at https://github.com/nodejs/node/issues/49148.

@IlyasShabi i know you worked on writing a specification, any insight? (Thanks!)

avivkeller avatar Jun 15 '24 02:06 avivkeller

@panzi There is no proper spec for this feature, the only reference is "whatever dotenv does." Do you happen to know what the expected behavior is, based on that?

tniessen avatar Jun 15 '24 11:06 tniessen

Dotenv skips the white space

var dotenv = require("dotenv")
console.log(dotenv.parse("a= 'b'"))
// {a: "b"}

For the other case:


var dotenv = require("dotenv")
console.log(dotenv.parse(`
FOO#=1
BAR
=2
`))
// {BAR: "2"}

avivkeller avatar Jun 15 '24 11:06 avivkeller

IIUC, this is your list of quirks:

  1. This dialect supports strings quoted in double quotes ("), single quotes ('), and backticks (``).
    • These strings can be multi-line, but only in double-quoted strings \n will be translated to newlines.
  2. If the second quote is missing, only the current line is used as the value for the variable.
    • Parsing of more variables continues in the next line.
  3. Quotes start with #.
    • There doesn't need to be a space before the #.
  4. Keys may contain anything except spaces, including tabs and newlines.
  5. Lines with syntax errors (e.g., no =) are silently ignored.
    • They will cause the parser to ignore the following correct line as well.
  6. Leading export will be ignored.
    • The export needs to be followed by a space. If it is a tab, it is used as part of the key.

avivkeller avatar Jun 15 '24 11:06 avivkeller

Nice bug report. I'll fix it.

anonrig avatar Jun 15 '24 12:06 anonrig

@panzi There is no proper spec for this feature, the only reference is "whatever dotenv does." Do you happen to know what the expected behavior is, based on that?

While this implementation is not behaving quite like the npm dotenv package I'm not sure if emulating that behavior 100% is even a good idea. That package parses this:

FOO
=BAR

As:

{ "FOO": "BAR" }

Is that a good idea?

Just for fun I'm looking at different dotenv implementations, write down how they work and try to implement a compatible parser in Rust (except for deliberate differences where e.g. I won't import a Unicode database to resolve named Unicode escape sequences (Python dotenv-cli), don't emulate obvious mangled encoding handling (Python dotenv-cli), and don't implement what I consider arbitrary code execution vulnerabilities (Ruby dotenv)). Here is what I have written down so far, in case you are interested in maybe comparing the sections about NodeJS and JavaScript dotenv: https://github.com/panzi/punktum#nodejs-dialect I'm sure I have lots of English spelling mistakes and the document could be structured better. It's not finished in any case.

Looking at all the dotenv implementations I don't know what I'd expect anymore, except that white space around keys and values isn't significant.

panzi avatar Jun 16 '24 01:06 panzi

While this implementation is not behaving quite like the npm dotenv package I'm not sure if emulating that behavior 100% is even a good idea.

I am not claiming that it's a good idea, but as far as I can tell, it is what's been happening thus far. And deviating from that convention ad-hoc, without a proper specification, IMO doesn't seem like a good idea either.

tniessen avatar Jun 16 '24 12:06 tniessen

Just for fun I wrote my own implementation of the JavaScript dotenv parser in C++, trying to be 100% compatible. I haven't found an edge case where it produces the any different output to the original. I've licensed it under MIT, if you want you can just copy-paste it into NodeJS. If you prefer any other open source license I can do that, too. See: https://github.com/panzi/cpp-dotenv

I'm not a C++ pro, so things might not be 100% ideal. Also I write the environment into an std::unordered_map<std::string, std::string> and use some C++20 features (std::string_view::find_first_not_of() and std::string_view::find_last_not_of()). See also all the comments that say "SYNTAX ERROR". While the original just ignores these, and I also ignore them the same way the original does, you might want to log errors there. The original supports DOTENV_CONFIG_DEBUG=true, but it only logs things like that it couldn't read the .env file, that there where encoding errors (this C++ version ignores encodings, it will work with ASCII, ISO-8859-1, and UTF-8, since it only compares with ASCII characters), or that it did/didn't overwrite an pre-existing environment variable (depending on DOTENV_CONFIG_OVERRIDE=ture - yes, override, not overwrite). Instead maybe you could make it so it logs these syntax errors when that environment variable is ste to true? In my (also just for fun) Rust implementation I did just that.

DOS line ending support needs testing. And might be possible to somehow implement without always allocating a new string.

panzi avatar Jun 17 '24 01:06 panzi

I found yet another edge case where my (and the NodeJS) implementation behaved differently than the JavaScript implementation and fixed it in my code. Would you like to use that code in NodeJS? I didn't submit it as a pull request because I assume building a huge project like NodeJS might be a hassle and that's just some fun thing I did in my spare time. Maybe I'll do that anyway at some point, but only if there is interest in this.

panzi avatar Jun 26 '24 15:06 panzi

I would like to work on it if it is still up for grabs

thisalihassan avatar Jul 07 '24 22:07 thisalihassan

I would like to work on it if it is still up for grabs

IIRC @anonrig said he'd work on it, so you'd have to ask him.

avivkeller avatar Jul 07 '24 23:07 avivkeller

If you don't just use my code you can still look at the source comments about correctly emulating the behavior of the regular expression of the original (without using regular expressions in C++).

panzi avatar Jul 07 '24 23:07 panzi

Fixed by #53786

avivkeller avatar Jul 09 '24 16:07 avivkeller