hop icon indicating copy to clipboard operation
hop copied to clipboard

[Feature Request]: An improved method to resolve variable expressions

Open mattcasters opened this issue 2 years ago • 18 comments

What would you like to happen?

Right now, if we have a variable expression, for example ${MY_VARIABLE}, any value that we have on the books for MY_VARIABLE in the local IVariables instance will be substituted and returned during IVariables.resolve().

The API that is used StringUtil.substitute() already supports recursive or nested variable resolution. However, this method is not used anywhere right now.

It would be great if we could give Hop users a way to signal that recursive or nested substitution of variables should take place. Some suggestions come to mind:

  • A new way to specify variables, for example with round brackes $(MY_VARIABLE)
  • A prefix system to the variables: ${NESTED(2):MY_VARIABLE}, for example to resolve the variable 2 levels deep.

If we think about the second prefixing system we can even think of a pluggable, configurable, system which would allow us to do more than just nested variables. For example we could build a ${ENCODED:DATABASE_USER}, for example if you want to obfuscate or encrypt the username of a database (not just its password) in an environment properties file.
We could combine prefixes, for example: ${ENCODED(EAS2):NESTED(1):MY_VARIABLE}. Other ideas are ways to get values from a secrets vault or resolve a value from a certain web-service.

The counter argument to all this is that it's a very cryptic way of doing things. Even as I'm typing this I'm not liking it. Perhaps we can think of a pluggable system attached to the lifecycle environment in which we're allowing these options for variables. That way we can build proper user interfaces reflected in proper configuration files and metadata.

Issue Priority

Priority: 3

Issue Component

Component: API

mattcasters avatar Nov 24 '23 08:11 mattcasters

+1 on prefixing, at least for the provider (AES/ENCRYPTED/MYVAULT).

As for the resolving of nested variables, I lean towards making the default way to handle variables and keep the syntax simple/optional. I see something like this working (just thinking out loud):

${MY_VARIALBE} -> full nested lookup ${::MY_VARIALBE} -> full nested lookup -> level 0 ${:1:MY_VARIALBE} -> full nested lookup -> level 1

${AES:MY_VARIABLE} -> full nested with AES provider ${AES::MY_VARIABLE} -> level 0 ${AES:1:MY_VARIABLE} -> level 1 deep

hansva avatar Nov 24 '23 09:11 hansva

The counter argument that I saw was that in development you might just want to use clear-text passwords and in production secrets are stored in a vault. Mind you, I'm not opposed to the prefixes. I just think that having this 'symbolic link' functionality could be cool to have as well.

mattcasters avatar Nov 24 '23 13:11 mattcasters

For any environment you're in it would be cool to have a 'symbolic link' feature for variables. You would be able to map a variable name to one or more operations or even the execution of a pipeline.

  • Nested resolution
  • Decode/decrypt a value
  • Look up in secrets vault
  • Execute a Hop pipeline to retrieve a value (ex: get a token from EntraID to use in the whole environment).
  • ...

mattcasters avatar Nov 24 '23 13:11 mattcasters

If we have variable nesting working your variable in the pipeline/database metadata can be ${MY_VARIABLE} and in the environment in development it would be "my cleartext password" in production it could be ${vault:secret}. This way you can hit both birds with one stone :)

hansva avatar Nov 24 '23 13:11 hansva

Yes indeed. One the one hand you want some sane defaults, nothing too much to configure, just to get some basic functionality like nested resolution. On the other hand we'll have users who need a bit of extra hand-holding to configure vault access or which pipeline to execute to get a value. Is there a risk to enabling nested resolution one level deep by default, besides performance? Perhaps we can have a system variable like HOP_VARIABLE_RESOLUTION_DEPTH with default set to 0 (used now).

mattcasters avatar Nov 24 '23 13:11 mattcasters

Question remains that we need a place to store the environment configuration metadata. The project metadata isn't really a suited location for it. We probably need a separate metadata location, alongside the environment configuration JSON file(s).

mattcasters avatar Nov 24 '23 13:11 mattcasters

Sorry for the stupid question but what do we intend by variable's depth? That is not clear to me

sramazzina avatar Nov 24 '23 18:11 sramazzina

for example you can have a variable that is ${MY_VARIABLE} but that variable could be configured in the environment with a value of ${MY_OTHER_VARIABLE} which could then be specified in hop-config as ${HOP_CONFIG_FOLDER} so then it goed 2 variables deep

hansva avatar Nov 24 '23 19:11 hansva

Cristal clear thanks for the explanation

sramazzina avatar Nov 25 '23 09:11 sramazzina

I was thinking about a simple generic plugin type to do a value lookup. The lookup could apply to one or more types (variables in our case) and have to be ability to return either a single value or a list. We could use it in various places to give back all sorts of values, in the GUI mostly, but also in things like variable replacement.

mattcasters avatar Nov 25 '23 15:11 mattcasters

Sharing my perspective as a devoted user with a deep appreciation for your work on Hop:

  1. The notion of utilizing a keyword for decrypting variables resonates with me. Presently, the support for variable decryption seems a bit arbitrary, introducing uncertainty about which values will undergo decryption.

  2. In a recent project that involved recursive variable replacement Olymp - SnippetService, I applied a methodology where the replacement process persists until no more variables can be identified in the string. This recursive approach naturally extends to the deepest level by default. This leads me to consider whether it is essential to introduce a specific keyword for nesting in Hop, or if the tool could seamlessly replace variables as long as they persist in the result (provided that the result is not identical to the input to prevent infinite loop, of course).

for example you can have a variable that is ${MY_VARIABLE} but that variable could be configured in the environment with a value of ${MY_OTHER_VARIABLE} which could then be specified in hop-config as ${HOP_CONFIG_FOLDER} so then it goed 2 variables deep

vdwals avatar Nov 27 '23 10:11 vdwals

Great points Dennis. I guess I'm more concerned with backward compatibility. I'm thinking specifically about the people that built around the current limitations of variable replacement. Depth indeed doesn't mean all that much. A limit was put on it to avoid some infinite loop in the current code. That itself is asinine since you can simply see that the output is the same as the input and just give up right there and then.

mattcasters avatar Nov 27 '23 13:11 mattcasters

So the plot thickens. I was writing a few extra unit tests for StringUtil, the place where the action is taking place. It turns out we already do recursive variable replacement.

mattcasters avatar Nov 29 '23 09:11 mattcasters

Then in this case it might be an error I introduced in the PROJECT_HOME extension point. I will take a look

hansva avatar Nov 29 '23 09:11 hansva

That doesn't mean the other ideas aren't valid. Working with a prefix does show the intent of the variable replacement clearly. ${AES:SOME_SECRET} is pretty clear once you know what's going on.

mattcasters avatar Nov 29 '23 09:11 mattcasters

The idea to have a new 'Lookup' plugin type in core is meant to allow all plugins to contribute and use without too many class loader issues. It also allows us to integrate with metadata. For example, we might have the "Projects" plugin that knows about the way to access a secrets vault, stored in some metadata folder somewhere. It would expose a lookup plugin with prefix VAULT. The StringUtil code would simply need to look in the core lookup plugins which lookup this represents. We wouldn't have to refactor thousands of resolve() calls across the codebase.

mattcasters avatar Nov 29 '23 09:11 mattcasters

Does this mean if we are setting a variable in a transform "Set Variables", if we put ${AES:MY_STRING} that the step would interpret that to be "Use AES to encrypt the string when setting the value from the stream." If on the other hand we were using a Get Variables transform and used ${AES:MY_STRING}, should we expect clear text to come out from an encrypted value? Is this just metadata? (encrypted_yn, and encryption_used = 'AES')?

usbrandon avatar Dec 07 '23 20:12 usbrandon

@usbrandon Variable expressions are basically used to look up a value somewhere. Setting a value in a Hop context can either be a clear text value or indeed another variable expression like the one you mentioned. No encoding or decoding is done at the time of setting the variable, only when the variable expression is decoded and the actual value is looked up. The results of the lookup are in most cases indeed a clear text value since that's what the transform, action or metadata element needs at runtime. I guess the main point of keeping the expression intact as long as possible is to avoid keeping the clear text secret in memory , to have it available only locally where needed, and not in the whole Hop runtime context. When the secret is needed a lookup is done in a config file, an asymmetric decryption is done, a secrets vault is queried, a token is requested somewhere, and so on.

mattcasters avatar Dec 07 '23 21:12 mattcasters