Consider jsonnet for templating
You use Go templates for generating your JSON configuration parameters. That technology is well suited to unstructured text.
A better match for JSON generation is Jsonnet. There is a Go implementation of Jsonnet called go-jsonnet, which I use in production.
FYI, Jsonnet is used heavily by Databricks.
Please consider supporting Jsonnet as an alternative to Go templates.
Interesting, Jsonnet came up in another project as well. Maybe we could add a render engine that supports it (and eventually makes the Go templates obsolete). Could you give some examples of what that would look like for some of the rules?
In your templating examples, you often pass JSON as strings. For example, I see
"payload": "{\"subject\": \"{{ print .Subject }}\", \"resource\": \"{{ printIndex .MatchContext.RegexpCaptureGroups 0 }}\"}"
This encodes a JSON object as a string. If you really need that, you can wrap any of the right-hand sides in my examples in a call to std.manifestJson. Otherwise, when you evaluate the Jsonnet with an embedded interpreter, it will serialize the result as a JSON object.
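As a sketch (the field values here are made up for illustration), the two forms look like this side by side:

```jsonnet
local payload = {
  subject: 'alice',
  resource: 'blog_posts:my-first-post',
};

{
  // Preferred: the interpreter serializes this as a real JSON object.
  payload_object: payload,
  // Only if the consumer insists on a string-encoded payload:
  payload_string: std.manifestJson(payload),
}
```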
The trick is to pass values for the external variables to the interpreter. On the command line you do this with arguments to the jsonnet binary; embedded, you set them on the VM. In your case, you would pass in values for session and extra.
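A sketch of how that wiring could look, assuming the embedding application hands session in as a JSON string (via --ext-str on the jsonnet command line, or ExtVar on a go-jsonnet VM; the variable names are my assumption):

```jsonnet
// Invocation sketch:
//   jsonnet --ext-str session='{"subject": "alice"}' payload.jsonnet
// or embedded in Go:
//   vm := jsonnet.MakeVM()
//   vm.ExtVar("session", sessionJSON)
local session = std.parseJson(std.extVar('session'));

{
  subject: session.subject,
}
```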
Here are your Go templating examples, followed by their representation in Jsonnet.
Go Template
{ "config_field": "{{ print .Subject }}" }
{ "config_field": "{{ print .Extra.some.arbitrary.data }}" }
{
  "claims": "{\"aud\": \"{{ print .Extra.aud }}\", \"resource\": \"{{ printIndex .MatchContext.RegexpCaptureGroups 0 }}\"}"
}
{
  "claims": "{\"aud\": \"{{ print .Extra.aud }}\", \"scope\": {{ printf \"%+q\" .Extra.scp }}}"
}
{
  "handler": "keto_engine_acp_ory",
  "config": {
    "required_action": "my:action:{{ printIndex .MatchContext.RegexpCaptureGroups 0 }}",
    "required_resource": "my:resource:{{ printIndex .MatchContext.RegexpCaptureGroups 1 }}:foo:{{ printIndex .MatchContext.RegexpCaptureGroups 0 }}"
  }
}
authorizers:
  remote_json:
    # Set enabled to "true" to enable the authorizer, and "false" to disable the authorizer. Defaults to "false".
    enabled: true
    config:
      remote: http://my-remote-authorizer/authorize
      payload: |
        {
          "subject": "{{ print .Subject }}",
          "resource": "{{ printIndex .MatchContext.RegexpCaptureGroups 0 }}"
        }
Jsonnet
{ config_field: session.subject }
{ config_field: extra.some.arbitrary.data }
{
  claims: {
    aud: extra.aud,
    resource: session.MatchContext.RegexpCaptureGroups[0]
  }
}
{
  claims: {
    aud: extra.aud,
    scope: extra.scp
  }
}
{
  handler: "keto_engine_acp_ory",
  config: {
    required_action: "my:action:" + session.MatchContext.RegexpCaptureGroups[0],
    required_resource: "my:resource:%s:foo:%s" % [
      session.MatchContext.RegexpCaptureGroups[1],
      session.MatchContext.RegexpCaptureGroups[0]
    ]
  }
}
{
  authorizers: {
    remote_json: {
      # Set enabled to "true" to enable the authorizer, and "false" to disable the authorizer. Defaults to "false".
      enabled: true,
      config: {
        remote: 'http://my-remote-authorizer/authorize',
        payload: {
          subject: session.Subject,
          resource: session.MatchContext.RegexpCaptureGroups[0]
        }
      }
    }
  }
}
As you can see, the Jsonnet is much easier to read, write, and debug.
And none of my examples touch on the power of the language.
Yeah - this makes a ton of sense. Do you know if it is possible to disable functions, or do you always have the full language spec available? I would like to avoid executing tons of logic in the Jsonnet definitions.
Performance is also a question: right now there is almost zero overhead because we cache the compiled templates and everything else is just string substitution. How would that change with Jsonnet?
If you have any guidance here, that would be helpful. By the way, if you'd like to take charge of this issue, let me know; I will probably not be able to work on this in the coming months.
I don't know anything about the performance of Jsonnet per se. However, my sense in general is that you should measure first.
Also, Jsonnet is a pure functional language with referential transparency: if the inputs don't change, the outputs don't change. So if you are interested in performance, you could hash the inputs and cache the outputs.
I don't have time to work on this now. Perhaps later.
After a bit of research I quickly found that both the C++ and the Go versions of the official Jsonnet (written by Google) are insanely slow (we're talking minute-long evaluations in extreme cases). There appears to be a much faster Scala adaptation by Databricks, but it's in Scala and has no Go bindings (and probably never will). There are ongoing issues on GitHub for this, see https://github.com/google/go-jsonnet/issues/111
As long as those aren't resolved, or we come up with a very, very, very smart way of doing caching (which might still kill some requests with timeouts...), I don't see a way of adding this to the current request pipeline.
I’ve used all of the versions that I mention. I also understand the internal algorithms that the Scala version uses to make things faster. Basically, the Scala version uses memoization.
In your use case, I don’t think this matters. Your uses will be small Jsonnet scripts where memoization is not that useful.
So, I don't think that by simply reading the performance analysis that Li did that you can come to a sound conclusion. :)
In my particular case, the Scala version is much slower due to the startup penalty of the JVM.
Interesting! I actually ran this (go-jsonnet) on some very small Jsonnet, of the size that would probably be used in our configs, and it's indeed in the sub-millisecond range even with a fresh Jsonnet VM created on the spot.
Yes, with performance work you always have to measure your use cases. Asymptotic order analysis and worst-case analysis are great tools for understanding large inputs, but they don't provide useful information for small cases.
For example, bubble sort on nearly sorted arrays is much faster than quicksort on the same input. So if you know that your data is almost always sorted, then bubble sort may be your best-performing algorithm.
That said, I love Li's work. It is very very elegant. :)
I am marking this issue as stale as it has not received any engagement from the community or maintainers in over half a year. That does not imply that the issue has no merit! If you feel strongly about this issue:
- open a PR referencing and resolving the issue;
- leave a comment on it and discuss ideas how you could contribute towards resolving it;
- open a new issue with updated details and a plan on resolving the issue.
We clean up issues every now and then, primarily to keep the 4000+ issues in our backlog in check and to prevent maintainer burnout. Burnout in open source maintainership is a widespread and serious issue. It can lead to severe personal and health problems and can open up catastrophic attack vectors.
Thank you for your understanding and to anyone who participated in the issue! 🙏✌️
If you feel strongly about this issue and have ideas on resolving it, please comment. Otherwise it will be closed in 30 days!
Incorrect stalebot detection - this was assigned a milestone.
Hello contributors!
I am marking this issue as stale as it has not received any engagement from the community or maintainers in over a year. That does not imply that the issue has no merit! If you feel strongly about this issue:
- open a PR referencing and resolving the issue;
- leave a comment on it and discuss ideas how you could contribute towards resolving it;
- leave a comment and describe in detail why this issue is critical for your use case;
- open a new issue with updated details and a plan on resolving the issue.
Throughout its lifetime, Ory has received over 10,000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.
Unfortunately, burnout has become a topic of concern amongst open-source projects.
It can lead to severe personal and health issues as well as opening catastrophic attack vectors.
The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.
If this issue was erroneously marked as stale, you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.
Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!
Thank you 🙏✌️