idea: use a schema-pattern for bleep config
this is not a short-term request, just offering some informed opinion based on my experience with libraries I maintain.
Right now, bleep defines circe encoders/decoders directly, which is pretty standard and works very well. The problem with that is that it is easy for the json-schema to fall out of date, as one has to manually inspect the circe decoders and re-capture the details manually in a json schema description.
What I propose is that we apply a "schema-pattern", which is something that smithy4s (a library I maintain) uses extensively, and that allows for decoupling the data from the serialisation in such a way that you could always keep the json schema in sync with the json codecs.
- I have some literature written there : https://disneystreaming.github.io/smithy4s/docs/design/schemas
- I took inspiration from this talk https://www.youtube.com/watch?v=oRLkb6mqvVM
This would not only solve the current issue, but would also enable some very interesting use cases down the line. In particular, I believe doing this would facilitate the writing of a bleep-specific LSP server that would give editor support when editing bleep.yaml files, as the circe decoders would not be well suited for that use case. It'd mean that we could have bleep config decoders (decoupled from bleep itself) that would retain positional information, and make it easy to create semantic colouring for bleep files, as well as things like auto-completion of values, etc.
We could utilize Guardrail to that end. It is used to generate Scala code (including case classes and Circe codecs) from an OpenAPI description; OpenAPI is a superset of JSONSchema.
https://github.com/guardrail-dev/guardrail/
I'm also a huge fan of contract-first or contract-driven development! :+1:
guardrail would not allow for generating the kind of "short-hand" things that bleep is using, such as JsonList, etc. Moreover, it wouldn't open the same door for future things (such as an LSP) that I'm describing.
Also a huge fan!
I'll do a quick writeup of requirements later, and we'll see what fits best :)
There is also https://www.scala-sbt.org/contraband/ which generates classes we can grow in a binary compatible way. Maybe even as java classes, I've been considering making the surface of bleep java-friendly.
I also have a wish to use https://github.com/plokhotnyuk/jsoniter-scala instead of circe. Json parsing consumes a lot of CPU at boot
we use jsoniter extensively in smithy4s, it's definitely a great library, but the errors it shows OOTB are not human friendly at all. Moreover jsoniter is really for parsing Json, not Yaml (there's no Json ADT involved, it goes directly from json bytes to data). You'd have to backtrack on the decision to use yaml, in order to use jsoniter, and I think Yaml is much better suited than Json for bleep.
Moreover, I think I've not gotten my point across : both guardrail and contraband are protocol-specific solutions, and though they would solve the problem of keeping the datatypes in sync with the schemas, they wouldn't account for the custom formats that bleep uses for datatypes, nor would they help with the other aspects of the tooling (like LSP) that would be really important for UX in the future.
I think a Scala-first solution is better suited to Bleep: one that uses an intermediate data-description mechanism instead of classic typeclass-based codecs would solve the problem of keeping data and json-schema in sync, whilst letting bleep retain control of its "metamodel" (the building blocks that are used to build the config model, such as JsonList which is awesome)
to be clear : I'm not suggesting to pull smithy4s as a dependency in bleep, I'm suggesting that the same pattern that smithy4s is built on be applied in bleep
> You'd have to backtrack on the decision to use yaml, in order to use jsoniter, and I think Yaml is much better suited than Json for bleep.
I hadn't even considered that - that's disappointing. I was hoping for a free performance lunch waiting there :)
> Moreover, I think I've not gotten my point across : both guardrail and contraband are protocol-specific solutions, and though they would solve the problem of keeping the datatypes in sync with the schemas, they wouldn't account for the custom formats that bleep uses for datatypes, nor would they help with the other aspects of the tooling (like LSP) that would be really important for UX in the future.
I'll read your links in more detail then, I thought the point was to have some kind of model - any model - from which we could generate classes, codecs, schemas and so on.
> I'll read your links in more detail then, I thought the point was to have some kind of model - any model - from which we could generate classes, codecs, schemas and so on.
It'd certainly be possible, but quite contrived : IDLs are a good solution where the payloads are produced/consumed by programs. In this case, the payload (the configuration) is handwritten, and you have made what I think are very good decisions to ensure that the format is as convenient for the user as possible. In other words, without really realising it, you have designed your own "metamodel" (ie building blocks you use to model things) that doesn't abide by any pre-existing specification or protocol.
I think using code-generation to recreate the model you have would have diminishing returns : you'd have to come up with your own IDL (or re-use a protocol agnostic one like smithy and codify the specificities of your bespoke encoding as options/annotations), and write a bespoke code-generator for that, when the bleep ecosystem would probably be Scala centric. That'd be a lot of work for little return.
Instead, you can codify your custom metamodel as an abstraction :

    trait BleepCodec[Codec[_]] {
      def int: Codec[Int]
      def string: Codec[String]
      def jsonList[A](member: Codec[A]): Codec[JsonList[A]]
      def jsonSet[A](member: Codec[A]): Codec[JsonSet[A]]
      def struct[A](...): Codec[A]
      def union[A](...): Codec[A]
      def wrapped[A, B](underlying: Codec[A], wrap: A => B, unwrap: B => A): Codec[B]
      // ...
    }

and use this to construct a description of the data for each datatype present in your configuration type. So you end up having

    case class BleepConfig(...)
    object BleepConfig {
      // One description of the datatype, written against the abstraction only.
      def derive[Codec[_]: BleepCodec]: Codec[BleepConfig] = ???
    }

and then you can have

    implicit val circeEncoderCodec: BleepCodec[Encoder] = ???
    implicit val circeDecoderCodec: BleepCodec[Decoder] = ???
    type JsonSchema[A] = ???
    implicit val jsonSchemaRepresentation: BleepCodec[JsonSchema] = ???

This has the benefit of keeping the codecs and the schema descriptions always in sync, whilst being much easier to implement and test than a code-generator.
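To make that concrete, here is a minimal, dependency-free sketch of the pattern. Everything in it is an assumption made for illustration: the tiny `Json` ADT stands in for circe's, `Project` is a made-up config type, `struct2` is a hypothetical two-field version of `struct`, and the `Decoder` and `JsonSchema` wrappers are not the real circe or bleep types. The point is only that one description (`Project.schema`) feeds both interpreters, so they cannot drift apart.

```scala
object SchemaPatternSketch {
  // Tiny JSON ADT standing in for circe's Json, so the sketch has no dependencies.
  sealed trait Json
  final case class JStr(value: String) extends Json
  final case class JNum(value: Int) extends Json
  final case class JArr(items: List[Json]) extends Json
  final case class JObj(fields: Map[String, Json]) extends Json

  // Trimmed-down, hypothetical version of the metamodel abstraction.
  trait BleepCodec[F[_]] {
    def int: F[Int]
    def string: F[String]
    def list[A](member: F[A]): F[List[A]]
    def struct2[A, B, Z](a: (String, F[A]), b: (String, F[B]))(make: (A, B) => Z): F[Z]
  }

  // A made-up config datatype, described exactly once against the abstraction.
  final case class Project(name: String, sources: List[String])
  object Project {
    def schema[F[_]](implicit F: BleepCodec[F]): F[Project] =
      F.struct2("name" -> F.string, "sources" -> F.list(F.string))((n, s) => Project(n, s))
  }

  // Interpreter 1: decoding.
  final case class Decoder[A](decode: Json => Either[String, A])

  implicit val decoderInterpreter: BleepCodec[Decoder] = new BleepCodec[Decoder] {
    val int: Decoder[Int] = Decoder[Int] {
      case JNum(n) => Right(n)
      case other   => Left(s"expected integer, got $other")
    }
    val string: Decoder[String] = Decoder[String] {
      case JStr(s) => Right(s)
      case other   => Left(s"expected string, got $other")
    }
    def list[A](member: Decoder[A]): Decoder[List[A]] = Decoder[List[A]] {
      case JArr(items) =>
        // Decode every item, failing as soon as one item fails.
        items.foldRight[Either[String, List[A]]](Right(Nil)) { (item, acc) =>
          for { head <- member.decode(item); tail <- acc } yield head :: tail
        }
      case other => Left(s"expected array, got $other")
    }
    def struct2[A, B, Z](a: (String, Decoder[A]), b: (String, Decoder[B]))(make: (A, B) => Z): Decoder[Z] =
      Decoder[Z] {
        case JObj(fields) =>
          for {
            av <- field(fields, a._1, a._2)
            bv <- field(fields, b._1, b._2)
          } yield make(av, bv)
        case other => Left(s"expected object, got $other")
      }
    private def field[A](fields: Map[String, Json], name: String, d: Decoder[A]): Either[String, A] =
      fields.get(name).toRight(s"missing field '$name'").flatMap(d.decode)
  }

  // Interpreter 2: a JSON-schema-like document rendered from the same description.
  final case class JsonSchema[A](json: Json)

  implicit val schemaInterpreter: BleepCodec[JsonSchema] = new BleepCodec[JsonSchema] {
    private def tpe(name: String): Map[String, Json] = Map[String, Json]("type" -> JStr(name))
    val int: JsonSchema[Int] = JsonSchema[Int](JObj(tpe("integer")))
    val string: JsonSchema[String] = JsonSchema[String](JObj(tpe("string")))
    def list[A](member: JsonSchema[A]): JsonSchema[List[A]] =
      JsonSchema[List[A]](JObj(tpe("array") + ("items" -> member.json)))
    def struct2[A, B, Z](a: (String, JsonSchema[A]), b: (String, JsonSchema[B]))(make: (A, B) => Z): JsonSchema[Z] =
      JsonSchema[Z](JObj(tpe("object") ++ Map[String, Json](
        "properties" -> JObj(Map[String, Json](a._1 -> a._2.json, b._1 -> b._2.json)),
        "required"   -> JArr(List(JStr(a._1), JStr(b._1)))
      )))
  }
}
```

With this in place, `Project.schema[Decoder]` gives a decoder and `Project.schema[JsonSchema]` gives the matching schema document. Adding a field to `Project.schema` changes both at once, and a third interpreter (say, one that tracks positions for an LSP) would only need another `BleepCodec` instance.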
> you have designed your own "metamodel" (ie building blocks you use to model things) that doesn't abide by any pre-existing specification or protocol.

just wanted to comment that in terms of codecs and schema, JsonList and friends can be fully described with sum types, `type JsonList[A] = A | List[A]`. The merge semantics can probably be kept out of it.
edit: oh, and they are always optional, but it should still be quite representable
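As a rough illustration of that "one value or a list, and always optional" shape, a decoder could look like the sketch below. The `Json` ADT, the `JsonList` case class and the `Decode` alias are stand-ins invented for this example, not bleep's actual definitions, and treating `null` as the empty list is an assumption about what "optional" means here.

```scala
object JsonListSketch {
  // Minimal JSON ADT for the example (assumption; bleep uses circe's Json).
  sealed trait Json
  final case class JStr(value: String) extends Json
  final case class JArr(items: List[Json]) extends Json
  case object JNull extends Json

  // Hypothetical stand-in for bleep's JsonList.
  final case class JsonList[A](values: List[A])

  type Decode[A] = Json => Either[String, A]

  val decodeString: Decode[String] = {
    case JStr(s) => Right(s)
    case other   => Left(s"expected string, got $other")
  }

  // Accepts `null` (empty), an array of members, or a single bare member.
  def decodeJsonList[A](member: Decode[A]): Decode[JsonList[A]] = {
    case JNull => Right(JsonList[A](Nil))
    case JArr(items) =>
      val decoded = items.foldRight[Either[String, List[A]]](Right(Nil)) { (item, acc) =>
        for { head <- member(item); tail <- acc } yield head :: tail
      }
      decoded.map(values => JsonList[A](values))
    case single => member(single).map(value => JsonList[A](List(value)))
  }
}
```

In schema terms this would correspond to something like an `anyOf` between the member schema and an array of the member schema, which is why the merge semantics can stay outside the codec description.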
re:
> trait BleepCodec[Codec[_]] {

this is cool for making things stay in sync. Does smithy do anything like this? I just want a model which can be interpreted by different tools, to generate classes, codecs, json schema and whatnot. I don't really want to NIH that particular piece of the stack, hehe
Smithy4s is built on this very idea. I mean I could probably show you what the bleep config model would look like in smithy syntax, and run smithy4s on it to show you what the Scala generated code would look like if you'd like.
I mean if you want to work on that, fine. It's interesting, so I'd appreciate it. But it's not exactly on my shortlist of things to do before a more public release :)
Yeah and I don't think it should be, just wanted to raise the issue because I know that things falling out of sync are a pain and I've got some reasonably extensive knowledge on various solutions to that very problem