test: fine-tune property tests
Draft disclaimer
i'm not married to anything here, and would appreciate any and all feedback
Description
the problem
allowing "blueprint"-style map schematics at the root level of a `list/1` schematic means there is a chance (a higher chance the more you run the property test) of generating a `oneof/1` schematic to cover a list containing a variety of maps, where one of the `map/1` schematics in that `oneof` is a highly permissive one, like `map(%{})`. that schematic will readily unify with any map and then drop all of the keys when the value is dumped back out.
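for illustration, here is a minimal sketch of that failure mode; the `{:ok, ...}` results shown are assumptions based on the behaviour described above, not verified output:

```elixir
import Schematic

# the highly permissive empty blueprint described above
permissive = map(%{})

# assumed behaviour: unification succeeds for any map...
{:ok, unified} = unify(permissive, %{"a" => 1, "b" => "foo"})

# ...but the unspecified keys do not survive the round trip
{:ok, dumped} = dump(permissive, unified)

dumped
#=> %{}  (per the behaviour described above)
```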
the solution?
see the discussion below. TL;DR: we're not so sure
an additional property (or several) can at least be added to describe the behaviours of map schematics, particularly when there are unspecified keys that will be lost during unification
Closes #22
Specific requests for feedback
- in our thread for #22 we discussed limiting the generation space for the `input |> unify |> dump == input` test; is this going too far? (a minimal sketch of the property is included after this list)
- related: i tried re-writing the generator to allow a specific option to be passed in to avoid lists with a root-level map schematic defined using a "blueprint", but the resulting options (not to mention the implementation) started to feel very cumbersome and confusing. it seems to me that having that level of granularity would be more trouble than it's worth
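for concreteness, a minimal sketch of the round-trip property under discussion, assuming ExUnit + StreamData (via ExUnitProperties); the module name and the deliberately narrow generator are hypothetical:

```elixir
defmodule SchematicRoundTripTest do
  use ExUnit.Case, async: true
  use ExUnitProperties

  property "input |> unify |> dump == input" do
    # a deliberately narrow generator: lists of string-keyed integer maps,
    # which should round-trip cleanly through a K-V map schematic
    check all input <-
                StreamData.list_of(
                  StreamData.map_of(StreamData.string(:alphanumeric), StreamData.integer())
                ) do
      schematic = Schematic.list(Schematic.map(keys: Schematic.str(), values: Schematic.int()))

      assert {:ok, unified} = Schematic.unify(schematic, input)
      assert {:ok, dumped} = Schematic.dump(schematic, unified)
      assert dumped == input
    end
  end
end
```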
TODOs
- [ ] add a follow-up property test to describe the behaviour of map blueprint schematics, i.e. "unspecified keys in a map blueprint are discarded when dumped"
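a rough sketch of what that follow-up property might look like (assuming the same `use ExUnitProperties` setup as the sketch above; the blueprint and the exact discarded-key expectation are assumptions based on the behaviour described in this thread):

```elixir
property "unspecified keys in a map blueprint are discarded when dumped" do
  check all extra_key <- StreamData.string(:alphanumeric, min_length: 1),
            extra_key != "name",
            extra_value <- StreamData.integer() do
    blueprint = Schematic.map(%{"name" => Schematic.str()})
    input = %{"name" => "schematic", extra_key => extra_value}

    assert {:ok, unified} = Schematic.unify(blueprint, input)
    assert {:ok, dumped} = Schematic.dump(blueprint, unified)

    # the unspecified key should not survive the round trip
    refute Map.has_key?(dumped, extra_key)
    assert dumped == %{"name" => "schematic"}
  end
end
```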
Did you try not generating maps with empty blueprints?
i believe i did, but i don’t see it in my notes. i’ll do it again just to be sure
i just tried ensuring that no empty-map blueprints are generated by matching on `%{}` and returning a K-V schematic like `Schematic.map(keys: Schematic.nullable(nil), values: Schematic.nullable(nil))`, but that fails the property very quickly. maybe that approach is too naive?
i'm struggling to come up with a K-V schematic that would match %{} and nothing else, because if it doesn't do that then you end up in the same scenario as before, right? i guess you could do a raw/1 and make it always fail to unify for any value... let me try that
EDIT: ah, right, it just ignores non-conforming keys and lets it unify.
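for reference, roughly what that experiment looked like; the `raw/1` usage and the results shown are assumptions reconstructed from the EDIT above, not verified output:

```elixir
import Schematic

# a keys/values schematic that refuses to unify with anything,
# hoping that only %{} would then unify
never = raw(fn _ -> false end, message: "never unifies")
only_empty = map(keys: never, values: never)

unify(only_empty, %{})
#=> {:ok, %{}}

# non-conforming keys are simply ignored, so this still unifies
# instead of failing, which is why the approach falls apart
unify(only_empty, %{"a" => 1})
#=> {:ok, %{}}
```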
i would just omit the empty blueprint from the generated input space.
there is probably an argument that an empty blueprint should be illegal, or should not unify if there are keys in the input.
maybe i'm not thinking straight at this point in my day, but it seems to me that the property just doesn't hold as soon as you mix oneof/1 with the semantics of map/1, since the latter will drop keys that don't fit the schematic but can still successfully unify
e.g. if we have the list:
```elixir
[%{"a" => 1}, %{"a" => 1, "b" => "foo"}]
```
then we might generate a schematic like:
```elixir
[
  map(keys: str(), values: int()),
  map(keys: str(), values: oneof([str(), int()]))
]
|> oneof()
|> list()
```
which would unify both of the values using the first schematic, with the extra K-V in the second map being ignored, and then the property would be violated
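spelling that out end to end (the dumped value shown is an assumption that follows the reasoning above, not verified output):

```elixir
import Schematic

schematic =
  [
    map(keys: str(), values: int()),
    map(keys: str(), values: oneof([str(), int()]))
  ]
  |> oneof()
  |> list()

input = [%{"a" => 1}, %{"a" => 1, "b" => "foo"}]

{:ok, unified} = unify(schematic, input)
{:ok, dumped} = dump(schematic, unified)

dumped
#=> [%{"a" => 1}, %{"a" => 1}]  -- "b" was dropped, so dumped != input
```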
I might have to noodle on this.
naively, i guess you could reduce over all of the schematics in a oneof and return the result of the unification that maintained the most data in the input? just spit-balling. definitely a performance hit there too
Yeah, when implementing the initial version I thought about how to determine that, and I'm not sure if that really is possible.
Unless you can somehow determine which schematic had the highest cardinality, sort by that, then attempt to unify.
Might be possible, but I'm also about to fall asleep so might be typing gibberish.
I feel like I should read a white paper on type systems.
hm, yeah. we might need a lot more info about the schematic than we currently have.
alternatively, once you call unify for one of the oneof schematics, and it's successful, and it's a map, couldn't we just hold onto that value and wait and see if another schematic in the oneof unifies, and then compare the size of the results via tree traversal? basically just "find the max" where the magnitude function is defined by a tree traversal
i think that whichever one is largest would have used the "most specific" schematic, and would thus be the best choice for unification.
but yeah, it's late. maybe Quinn or Keathley can suggest the appropriate paper(s)
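a rough sketch of that "find the max" idea (the `GreedyOneOf` module and the `size/1` magnitude function are entirely hypothetical, not the library's implementation):

```elixir
defmodule GreedyOneOf do
  @moduledoc "Hypothetical: pick the oneof result that preserved the most data."

  def unify(schematics, input) do
    schematics
    |> Enum.flat_map(fn schematic ->
      case Schematic.unify(schematic, input) do
        {:ok, value} -> [value]
        {:error, _} -> []
      end
    end)
    |> case do
      [] -> {:error, "no schematic unified"}
      results -> {:ok, Enum.max_by(results, &size/1)}
    end
  end

  # the magnitude function: count every key and leaf in the result tree
  defp size(map) when is_map(map) do
    map_size(map) + (map |> Map.values() |> Enum.map(&size/1) |> Enum.sum())
  end

  defp size(list) when is_list(list) do
    list |> Enum.map(&size/1) |> Enum.sum()
  end

  defp size(_leaf), do: 1
end
```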
i just went and implemented this and it does work as expected, and would handle a case like the one i laid out in a previous comment, but property testing can naturally still find the cases that break it.
at this point, i think i'm in favor of removing maps from that first property altogether, and coming up with a different set of properties that holds when maps are involved. unless you think that this property (or something very similar) should hold regardless of the data types involved, in which case we've probably got some reading to do 🤷