Eraser nodes being placed where they shouldn't
Reproducing the behavior
As Nicolas pointed out a few days ago, some programs that should be equivalent return different results, with a few eraser nodes being placed in places they shouldn't. Nicolas suspects it's an issue with duplication, but I still haven't found it.
See the following program in Bend:
map f xs = match xs {
List/Cons: (List/Cons (f xs.head) (map f xs.tail))
List/Nil: List/Nil
}
list = [1,2,3]
id = @x x
main = (map @f (map f list) [id])
It gets compiled to this, which works correctly:
@List/Cons = (a (b ((@List/Cons/tag (a (b c))) c)))
@List/Cons/tag = 1
@List/Nil = ((@List/Nil/tag a) a)
@List/Nil/tag = 0
@id = (a a)
@list = c
& @List/Cons ~ (1 (b c))
& @List/Cons ~ (2 (a b))
& @List/Cons ~ (3 (@List/Nil a))
@main = b
& @map ~ (@main__C0 (a b))
& @List/Cons ~ (@id (@List/Nil a))
@main__C0 = (a b)
& @map ~ (a (@list b))
@map = (a ((@map__C1 (a b)) b))
@map__C0 = (* (a (d ({(a b) c} f))))
& @List/Cons ~ (b (e f))
& @map ~ (c (d e))
@map__C1 = (?(((* @List/Nil) @map__C0) a) a)
Notice how main contains a call to main__C0. If we replace it with its value, we get
@List/Cons = (a (b ((@List/Cons/tag (a (b c))) c)))
@List/Cons/tag = 1
@List/Nil = ((@List/Nil/tag a) a)
@List/Nil/tag = 0
@id = (a a)
@list = c
& @List/Cons ~ (1 (b c))
& @List/Cons ~ (2 (a b))
& @List/Cons ~ (3 (@List/Nil a))
@main = e
& @map ~ ((a b) (d e))
& @map ~ (a (@list b))
& @List/Cons ~ (@id (@List/Nil d))
// @main__C0 = (a b)
// & @map ~ (a (@list b))
@map = (a ((@map__C1 (a b)) b))
@map__C0 = (* (a (d ({(a b) c} f))))
& @List/Cons ~ (b (e f))
& @map ~ (c (d e))
@map__C1 = (?(((* @List/Nil) @map__C0) a) a)
The original program returns
Result: ((@List/Cons/tag (((@List/Cons/tag (1 (((@List/Cons/tag (2 (((@List/Cons/tag (3 (@List/Nil v89))) v89) v8e))) v8e) v93))) v93) (@List/Nil va4))) va4)
while the other version returns
Result: ((@List/Cons/tag (((@List/Cons/tag (1 (((@List/Cons/tag (* (((@List/Cons/tag (* (@List/Nil vc4))) vc4) vb3))) vb3) va2))) va2) (@List/Nil vec))) vec)
This happens in all implementations of HVM.
System Settings
- OS: MacOS
- CPU: Apple M2
Additional context
No response
If I disable the REF-DUP optimization, I get a very different result (((@List/Cons/tag (((1 (1 (((1 (* (((1 (* (((0 vce) vce) vc4))) vc4) vb3))) vb3) va2))) va2) (@List/Nil vfc))) vfc)), so presumably the safety check is missing some cases.
I've tried a few other examples of the same pattern that don't result in the same bug and, as expected, they always have a difference of 1 interaction between the original generated HVM and the equivalent change. When the bug does happen, the variation in number of interactions is different (in the original example, the "good" version does 109 interactions and the "bad" does 138).
When disabling the REF-DUP optimization, like @CatsAreFluffy said, although the final result is wildly different, we get the expected behaviour of the number of interactions having only a difference of 1. I think that's a clue that the bug does not happen when we remove that optimization, although that could just be because the wrong interactions are being performed.
Could this be related to the SWIT-COMM being problematic, as @tjjfvi pointed out? We need to change the SWIT rules to use SWI nodes, isntead of CON nodes.
I think this is a combination of the following, safe is always set to true: https://github.com/HigherOrderCO/HVM/blob/21d630f488f1627a075ab72e8d6e422dd069d5f8/src/ast.rs#L507-L516 And that the REF-DUP optimization has no warning on C or Cuda, only on the rust implementation.
Also, as checked with Nicolas earlier today, the example in the issue (and also the original example that led to this) should not be valid, since they clone non-affine global references. It's probably just a coincidence that they return the "correct" result, and not the expected error message ERROR: attempt to clone a non-affine global reference.
When setting all definitions as safe: false in the code Enrico mentioned above, we get the expected error message in the Rust implementation, although not on the C implementation, since it doesn't perform the same checking in the REF-DUP optimization.
After digging some more into this issue, we think that both versions of the HVM program are invalid and should somehow error out, since both clone non-affine lambdas.
PR #379 improves safety checking so the first version errors as expected, but it's not enough for the second version. It marks every definition that contains a DUP as unsafe, and propagates that unsafety to all definitions that depend on those. Since main__C0 in the first example is then marked as unsafe, when it's REF in main interacts with a DUP we error out as expected.
https://github.com/HigherOrderCO/HVM/blob/1c6ca2e90d5984d7afa28386cd2ac86755d6bda9/src/hvm.rs#L657-L669
Since the second version expands that reference and clones it directly, this safety check is not reached.
For now, we can say that the eraser nodes being placed in such occasions are not necessarily a bug, since they're the result of undefined behavior. Nicolas told me adding an affine type checker to the language would be able to catch such errors at compile time, so it's something we could do in the medium-long term.
I think while this can be UB for bend it shouldn't be for HVM, though some flag perhaps should be required to run such programs.