roaring Clarify In-Place operation semantics

Been fooling around with optimizing in-place operations, in particular if we can do better with run containers. My interest is in solutions that are performant both in terms of calculations and allocations. Right now, the semantics for container calls like ior() and iand() are not well specified. They always return a container, that container will always be the union/intersection and it will usually, but not always, be the receiving struct. This ambiguity is mainly handled by always using the returned value, such as at https://github.com/RoaringBitmap/roaring/blob/master/roaring.go#L731 or https://github.com/RoaringBitmap/roaring/blob/master/roaring.go#L554. However, in at least one place we perform a naked ior(), which breaks if the underlying container is not updated with the union result.

My suggestion is to change the semantics so that iand() etc. return nil when the container was actually updated in-place, and a non-nil result value when it wasn't. The downside is that the code would be slightly more verbose. For example, instead of

c1 := rb.highlowcontainer.getWritableContainerAtIndex(pos1)
c2 := x2.highlowcontainer.getContainerAtIndex(pos2)
diff := c1.iand(c2)
if !diff.isEmpty() {
	rb.highlowcontainer.replaceKeyAndContainerAtIndex(intersectionsize, s1, diff, false)
	intersectionsize++
}

it'd be

c1 := rb.highlowcontainer.getWritableContainerAtIndex(pos1)
c2 := x2.highlowcontainer.getContainerAtIndex(pos2)
diff := c1.iand(c2)
if diff == nil && !c1.isEmpty {
        intersectionsize++
} else if !diff.isEmpty() {
	rb.highlowcontainer.replaceKeyAndContainerAtIndex(intersectionsize, s1, diff, false)
	intersectionsize++
}

The main advantages are that it would be more explicit and not have to always reset the value in the bitmap.

It'd be a good piece of work to refactor everywhere, so figured I'd get feedback first.

Jun 02 '21 23:06 jacksonrnewhouse

The AndAny function may, in fact, be broken. I have opened an issue https://github.com/RoaringBitmap/roaring/issues/309

Jun 03 '21 00:06 lemire

Right now, the semantics for container calls like ior() and iand() are not well specified.

This may not be explicit but I think it is well defined. Both ior() and iand() should return the container with the desired answer (always), with the possibility that this container could be the referent in some cases.

... which is not to say that we can't do better... maybe we should....

The main advantages are that it would be more explicit and not have to always reset the value in the bitmap.

Can you elaborate?

It is typically faster and simpler to unconditionally set a value. Additional branches make the code harder to read, to maintain and typically slower.

I am not saying you are wrong. However, it seems like if we are going to refactor (a time intensive proposal), one would like to get simpler code as a result... not more complicated code.

if we really want to make it explicit, then I submit to you that we can think of other means, like...

inplace, answer := mycontainer.ior(othercontainer)

That would be quite explicit then...

Please understand that I am not arguing... I just want to push back a little so I can better understand what you are proposing.

Jun 03 '21 00:06 lemire

I realized AndAny's issues when I refactored array.ior(run) to potentially return a new run container. At first I thought I had found a bug but because of when those unchecked ior calls are made it will always work. BUT, that is only because of a bunch of implementation dependent choices, and the general contract allows for implementations that break it.

Then, I looked at the various implementations, and they seemed to generally attempt to modify the receiver. I'd also be happy to just make explicit that receiver may not be modified

Jun 03 '21 01:06 jacksonrnewhouse

I'd also be happy to just make explicit that receiver may not be modified

That is the contract for or and and. The difference with iand and ior is that you allow the receiver/referent to be modified. If it was modified, then it is returned, otherwise a new container is returned. With or and and, a new container is always returned. It is crucially important in some cases to be able to do that... otherwise you always need to make copies...

Jun 03 '21 15:06 lemire