
`ByteArray` encoding type 33

Open BradyAJohnston opened this issue 2 years ago • 8 comments

I'm working on my own parser, and it successfully imports the example .bcif data from py-mmcif as well as the CellPack .bcif files from molstar.org/dev/me. It seems to parse everything for the structures correctly, but when extracting the symmetry operations from the CellPack files, I am coming across a ByteArray type that doesn't make sense.

[[{'kind': 'Delta', 'origin': 1, 'srcType': 3},
  {'kind': 'RunLength', 'srcType': 3, 'srcSize': 767},
  {'kind': 'IntegerPacking', 'byteCount': 1, 'isUnsigned': True, 'srcSize': 4},
  {'kind': 'ByteArray', 'type': 4}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}]]

Is 33 something special that isn't explicitly mentioned in the spec, or have I gotten something wrong earlier in my pipeline?

BradyAJohnston avatar Aug 06 '23 10:08 BradyAJohnston

33 is for Float64 arrays

arose avatar Aug 06 '23 15:08 arose

This is my first time doing this kind of raw byte decoding, so I might be missing something obvious here, but why is this the case? Does 33 mean not 33? Are the data types specified as below?

ByteArray {
    kind = "ByteArray"
    type: Int8 | Int16 | Int32 | Uint8 | Uint16 | Uint32 | Float32 | Float64
 #  type:   1  |   2   |   3   |   4   |    5   |   6    |    7    |   33
}

BradyAJohnston avatar Aug 07 '23 01:08 BradyAJohnston

Not sure it is in the spec. Here is the normative implementation... https://github.com/molstar/molstar/blob/master/src/mol-io/common/binary-cif/encoding.ts#L60-L72

arose avatar Aug 07 '23 03:08 arose
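
Note for readers following along: as far as I can tell from the linked encoding.ts, the integer types use codes 1 through 6 and the float types use 32 and 33, so Float32 would be 32 rather than the 7 guessed above. A minimal sketch of a `ByteArray` decoder built on that assumption is below. It uses only the standard library `struct` module; the names `BYTE_ARRAY_FORMATS` and `decode_byte_array` are made up for illustration, and the table should be checked against the linked source before relying on it.

```python
import struct

# Type code -> struct format character.
# ASSUMPTION: codes taken from my reading of the linked encoding.ts
# (Int8..Uint32 = 1..6, Float32 = 32, Float64 = 33); verify before use.
BYTE_ARRAY_FORMATS = {
    1: "b",   # Int8
    2: "h",   # Int16
    3: "i",   # Int32
    4: "B",   # Uint8
    5: "H",   # Uint16
    6: "I",   # Uint32
    32: "f",  # Float32
    33: "d",  # Float64
}


def decode_byte_array(data: bytes, type_code: int) -> list:
    """Decode a ByteArray buffer into a list of numbers (little-endian)."""
    if type_code not in BYTE_ARRAY_FORMATS:
        raise ValueError(f"unknown ByteArray type code: {type_code}")
    fmt = BYTE_ARRAY_FORMATS[type_code]
    item_size = struct.calcsize("<" + fmt)
    if len(data) % item_size != 0:
        raise ValueError("buffer length is not a multiple of the item size")
    count = len(data) // item_size
    return list(struct.unpack(f"<{count}{fmt}", data))


# Usage: a hand-made buffer holding two Float64 values, decoded with type 33.
example = struct.pack("<2d", 1.0, -0.5)
values = decode_byte_array(example, 33)
```

With numpy the same lookup could map to little-endian dtype strings instead, but the plain `struct` version keeps the sketch dependency-free.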

Okay, thanks for the additional clarification. Should this be something that is specified in the spec, if it's the official implementation?

BradyAJohnston avatar Aug 07 '23 03:08 BradyAJohnston

Yeah, it should. The spec could definitely use an overhaul.

arose avatar Aug 07 '23 03:08 arose

Are there any other little 'gotchas' you can think of while I'm tackling this?

BradyAJohnston avatar Aug 07 '23 03:08 BradyAJohnston

I'd look at the molstar implementation or this minimal Python implementation https://gist.github.com/dsehnal/b06f5555fa9145da69fe69abfeab6eaf

arose avatar Aug 07 '23 04:08 arose

Ah, many thanks. I was also doing a minimal numpy implementation, and the example you linked does exactly what I am after, but much more cleanly than what I had come up with. I wish I had googled a bit harder; it would have saved me a weekend of tinkering.

It would also be useful to have that minimal implementation linked in the README.

BradyAJohnston avatar Aug 07 '23 04:08 BradyAJohnston