BitString decoding is ambiguous in version >=2.5.0
Hi! I'm encountering an issue with the BitString decoding since version 2.5.0. Specifically, the issue was introduced in commit 18b3b7d339f4afe96693f610d2438ae904a071f8, which strips the "unused bits" byte and shifts the unused bits from the end to the beginning of the bit string.
The issue is that ASN.1 BitStrings are big-endian, and the Decoder doesn't output the number of unused bits, so as far as I can tell there's no way to tell which of the bits in the output corresponds to which bit in the encoded BitString. Here's a practical example:
The FIDO U2F Authenticator Transports Extension is an X.509 extension defined thus:
-- FIDO U2F certificate extensions
id-fido-u2f-ce-transports OBJECT IDENTIFIER ::= { id-fido-u2f-ce 1 }
fidoU2FTransports EXTENSION ::= {
WITH SYNTAX FIDOU2FTransports
ID id-fido-u2f-ce-transports
}
FIDOU2FTransports ::= BIT STRING {
bluetoothRadio(0), -- Bluetooth Classic
bluetoothLowEnergyRadio(1),
uSB(2),
nFC(3),
uSBInternal(4)
}
An extension value representing the list [uSB(2), nFC(3)] is encoded in DER as: 03 02 04 30, representing the bit string 0011 .... with the 4 unused bits written as dots.
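To make the encoding concrete, here is a minimal sketch of how a list of named bits maps to DER bytes. The helper name encode_named_bits is hypothetical (it is not part of the asn1 library), and it only handles the short length form.

```python
def encode_named_bits(set_bits):
    """DER-encode a BIT STRING from the indices of its set named bits.

    Hypothetical helper for illustration; not part of the asn1 library.
    Supports only the short length form (fewer than 128 content octets).
    """
    if not set_bits:
        # An empty bit string: zero unused bits, no data bytes.
        return bytes([0x03, 0x01, 0x00])
    n_bits = max(set_bits) + 1          # DER drops trailing zero bits
    n_bytes = (n_bits + 7) // 8
    unused = n_bytes * 8 - n_bits       # padding bits in the last byte
    content = bytearray(n_bytes)
    for i in set_bits:
        # Bit 0 is the most significant bit of the first data byte.
        content[i // 8] |= 0x80 >> (i % 8)
    body = bytes([unused]) + bytes(content)
    if len(body) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x03, len(body)]) + body


print(encode_named_bits([2, 3]).hex(" "))  # 03 02 04 30
```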
Now consider this example script:
import asn1

def show_bits(bstr):
    return [f"{b:08b}" for b in bstr]

ext_value = bytes([0x03, 0x02, 0x04, 0x30])
dec = asn1.Decoder()
dec.start(ext_value)
tag, value = dec.read()
print(ext_value.hex(), show_bits(ext_value))
print(value.hex(), show_bits(value))
In asn1 version 2.4.2 and earlier, this outputs:
03020430 ['00000011', '00000010', '00000100', '00110000']
0430 ['00000100', '00110000']
and you can extract the bit flags by identifying bit i in the spec with the bit 1 << (8 - i - 1) in the second byte of value.
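As a rough sketch of that extraction, generalized to more than one data byte: the function below assumes value has the 2.4.2-style layout shown above (the unused-bits byte followed by the bit data). The name set_bit_indices is made up for this example.

```python
def set_bit_indices(value):
    """Return the indices of set bits from a 2.4.2-style BIT STRING value.

    Assumes `value` is the unused-bits byte followed by the bit data,
    as in the 2.4.2 output above. Hypothetical helper for illustration.
    """
    unused, data = value[0], value[1:]
    n_bits = len(data) * 8 - unused     # only these bits are defined
    return [
        i for i in range(n_bits)
        if data[i // 8] & (1 << (8 - i % 8 - 1))
    ]


print(set_bit_indices(bytes([0x04, 0x30])))  # [2, 3]
```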
In asn1 version 2.5.0 and later, this outputs:
03020430 ['00000011', '00000010', '00000100', '00110000']
03 ['00000011']
Now the decoder doesn't return how many bits are unused, so there seems to be no way to tell whether the shifted bits in the output represent [uSB(2), nFC(3)] (bit string 0011 ....) or [bluetoothRadio(0), bluetoothLowEnergyRadio(1)] (bit string 11.. ....), or any other combination. Had the bits been reversed, you could identify bit i in the spec with the bit 1 << i in the output, but the output is still big-endian, so that's not possible without also extracting the "unused bits" value from the original input.
Am I missing something?
I think the decoder needs to either also return the number of unused bits, or return some higher-level data model (for example, list[bool] or an accessor class instance) that allows extracting and/or iterating over the BitString flags.
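To illustrate the second option, here is one possible shape for such an accessor. This is only a sketch: BitStringValue and its fields are hypothetical names, not an existing asn1 type, and the real design could look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitStringValue:
    """Hypothetical accessor; not an existing asn1 type."""
    data: bytes     # bit data, without the unused-bits byte
    unused: int     # number of padding bits at the end of the last byte

    def __len__(self):
        # Number of defined bits in the bit string.
        return len(self.data) * 8 - self.unused

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError("bit index out of range")
        # Bit 0 is the most significant bit of the first byte.
        return bool(self.data[i // 8] & (0x80 >> (i % 8)))


flags = BitStringValue(data=b"\x30", unused=4)
print(len(flags), list(flags))  # 4 [False, False, True, True]
```

Keeping the unused-bit count next to the data means bit i in the spec is always flags[i], and the length of the bit string is never lost.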
I do not fully understand the issue yet; I will have to look at it in detail, and I will. I am currently a little busy with other projects, so it may take some time.
Thanks, no rush! I'd be happy to help design and implement a solution. I'll probably experiment a bit with an accessor pattern, but let me know if you'd prefer a particular direction on how to handle this. I'm not familiar with what conventions there are in the library, but I'll take a look around and try to find something that can fit in.
Also, here's an expanded example (again using the FIDO transports X.509 extension as a practical example) to perhaps better show the nuances here:
import asn1

def show_bytes(bstr):
    return " ".join([f"{b:02x}" for b in bstr])

def show_bits(bstr):
    return [f"{b:08b}" for b in bstr]

U2F_TRANSPORTS = [
    'bluetoothRadio', 'BLE', 'USB', 'NFC', 'USBInternal', 'lightning']

def decode_transports(flags, unused_bits):
    return [
        U2F_TRANSPORTS[i]
        for i in range(8 - unused_bits)
        if flags & (1 << (8 - i - 1))]

examples = [
    bytes([0x03, 0x02, 7, 0b10000000]),
    bytes([0x03, 0x02, 6, 0b01000000]),
    bytes([0x03, 0x02, 5, 0b00100000]),
    bytes([0x03, 0x02, 2, 0b00000100]),
    bytes([0x03, 0x02, 5, 0b01100000]),
    bytes([0x03, 0x02, 4, 0b00110000]),
    bytes([0x03, 0x02, 4, 0b11000000]),
    bytes([0x03, 0x02, 2, 0b00110000]),
    bytes([0x03, 0x02, 1, 0b01100000]),
    bytes([0x03, 0x02, 0, 0b00110000]),
]

for ext_value in examples:
    dec = asn1.Decoder()
    dec.start(ext_value)
    tag, value = dec.read()
    unused = ext_value[2]
    desc = ", ".join(decode_transports(ext_value[3], unused))
    print(f"\nExample: {desc}; {unused} unused")
    print(f"Encoded: {show_bytes(ext_value): <12} bits: {show_bits(ext_value)}")
    print(f"Decoded: {show_bytes(value): <12} bits: {show_bits(value)}")
Running with asn1 version 2.5.0, this outputs:
Example: bluetoothRadio; 7 unused
Encoded: 03 02 07 80 bits: ['00000011', '00000010', '00000111', '10000000']
Decoded: 01 bits: ['00000001']
Example: BLE; 6 unused
Encoded: 03 02 06 40 bits: ['00000011', '00000010', '00000110', '01000000']
Decoded: 01 bits: ['00000001']
Example: USB; 5 unused
Encoded: 03 02 05 20 bits: ['00000011', '00000010', '00000101', '00100000']
Decoded: 01 bits: ['00000001']
Example: lightning; 2 unused
Encoded: 03 02 02 04 bits: ['00000011', '00000010', '00000010', '00000100']
Decoded: 01 bits: ['00000001']
Example: BLE, USB; 5 unused
Encoded: 03 02 05 60 bits: ['00000011', '00000010', '00000101', '01100000']
Decoded: 03 bits: ['00000011']
Example: USB, NFC; 4 unused
Encoded: 03 02 04 30 bits: ['00000011', '00000010', '00000100', '00110000']
Decoded: 03 bits: ['00000011']
Example: bluetoothRadio, BLE; 4 unused
Encoded: 03 02 04 c0 bits: ['00000011', '00000010', '00000100', '11000000']
Decoded: 0c bits: ['00001100']
Example: USB, NFC; 2 unused
Encoded: 03 02 02 30 bits: ['00000011', '00000010', '00000010', '00110000']
Decoded: 0c bits: ['00001100']
Example: BLE, USB; 1 unused
Encoded: 03 02 01 60 bits: ['00000011', '00000010', '00000001', '01100000']
Decoded: 30 bits: ['00110000']
Example: USB, NFC; 0 unused
Encoded: 03 02 00 30 bits: ['00000011', '00000010', '00000000', '00110000']
Decoded: 30 bits: ['00110000']
Notice how all of these decoded values in the example are ambiguous:
- 00000001 could mean "any one bit flag is set"; which one depends on how many bits were unused
- 00000011 could mean either [BLE, USB] or [USB, NFC], again depending on how many bits were unused
- ...and so on with the other examples with different numbers of unused bits.
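The collisions can be reproduced without the library. The sketch below assumes the 2.5.0 decoder right-shifts the data by the unused-bit count; that is inferred from the outputs above, not taken from the library source, and simulate_shift is a made-up name covering only a single data byte.

```python
from collections import defaultdict


def simulate_shift(content_byte, unused):
    """Mimic the observed 2.5.0 output for a single data byte.

    Assumption: the decoder right-shifts the data by the unused-bit
    count (inferred from the outputs above, not from the library source).
    """
    return content_byte >> unused


# (unused bits, data byte) pairs taken from the examples above.
inputs = [(7, 0x80), (6, 0x40), (5, 0x20), (2, 0x04),
          (5, 0x60), (4, 0x30), (4, 0xC0), (2, 0x30),
          (1, 0x60), (0, 0x30)]

# Group the distinct encodings by the value they decode to.
collisions = defaultdict(list)
for unused, content in inputs:
    collisions[simulate_shift(content, unused)].append((unused, content))

for decoded, sources in collisions.items():
    print(f"{decoded:02x} <- {len(sources)} distinct encodings")
```

With these ten inputs, every decoded value is produced by at least two different encodings.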
I should also mention that what I said above:
"Had the bits been reversed, you could identify bit i in the spec with the bit 1 << i in the output"
is only true in this particular case, where undefined is equivalent to false. But that cannot be assumed true in the general case, and in that case there's no way to distinguish zero bits from undefined bits. The BitStrings 0... ...., 00.. .... and 000. .... all get decoded to 0000 0000, but [False] may not always be equivalent to [False, False, False].
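A small sketch of that last point: when the unused-bit count is kept, the three bit strings stay distinguishable even though their data byte is identical. The helper to_bools is hypothetical and takes the count as a separate argument.

```python
def to_bools(data, unused):
    """Expand bit data into one bool per defined bit.

    Hypothetical helper; `unused` is the count from the unused-bits byte.
    """
    n_bits = len(data) * 8 - unused
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(n_bits)]


# 0... ...., 00.. .... and 000. .... all share the data byte 0x00.
for unused in (7, 6, 5):
    print(unused, to_bools(b"\x00", unused))
```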