Deserializing error with UTF-8 BOM (Byte Order Mark) Content
I run into an issue when deserializing a string encoded in UTF-8 with a Byte Order Mark (BOM). The deserializer does not panic; it returns the following error: Error("expected value", line: 1, column: 1).
How to Reproduce
To reproduce the issue, encode a JSON file in UTF-8 with BOM and use from_reader or from_str for deserialization.
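To illustrate what the parser is handed in that case, here is a minimal std-only sketch. The helper names first_char and sample_with_bom and the sample JSON are assumptions made up for illustration, not part of serde_json. It builds a BOM-prefixed buffer and shows that the first decoded character is U+FEFF rather than an opening brace, which is presumably why the error points at line 1, column 1.

```rust
/// Hypothetical helper: first character of a UTF-8 byte buffer, if it decodes.
fn first_char(bytes: &[u8]) -> Option<char> {
    std::str::from_utf8(bytes).ok()?.chars().next()
}

/// Hypothetical helper: sample JSON text prefixed with the UTF-8 BOM (EF BB BF).
fn sample_with_bom() -> Vec<u8> {
    let mut data = vec![0xEF, 0xBB, 0xBF];
    data.extend_from_slice(br#"{"key": "value"}"#);
    data
}

fn main() {
    let data = sample_with_bom();
    // The three BOM bytes decode to the single code point U+FEFF,
    // so the first character seen is not the start of a JSON value.
    println!("first char is BOM: {}", first_char(&data) == Some('\u{FEFF}'));
}
```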
Workaround
As a temporary workaround, I check whether the file content begins with the three BOM bytes and remove them if present:
use std::fs;

fn main() {
    // Specify the path to your file
    let file_path = "path/to/your/file_with_bom.json";

    // Read the file to a Vec<u8>
    let mut data = fs::read(file_path).unwrap();

    // UTF-8 BOM is three bytes: EF BB BF
    if data.starts_with(&[0xEF, 0xBB, 0xBF]) {
        // Remove the first three bytes (the BOM) in place
        data.drain(..3);
    }

    // Proceed with deserialization...
}
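The same check can be factored into small reusable helpers that borrow the input instead of copying it. This is only a sketch using the standard library; the names UTF8_BOM, strip_bom_bytes and strip_bom_str are hypothetical and not part of serde_json.

```rust
/// The UTF-8 encoding of U+FEFF (hypothetical constant name).
const UTF8_BOM: &[u8] = &[0xEF, 0xBB, 0xBF];

/// Returns the input without a leading UTF-8 BOM, or unchanged if there is none.
fn strip_bom_bytes(data: &[u8]) -> &[u8] {
    data.strip_prefix(UTF8_BOM).unwrap_or(data)
}

/// Same idea for already-decoded text: the BOM appears as the char U+FEFF.
fn strip_bom_str(text: &str) -> &str {
    text.strip_prefix('\u{FEFF}').unwrap_or(text)
}

fn main() {
    let raw: &[u8] = b"\xEF\xBB\xBF{\"a\":1}";
    let text = "\u{FEFF}{\"a\":1}";

    // Both helpers return a sub-slice of the original input, so nothing is copied.
    println!("{}", String::from_utf8_lossy(strip_bom_bytes(raw)));
    println!("{}", strip_bom_str(text));
}
```

The stripped slice or string can then be passed on to serde_json::from_slice or serde_json::from_str as usual.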
One option would be to handle it in Rust itself (https://github.com/rust-lang/rfcs/issues/2428); at least IETF RFC 3629 doesn't forbid it, even though I'm personally against that, as it is a protocol detail.
But your file is theoretically not compliant with IETF RFC 7159 (even though a BOM is also not strictly forbidden by the aforementioned RFC 3629, as it's a protocol detail):
Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
Either way, it is at least perfectly valid for a parser to ignore the BOM and still be conformant.