skips events on failed `read_text`
Hi, I'm getting behavior that's gotta be a bug. My parsing code is pretty straightforward:
fn parse_calibration(calibration: &str) -> Result<AxiaCalibration, ConfigurationError> {
    let mut result: AxiaCalibration = Default::default();
    let mut reader = Reader::from_str(calibration);
    reader.trim_text(true);
    // 350 here to be long enough to hold the <calmtx> line in EXAMPLE_CALIBRATION.
    let mut buf = Vec::with_capacity(350); // this buffer gets recycled
    let mut inner_buf = Vec::with_capacity(350); // this buffer gets recycled
    loop {
        match reader.read_event(&mut buf) {
            Ok(Event::Start(e)) => {
                if let Ok(s) = reader.read_text(e.name(), &mut inner_buf) {
                    println!("found {:?}", String::from_utf8(e.name().to_vec()));
                    parse_calibration_field(&e.name(), &s, &mut result)?;
                } else {
                    println!("nogo {:?}", String::from_utf8(e.name().to_vec()));
                }
            }
            Ok(Event::Eof) => break,
            Err(e) => return Err(e.into()),
            a => println!("{:?}", a), // ignore comments, end tags, random text, etc
        }
    }
    Ok(result)
}
but I'm finding that it skips whatever comes immediately after <netftCalibration> in
const EXAMPLE_CALIBRATION: &'static str = r#"<?xml version="1.0"?>
<netftCalibration>
<prodname>Ethernet Axia</prodname>
<!-- Calibration Data-->
<calthk>0</calthk>
<caldis>0</caldis>
<calsn>FT27120</calsn>
<calpn>SI-500-20</calpn>
<calfam>NET</calfam>
<caldt>2/22/2019</caldt>
<calmtx>101.194;-91.0611;-8.59668;-9.58448;-98.2309;106.267;-47.0668;64.063;106.857;-108.174;-66.2965;50.6429;59.2169;62.9028;57.032;61.8539;56.9527;62.1059;-2.19935;-1.12881;1.15961;-1.17096;0.976857;2.36193;-0.12006;2.04773;-1.9022;-2.01133;2.05653;-0.0719982;2.66346;-2.69029;2.5692;-2.52372;2.785;-2.80579</calmtx>
<calfu>1</calfu>
<scalfu>N</scalfu>
<caltu>2</caltu>
<scaltu>Nm</scaltu>
<calmr>148;148;378;5;5;8</calmr>
<calcpf>1000000</calcpf>
<calcpt>1000000</calcpt>
<calggn>0;0;0;0;0;0</calggn>
<calgof>0;0;0;0;0;0</calgof>
<calres>0;0;0;0;0;0</calres>
<calrng>148;148;378;5;5;8</calrng>
<calsf>0;0;0;0;0;0</calsf>
<calusra>0</calusra>
<calusrb>0</calusrb>
</netftCalibration>
"#;
If I reorder the lines such that the comment comes before <prodname> my unit tests pass and all is well in the world. But as written here, the Ok(Event::Start()) for prodname never emits.
Sorry for the long delay before answering.
Indeed read_text is not very smart at the moment and expects to receive Event::Text then Event::End directly. It doesn't expect an inner node.
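To make the skipping concrete, here is a rough toy model of the behavior described above. It does not use the real library API: the `Event` enum, `naive_read_text`, and `sample_events` are hypothetical stand-ins written only for illustration. The point it tries to show is that the event following a `Start` is pulled off the stream before it is checked, so when it turns out not to be text it is already gone.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for the parser's event type; not the real library API.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    Text(String),
    End(String),
}

// Toy model of the described behavior: the next event is taken from the
// stream unconditionally, and the call only succeeds if it is `Text`.
fn naive_read_text(events: &mut VecDeque<Event>) -> Result<String, String> {
    match events.pop_front() {
        Some(Event::Text(text)) => {
            // A closing tag is expected right after the text; consume it too.
            if let Some(Event::End(_)) = events.front() {
                events.pop_front();
            }
            Ok(text)
        }
        // The unexpected event has already been removed from the queue,
        // so the caller's loop never gets to see it.
        Some(other) => Err(format!("expected text, got {:?}", other)),
        None => Err("unexpected end of input".to_string()),
    }
}

// Events for the start of the example document, with whitespace trimmed:
// <netftCalibration><prodname>Ethernet Axia</prodname>
fn sample_events() -> VecDeque<Event> {
    VecDeque::from(vec![
        Event::Start("netftCalibration".to_string()),
        Event::Start("prodname".to_string()),
        Event::Text("Ethernet Axia".to_string()),
        Event::End("prodname".to_string()),
    ])
}
```

In this model, calling `naive_read_text` right after taking the `netftCalibration` start event fails and leaves the queue positioned at the `Ethernet Axia` text, so the `prodname` start event is never delivered to the outer loop.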
In your case I believe you shouldn't use read_text but instead match Event::Text directly. I am thinking of changing read_text to read (and consume) all text until the matching Event::End is reached, ignoring any sub-nodes, or alternatively to read all text, including that of sub-nodes.
The algorithm @tafia suggested (reading all text, including that of sub-nodes) has been implemented in #455. As already noted, in your case it is probably better to match Event::Text (and maybe Event::CData) directly.
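For illustration, matching text events directly could look roughly like the sketch below. It is deliberately library-independent: the `Event` enum and `collect_fields` are hypothetical names invented here, not the crate's API, and a real version would pull events from the reader in a loop instead of iterating over a slice. The idea is only to remember the most recent start tag and attach any following text or CDATA to it.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the parser's event type; not the real library API.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    Text(String),
    CData(String),
    End(String),
    Comment(String),
}

// Track the innermost open element and record text for it as it arrives.
// Nothing is read ahead, so no event can be swallowed by a failed lookahead.
fn collect_fields(events: &[Event]) -> HashMap<String, String> {
    let mut fields: HashMap<String, String> = HashMap::new();
    let mut current: Option<String> = None;
    for event in events {
        match event {
            // A new start tag becomes the element that text is attributed to.
            Event::Start(name) => current = Some(name.clone()),
            Event::Text(text) | Event::CData(text) => {
                if let Some(name) = &current {
                    fields
                        .entry(name.clone())
                        .or_insert_with(String::new)
                        .push_str(text);
                }
            }
            // After a closing tag, stray text belongs to no field.
            Event::End(_) => current = None,
            // Comments are skipped without affecting the current element.
            Event::Comment(_) => {}
        }
    }
    fields
}
```

With this shape the container element simply never receives any text, and a comment between two fields has no effect on either of them, regardless of where it appears.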