quick-xml

skips events on failed `read_text`

Open dunmatt opened this issue 6 years ago • 1 comments

Hi, I'm getting behavior that's gotta be a bug. My parsing code is pretty straightforward:

    fn parse_calibration(calibration: &str) -> Result<AxiaCalibration, ConfigurationError> {
        let mut result: AxiaCalibration = Default::default();

        let mut reader = Reader::from_str(calibration);
        reader.trim_text(true);

        // 350 here to be long enough to hold the <calmtx> line in EXAMPLE_CALIBRATION.
        let mut buf = Vec::with_capacity(350); // this buffer gets recycled
        let mut inner_buf = Vec::with_capacity(350); // this buffer gets recycled

        loop {
            match reader.read_event(&mut buf) {
                Ok(Event::Start(e)) => {
                    if let Ok(s) = reader.read_text(e.name(), &mut inner_buf) {
                        println!("found {:?}", String::from_utf8(e.name().to_vec()));
                        parse_calibration_field(&e.name(), &s, &mut result)?;
                    } else {
                        println!("nogo {:?}", String::from_utf8(e.name().to_vec()));
                    }
                }
                Ok(Event::Eof) => break,
                Err(e) => return Err(e.into()),
                a => println!("{:?}", a), // ignore comments, end tags, random text, etc
            }
        }
        Ok(result)
    }

but I'm finding that it skips whatever comes immediately after <netftCalibration> in

    const EXAMPLE_CALIBRATION: &'static str = r#"<?xml version="1.0"?>
    <netftCalibration>
    <prodname>Ethernet Axia</prodname>
    <!-- Calibration Data-->
    <calthk>0</calthk>
    <caldis>0</caldis>
    <calsn>FT27120</calsn>
    <calpn>SI-500-20</calpn>
    <calfam>NET</calfam>
    <caldt>2/22/2019</caldt>
    <calmtx>101.194;-91.0611;-8.59668;-9.58448;-98.2309;106.267;-47.0668;64.063;106.857;-108.174;-66.2965;50.6429;59.2169;62.9028;57.032;61.8539;56.9527;62.1059;-2.19935;-1.12881;1.15961;-1.17096;0.976857;2.36193;-0.12006;2.04773;-1.9022;-2.01133;2.05653;-0.0719982;2.66346;-2.69029;2.5692;-2.52372;2.785;-2.80579</calmtx>
    <calfu>1</calfu>
    <scalfu>N</scalfu>
    <caltu>2</caltu>
    <scaltu>Nm</scaltu>
    <calmr>148;148;378;5;5;8</calmr>
    <calcpf>1000000</calcpf>
    <calcpt>1000000</calcpt>
    <calggn>0;0;0;0;0;0</calggn>
    <calgof>0;0;0;0;0;0</calgof>
    <calres>0;0;0;0;0;0</calres>
    <calrng>148;148;378;5;5;8</calrng>
    <calsf>0;0;0;0;0;0</calsf>
    <calusra>0</calusra>
    <calusrb>0</calusrb>
    </netftCalibration>
    "#;

If I reorder the lines such that the comment comes before <prodname> my unit tests pass and all is well in the world. But as written here, the Ok(Event::Start()) for prodname never emits.

dunmatt, Apr 16 '19 03:04

Sorry for the long delay before answering.

Indeed read_text is not very smart at the moment and expects to receive Event::Text then Event::End directly. It doesn't expect an inner node.
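
To illustrate why an event can go missing, here is a minimal sketch of that behavior. The `Event` enum, the `VecDeque` event queue, and the `naive_read_text` and `events_after_root` helpers are simplified stand-ins invented for this illustration; they are not quick-xml's actual types or implementation.

```rust
use std::collections::VecDeque;

// Simplified stand-in for a pull parser's events; NOT quick-xml's real `Event`.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    Text(String),
    End(String),
}

// Hypothetical helper mimicking a reader that expects Text right after Start.
// It pops the next event unconditionally, so on failure that event is lost.
fn naive_read_text(events: &mut VecDeque<Event>) -> Result<String, String> {
    match events.pop_front() {
        Some(Event::Text(text)) => {
            events.pop_front(); // consume the End that is assumed to follow
            Ok(text)
        }
        Some(other) => Err(format!("expected text, got {:?}", other)),
        None => Err("unexpected end of input".to_string()),
    }
}

// Events that would follow `<netftCalibration>` in a document shaped like the example.
fn events_after_root() -> VecDeque<Event> {
    VecDeque::from(vec![
        Event::Start("prodname".to_string()),
        Event::Text("Ethernet Axia".to_string()),
        Event::End("prodname".to_string()),
    ])
}
```

Calling `naive_read_text` on the queue from `events_after_root()` returns an error and leaves `Text("Ethernet Axia")` at the front of the queue: the `Start` for `prodname` was consumed by the failed call, so an outer loop never sees it. With the comment placed first, the comment would be the discarded event instead, which would explain why reordering makes the tests pass.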

In your case I believe you shouldn't use read_text but instead catch the Event::Text directly. I am thinking of modifying read_text to read (and consume) all texts until the Event::End is reached, ignoring any sub nodes, or alternatively to read all texts, including those of sub nodes.
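
A minimal sketch of the "catch the text event directly" approach: remember the most recently opened element and pair it with the next text event. The `Event` enum and the `collect_fields` and `sample_events` helpers are hypothetical stand-ins for illustration, not quick-xml's API; with the real crate the same state would be kept inside the `read_event` loop.

```rust
// Simplified stand-in for a pull parser's events; NOT quick-xml's real `Event`.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    Text(String),
    Comment(String),
    End(String),
}

// Hypothetical helper: pairs each text event with the most recently opened element.
fn collect_fields(events: &[Event]) -> Vec<(String, String)> {
    let mut fields = Vec::new();
    let mut current: Option<String> = None; // element we are directly inside, if any
    for event in events {
        match event {
            Event::Start(name) => current = Some(name.clone()),
            Event::Text(text) => {
                if let Some(name) = &current {
                    fields.push((name.clone(), text.clone()));
                }
            }
            Event::End(_) => current = None,
            Event::Comment(_) => {} // comments leave the state untouched
        }
    }
    fields
}

// A shortened event stream shaped like the start of the calibration document.
fn sample_events() -> Vec<Event> {
    vec![
        Event::Start("netftCalibration".to_string()),
        Event::Start("prodname".to_string()),
        Event::Text("Ethernet Axia".to_string()),
        Event::End("prodname".to_string()),
        Event::Comment(" Calibration Data".to_string()),
        Event::Start("calthk".to_string()),
        Event::Text("0".to_string()),
        Event::End("calthk".to_string()),
        Event::End("netftCalibration".to_string()),
    ]
}
```

Because no event is consumed outside the main loop, the position of the comment no longer matters, and `prodname` is reported even though it directly follows the root element's start tag.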

tafia, Apr 25 '19 08:04

The algorithm @tafia suggested (reading all texts, including sub nodes) has been implemented in #455. As already noted, in your case it is probably better to match Event::Text (and maybe Event::CData) directly.
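
As a rough sketch of the idea only (this is not the code from #455, and the merged behavior may differ in detail), reading all texts including those of sub nodes can be done by tracking nesting depth until the matching end tag. The `Event` enum and `read_all_text` are hypothetical names used for illustration.

```rust
// Simplified stand-in for a pull parser's events; NOT quick-xml's real `Event`.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String),
    Text(String),
    End(String),
}

// Hypothetical helper: meant to be called with the events that follow an
// element's Start. It concatenates every text event up to the matching End,
// descending into nested elements, and returns the text together with the
// number of events consumed. Returns None if the input ends before the End.
fn read_all_text(events: &[Event]) -> Option<(String, usize)> {
    let mut text = String::new();
    let mut depth = 0usize; // nesting depth below the element being read
    for (index, event) in events.iter().enumerate() {
        match event {
            Event::Start(_) => depth += 1,
            Event::Text(t) => text.push_str(t),
            Event::End(_) if depth == 0 => return Some((text, index + 1)),
            Event::End(_) => depth -= 1,
        }
    }
    None
}
```

In this sketch a nested element no longer causes a failure: its start and end tags only change the depth counter, and its text is appended to the result.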

Mingun, Aug 25 '22 10:08