
record_to_elems() may fail silently

outis151 opened this issue 1 year ago • 4 comments

When running record_to_elems() on bgpkit_parser::parser::mrt::mrt_elem::Elementor, the function may fail silently due to the following: https://github.com/bgpkit/bgpkit-parser/blob/fb15bb43f61752591307e67d000c49a376bf9b8e/src/parser/mrt/mrt_elem.rs#L364

Moreover, it is noteworthy that this error can be triggered when record_to_elems() is run only selectively, skipping some records, since peer_table may never get initialized that way. It can also happen if a new Elementor is incorrectly created for every record.
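To make the failure mode concrete, here is a minimal self-contained model of the pattern described above. The types (PeerTable, MrtRecord, etc.) are deliberately simplified stand-ins, not the real ones from bgpkit_parser::parser::mrt::mrt_elem; only the shape of the bug is the point: if the peer index table was never fed in, the call logs and returns an empty vec instead of erroring.

```rust
// Hypothetical, simplified stand-ins for the real bgpkit-parser types.
struct PeerTable {
    peers: Vec<(u32, String)>, // (peer_asn, peer_ip), indexed by peer index
}

#[derive(Default)]
struct Elementor {
    peer_table: Option<PeerTable>,
}

enum MrtRecord {
    PeerIndexTable(PeerTable),
    RibEntry { peer_index: usize, prefix: String },
}

struct BgpElem {
    peer_asn: u32,
    peer_ip: String,
    prefix: String,
}

impl Elementor {
    fn record_to_elems(&mut self, record: MrtRecord) -> Vec<BgpElem> {
        match record {
            MrtRecord::PeerIndexTable(t) => {
                self.peer_table = Some(t);
                vec![]
            }
            MrtRecord::RibEntry { peer_index, prefix } => {
                let Some(table) = self.peer_table.as_ref() else {
                    // The silent failure: only a log line, empty result.
                    eprintln!("peer_table is None; skipping record");
                    return vec![];
                };
                let (peer_asn, peer_ip) = table.peers[peer_index].clone();
                vec![BgpElem { peer_asn, peer_ip, prefix }]
            }
        }
    }
}
```

Creating a fresh Elementor per record (the misuse mentioned above) always hits the empty-return path, because each new instance has peer_table set to None.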

Thanks for this project

outis151 avatar Sep 14 '24 18:09 outis151

Thank you for bringing this issue up!

My current thinking about processing records without the PEER_INDEX_TABLE first:

  • The BGP TableDumpV2 format uses a peer index table to record the peer_ip and peer_asn for each RIB entry.
  • The two peer fields are currently required for BgpElem, and they are supposed to be present in each record.
  • If we run record_to_elems() before the PEER_INDEX_TABLE has been processed, we have no peer IP and ASN info and thus cannot generate valid BGP elems. Does that make sense?

The point about failing silently is valid; it might be better to panic or somehow bubble up an error instead of only logging it.

What do you think?

Also, it would be helpful if you could provide a more concrete example or scenario, so that I can better understand your use case and reason about a solution.

digizeph avatar Sep 14 '24 23:09 digizeph

The idea was to skip processing MRT records based on the timestamp in the CommonHeader. Since I am working with local MRT files, I cannot use an API to limit the time range.

In the process of implementing that, I made some mistakes with the use of Elementor, such as creating one for each record (with it failing silently in that case), which led me to take a closer look at the cases where the Elementor can fail, in order to figure out its proper use.

Now I have learned that I cannot skip any record with subtype 1 (PEER_INDEX_TABLE), so that the Elementor works properly, for the reasons you described. (Really, I should be able to skip the entire file based on the first CommonHeader, as all CommonHeader records seem to have the same timestamp in my case, but I want to make sure.)
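That constraint can be captured in a tiny skip predicate. This is a sketch with assumed inputs (raw timestamp, TableDumpV2 subtype, and a time window), not anything from the crate itself; the point is simply that subtype 1 is exempt from time filtering.

```rust
/// TableDumpV2 subtype for the peer index table (RFC 6396).
const PEER_INDEX_TABLE: u16 = 1;

/// Whether a record can be skipped when filtering by timestamp.
/// A PEER_INDEX_TABLE record must never be skipped: without it the
/// Elementor cannot resolve peer IP/ASN for later RIB entries.
fn can_skip(timestamp: u32, subtype: u16, start: u32, end: u32) -> bool {
    subtype != PEER_INDEX_TABLE && (timestamp < start || timestamp > end)
}
```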

So my thought was to add a sentence or two to the documentation of Elementor::new() describing these points, but perhaps you know of a better way to enforce proper usage.

outis151 avatar Sep 15 '24 09:09 outis151

Thanks for the explanation! The PEER_INDEX_TABLE should always be the first record in any RIB dump file. I agree that, as you mentioned, you should be able to skip an entire RIB dump file based on its first CommonHeader; you don't need to go through any further records in the file.

I am also curious about your experience iterating BGP records and using Elementor to extract messages. I wonder if that's for performance reasons? If so, do you see a significant boost by skipping records after reading the CommonHeader? I am open to any performance improvement ideas.

digizeph avatar Sep 16 '24 02:09 digizeph

Indeed, I tried using Elementor for performance reasons, but I doubt there is any major difference: from looking at the code, the payload of each CommonHeader is already mostly parsed anyway. One idea would be to implement a getter function for the header's payload instead of a struct field, but it may not be worth implementing, depending on how niche a use case record skipping is. From what I can tell from the source, an implementation like that would also require a custom Debug trait implementation, as well as advancing the reader manually when the payload is skipped, including passing the reader around to various places.
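The "advancing the reader manually" part can be sketched with only the standard library and the MRT common-header layout from RFC 6396 (4-byte timestamp, 2-byte type, 2-byte subtype, 4-byte length, then the payload). The function name and return shape here are made up for illustration; this is not bgpkit-parser code.

```rust
use std::io::{Read, Seek, SeekFrom};

/// Read one MRT common header and seek past the payload without
/// parsing it, returning (timestamp, type, subtype).
fn skip_record<R: Read + Seek>(r: &mut R) -> std::io::Result<(u32, u16, u16)> {
    let mut buf = [0u8; 12];
    r.read_exact(&mut buf)?;
    let timestamp = u32::from_be_bytes(buf[0..4].try_into().unwrap());
    let rec_type = u16::from_be_bytes(buf[4..6].try_into().unwrap());
    let subtype = u16::from_be_bytes(buf[6..8].try_into().unwrap());
    let length = u32::from_be_bytes(buf[8..12].try_into().unwrap());
    r.seek(SeekFrom::Current(length as i64))?; // advance past the payload
    Ok((timestamp, rec_type, subtype))
}
```

Calling this in a loop would let a caller inspect timestamps and subtypes cheaply, only handing records inside the time window to full parsing.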

outis151 avatar Sep 16 '24 20:09 outis151

hi @outis151, in v0.12.0 we've added the fallible iterators FallibleRecordIterator and FallibleElemIterator. You can check out the example here to see how they work in practice.

We've also added a RawRecordIterator that parses only the header and passes on the raw bytes. Combining this with a multi-threaded design can achieve a relatively good speedup. Here is an example. It does still need to parse the PEER_INDEX_TABLE first and then do the raw iteration. The current implementation is still complex, as we are not yet sure of the best way to integrate multi-threaded parsing into the main workflow.
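For readers unfamiliar with the pattern, here is an illustrative fan-out sketch using only std; the names and structure are NOT bgpkit-parser's API. A producer feeds raw record bytes into a channel shared behind a mutex, and worker threads pull and "parse" them (length-counting stands in for real MRT parsing).

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical fan-out over raw record bytes; not the crate's API.
fn parallel_parse(raw_records: Vec<Vec<u8>>, workers: usize) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let rx = Arc::new(Mutex::new(rx));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut parsed = 0usize;
                loop {
                    // Lock only long enough to take one record off the channel.
                    let msg = rx.lock().unwrap().recv();
                    match msg {
                        Ok(bytes) => parsed += bytes.len(), // stand-in for parsing
                        Err(_) => break, // sender dropped and channel drained
                    }
                }
                parsed
            })
        })
        .collect();
    for rec in raw_records {
        tx.send(rec).unwrap();
    }
    drop(tx); // signal workers that no more records are coming
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

The total work done is deterministic even though the per-worker split is not, which is the property a raw-record pipeline relies on.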

I'll close this ticket and feel free to create new issues should you have any problems using this library. Thanks!

digizeph avatar Oct 20 '25 17:10 digizeph