record_to_elems() may fail silently
When calling record_to_elems() on bgpkit_parser::parser::mrt::mrt_elem::Elementor, the function may fail silently, as seen here: https://github.com/bgpkit/bgpkit-parser/blob/fb15bb43f61752591307e67d000c49a376bf9b8e/src/parser/mrt/mrt_elem.rs#L364
Note that this error can be triggered by running record_to_elems() only on selected records while skipping others, since peer_table may never be initialized that way. It can also happen if an Elementor is incorrectly created anew for every record.
Thanks for this project
Thank you for bringing this issue up!
My current thinking about processing records without the `peer_index_table` first:

- The BGP TableDumpV2 format uses a peer index table to record the `peer_ip` and `peer_asn` for each RIB entry.
- The two peer fields are currently required for `BgpElem` and are supposed to be present in each record.
- If we run `record_to_elems()` before we have processed the `peer_index_table`, we will not have the peer IP and ASN info, and thus fail to generate valid BGP elems.

Does that make sense?
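To make the failure mode concrete, here is a minimal sketch of the peer lookup returning a proper error instead of only logging. The types (`Elementor`, `ElemError`, the `(ip, asn)` tuple table) are hypothetical simplifications for illustration, not the actual bgpkit-parser structs:

```rust
use std::collections::HashMap;

// Hypothetical error type: bubbles the problem up instead of logging it.
#[derive(Debug, PartialEq)]
enum ElemError {
    PeerTableNotInitialized,
    UnknownPeerIndex(u16),
}

// Simplified model of an Elementor holding the peer index table.
#[derive(Default)]
struct Elementor {
    // Maps a RIB entry's peer index to (peer_ip, peer_asn).
    peer_table: Option<HashMap<u16, (String, u32)>>,
}

impl Elementor {
    // Populated when a PEER_INDEX_TABLE record (subtype 1) is processed.
    fn process_peer_index_table(&mut self, peers: HashMap<u16, (String, u32)>) {
        self.peer_table = Some(peers);
    }

    // Returning a Result means callers cannot miss an uninitialized table.
    fn lookup_peer(&self, idx: u16) -> Result<&(String, u32), ElemError> {
        let table = self
            .peer_table
            .as_ref()
            .ok_or(ElemError::PeerTableNotInitialized)?;
        table.get(&idx).ok_or(ElemError::UnknownPeerIndex(idx))
    }
}

fn main() {
    let mut e = Elementor::default();
    // Looking up a peer before any PEER_INDEX_TABLE was processed fails loudly.
    assert_eq!(e.lookup_peer(0), Err(ElemError::PeerTableNotInitialized));

    let mut peers = HashMap::new();
    peers.insert(0u16, ("10.0.0.1".to_string(), 65000u32));
    e.process_peer_index_table(peers);
    assert!(e.lookup_peer(0).is_ok());
    println!("ok");
}
```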
The issue about silently failing is valid, and it might be better to panic or somehow bubble up an error instead of only logging errors.
What do you think?
Also, it would be helpful if you could provide some more concrete example/scenario so that I can better understand your use case and reason about a solution on this matter.
The idea was to skip processing mrt records based on the timestamp in the CommonHeader. Since I am working with local mrt files, I cannot use an API to limit the time range.
In the process of implementing that, I made some mistakes with the use of Elementor, such as creating one for each record (with it failing silently in that case), which led me to take a closer look at the cases where the Elementor may fail, in order to figure out its proper use.
Now I have learned that I cannot skip any record with subtype 1 (PEER_INDEX_TABLE) if the Elementor is to work properly, for the reasons you described.
(Really, I should be able to skip the entire file based on the first CommonHeader, as all CommonHeader records seem to have the same timestamp in my case, but I want to make sure.)
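The never-skip-the-index-table rule could be sketched roughly like this. The `CommonHeader` fields and constants here are hypothetical stand-ins for illustration, not bgpkit-parser's actual definitions:

```rust
// Simplified model of filtering MRT records by timestamp while always
// keeping the PEER_INDEX_TABLE record (hypothetical types and values).
const TABLE_DUMP_V2: u16 = 13;
const PEER_INDEX_TABLE: u16 = 1;

struct CommonHeader {
    timestamp: u32,
    entry_type: u16,
    entry_subtype: u16,
}

fn should_process(h: &CommonHeader, start: u32, end: u32) -> bool {
    // Never skip the PEER_INDEX_TABLE, or later RIB entries cannot be
    // turned into BgpElems because the peer table stays uninitialized.
    if h.entry_type == TABLE_DUMP_V2 && h.entry_subtype == PEER_INDEX_TABLE {
        return true;
    }
    (start..=end).contains(&h.timestamp)
}

fn main() {
    let index = CommonHeader { timestamp: 100, entry_type: TABLE_DUMP_V2, entry_subtype: PEER_INDEX_TABLE };
    let rib_in = CommonHeader { timestamp: 150, entry_type: TABLE_DUMP_V2, entry_subtype: 2 };
    let rib_out = CommonHeader { timestamp: 999, entry_type: TABLE_DUMP_V2, entry_subtype: 2 };

    assert!(should_process(&index, 120, 200)); // kept even though out of range
    assert!(should_process(&rib_in, 120, 200));
    assert!(!should_process(&rib_out, 120, 200));
    println!("ok");
}
```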
So my thoughts were to add a sentence or two in the documentation of Elementor::new() describing these points but perhaps you know of a better way to enforce proper usage.
Thanks for the explanation! The PEER_INDEX_TABLE should always be the first record for any RIB dump file. For RIB dumps, I agree that you should be able to skip the entire RIB dump file based on the first CommonHeader as you mentioned. You don't need to go through any records further in the dump file.
I am also curious about your experience iterating BGP records and using Elementor to extract messages. I wonder if that's for performance reasons? If so, do you see a significant boost from skipping records after reading the CommonHeader? I am open to any performance improvement ideas.
Indeed, I tried using Elementor for performance reasons, but I doubt there is any major difference, since the payload of each CommonHeader is already mostly parsed anyway, from looking at the code. One idea would be to implement a getter function for the header's payload instead of exposing a struct field, though it may not be worth implementing depending on how niche record skipping is as a use case. From what I can tell from the source, such an implementation would also require a custom Debug trait implementation, as well as advancing the reader manually when the payload is skipped, including passing the reader around to various places.
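The lazy-payload idea could look something like this. Everything here (`LazyRecord`, `ParsedPayload`, the fake length-based parse) is a hypothetical sketch of the pattern, not bgpkit-parser code:

```rust
// Stand-in for a parsed MRT payload.
struct ParsedPayload {
    entry_count: usize,
}

// Record that parses its header eagerly but its payload lazily.
struct LazyRecord {
    timestamp: u32,
    raw: Vec<u8>,
    parsed: Option<ParsedPayload>,
}

impl LazyRecord {
    fn new(timestamp: u32, raw: Vec<u8>) -> Self {
        Self { timestamp, raw, parsed: None }
    }

    // Getter parses on first access only; a record skipped by timestamp
    // never pays the payload-parsing cost.
    fn payload(&mut self) -> &ParsedPayload {
        if self.parsed.is_none() {
            // Stand-in for real MRT payload parsing.
            self.parsed = Some(ParsedPayload { entry_count: self.raw.len() / 4 });
        }
        self.parsed.as_ref().unwrap()
    }
}

fn main() {
    let mut records = vec![
        LazyRecord::new(100, vec![0u8; 16]),
        LazyRecord::new(999, vec![0u8; 40]),
    ];
    let mut entries = 0;
    for r in records.iter_mut() {
        if r.timestamp > 500 {
            continue; // skipped records never parse their payload
        }
        entries += r.payload().entry_count;
    }
    assert_eq!(entries, 4);
    println!("entries: {entries}");
}
```

This is the standard lazy-initialization pattern; the trade-off noted above (custom Debug, manual reader advancement) is what makes it more invasive in a streaming parser than in this toy model.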
Hi @outis151, in v0.12.0 we've added the fallible iterators FallibleRecordIterator and FallibleElemIterator. You can check out the example here to see how they work in practice.
We've also added a RawRecordIterator that parses only the header and passes on the raw bytes. Combining this with a multi-threaded design can achieve a relatively good speedup. Here is an example. It does need to parse the PEER_INDEX_TABLE first and then do the raw iteration. The current implementation is still complex, as we are unsure of the best way to integrate multi-threaded parsing into the main workflow yet.
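The general shape of that multi-threaded design, reduced to a generic sketch with only the standard library, is one reader handing raw record bytes to worker threads over a channel. `parse_record` here is a stub standing in for the real per-record parsing; the actual example uses bgpkit-parser's RawRecordIterator:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Stand-in for real MRT record parsing: just report the record length.
fn parse_record(raw: Vec<u8>) -> usize {
    raw.len()
}

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // mpsc::Receiver is single-consumer, so share it behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));

    let workers: Vec<_> = (0..4)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut total = 0usize;
                loop {
                    let raw = match rx.lock().unwrap().recv() {
                        Ok(r) => r,
                        Err(_) => break, // channel closed: no more records
                    };
                    total += parse_record(raw);
                }
                total
            })
        })
        .collect();

    // Reader side: in the real workflow this would parse the
    // PEER_INDEX_TABLE first, then forward each record's raw bytes
    // after reading only its CommonHeader.
    for i in 0..100u8 {
        tx.send(vec![0u8; i as usize]).unwrap();
    }
    drop(tx); // close the channel so workers exit

    let grand_total: usize = workers.into_iter().map(|w| w.join().unwrap()).sum();
    assert_eq!(grand_total, (0..100usize).sum::<usize>());
    println!("total bytes: {grand_total}");
}
```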
I'll close this ticket and feel free to create new issues should you have any problems using this library. Thanks!