
How to create a parser for later reuse.

Open CyanHillFox opened this issue 10 months ago • 1 comments

We want to use nom to parse the streaming text output of an LLM, which contains a lot of structured content. We are trying to create a parser once and save it for later reuse, because we need to match against the streaming output very often.

But when I tried it, I ran into a problem. For example, nom::branch::Choice<T> implements Parser only for T = &mut [A], [A; N], and tuples (A, B, ..); Vec<A> is not included. In our use case the set of tag names is not fixed, so we tried passing the Vec to nom::branch::alt, but the resulting Choice doesn't implement Parser. If we instead pass a mutable reference to the Vec's contents to nom::branch::alt, the resulting Choice borrows the local variable tags and cannot be returned from the function. The sample code:

use nom::Parser;

fn new_matcher<'a>(
    tag_names: &[&'static str],
) -> impl Parser<&'a str, Output = &'a str, Error = nom::error::Error<&'a str>> {
    let mut tags: Vec<_> = tag_names
        .iter()
        .map(|tag_name: &&str| {
            nom::bytes::streaming::tag::<&str, &'a str, nom::error::Error<&'a str>>(*tag_name)
        })
        .collect();
    // Does not compile: all_tag_name borrows the local variable `tags`.
    let all_tag_name = nom::branch::alt(tags.as_mut());
    // Does not compile either: Choice<Vec<_>> does not implement nom's Parser.
    // let all_tag_name = nom::branch::alt(tags);
    all_tag_name
}

I'm wondering what the correct way is to create and save a nom Parser for later use.

CyanHillFox, Mar 19 '25 08:03

Usually when you get the error that Choice does not implement Parser it is because not all branches (in a tuple) have the same Output type. (When it is an array they have to have the same actual type.)
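
To illustrate the point about arrays, here is a minimal nom-free sketch. The names `make_tag` and `first_match` are hypothetical stand-ins, not nom APIs. Every value returned by `make_tag` has the same concrete type, so the values can share an array or slice; two separately written closures would each have their own distinct type and could not.

```rust
// Hypothetical stand-in for a tag parser (not a nom API):
// returns the rest of the input if it starts with `prefix`.
fn make_tag(prefix: &'static str) -> impl Fn(&str) -> Option<&str> {
    move |input| input.strip_prefix(prefix)
}

// Hypothetical stand-in for `alt` over a slice: every element must have
// the same concrete type `F`. Tries each matcher in order and returns
// the remainder produced by the first one that succeeds.
fn first_match<'a, F>(matchers: &[F], input: &'a str) -> Option<&'a str>
where
    F: Fn(&str) -> Option<&str>,
{
    matchers.iter().find_map(|m| m(input))
}
```

Calling `first_match(&[make_tag("aa"), make_tag("bb")], input)` type-checks because both array elements come from the same function and therefore share one opaque type.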

If you're passing in a Vec, you can always call as_mut_slice() on it, since Choice implements Parser for &mut [A].

Edit: This seems to do what you want. (Check out this comment's edit history to see a "proper" trait-based implementation.)

use nom::Parser;

fn tag_collection<'a, E>(
    tag_names: &[&'static str],
) -> impl Parser<&'a str, Output = &'a str, Error = E>
where
    E: nom::error::ParseError<&'a str>,
{
    let mut tags: Vec<_> = tag_names
        .iter()
        .map(|tag_name: &&str| nom::bytes::tag::<&str, &'a str, E>(*tag_name))
        .collect();

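    // The closure owns `tags`, so the returned parser borrows nothing local;
    // `alt` is rebuilt from a mutable slice on every call.
    move |input| nom::branch::alt(tags.as_mut_slice()).parse_complete(input)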
}

fn main() {
    let input = "bbfranceaaaaabbb";

    let tags = vec!["aa", "bb", "cc"];
    let mut tag_collection = tag_collection::<nom::error::Error<&str>>(tags.as_slice());

    // Yields the remaining input and the matched tag: ("franceaaaaabbb", "bb").
    let result = tag_collection.parse_complete(input).unwrap();

    dbg!(result);
}
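
The ownership idea behind the closure above can also be sketched without nom. This is only an illustration under assumptions: `tag_matcher` is a hypothetical name, and plain `str::strip_prefix` stands in for nom's `tag`. The point it shows is that moving the `Vec` into the returned closure lets the matcher outlive the function that built it.

```rust
// Hypothetical nom-free analogue of `tag_collection` (not a nom API).
// The returned closure owns `tags`, so it can be stored and reused later.
fn tag_matcher<'a>(tags: Vec<String>) -> impl Fn(&'a str) -> Option<(&'a str, &'a str)> {
    move |input: &'a str| {
        // Try each tag in order; on the first match return
        // (remaining input, matched tag), mirroring nom's result order.
        tags.iter().find_map(|tag| {
            input
                .strip_prefix(tag.as_str())
                .map(|rest| (rest, &input[..tag.len()]))
        })
    }
}
```

A matcher built once with `tag_matcher(vec!["aa".to_string(), "bb".to_string()])` can then be called repeatedly on new chunks of input without rebuilding the tag list.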

asibahi, Mar 19 '25 12:03