agents
examples simple-rag bug: split_paragraphs isn't working correctly
livekit.agents.tokenize._basic_paragraph.split_paragraphs
def split_paragraphs(text: str) -> list[tuple[str, int, int]]:
    """
    Split the text into paragraphs.
    Returns a list of paragraphs with their start and end indices of the original text.
    """
    matches = re.finditer(r"\n{2,}", text)
    paragraphs = []
    for match in matches:
        paragraph = match.group(0)
        start_pos = match.start()
        end_pos = match.end()
        paragraphs.append((paragraph.strip(), start_pos, end_pos))
    return paragraphs
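To make the symptom concrete, here is a minimal reproduction. It copies the function above under a hypothetical name (`split_paragraphs_original`) so it runs without installing the package; the sample text is made up for illustration.

```python
import re


def split_paragraphs_original(text: str) -> list[tuple[str, int, int]]:
    # Same logic as the function quoted above: the pattern matches only
    # the blank-line separators, not the paragraph text between them.
    paragraphs = []
    for match in re.finditer(r"\n{2,}", text):
        paragraphs.append((match.group(0).strip(), match.start(), match.end()))
    return paragraphs


sample = "First paragraph.\n\nSecond paragraph."
result = split_paragraphs_original(sample)
print(result)  # [('', 16, 18)]
```

The single entry is the separator itself, stripped down to an empty string, and neither paragraph is returned.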
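Is this regex written incorrectly? It matches only the runs of newlines between paragraphs, so every returned "paragraph" is an empty string after strip(). I think it should be something like this: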
matches = re.finditer(r".+\n{2,}", text)
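One caveat with the pattern above: `.` does not match a newline by default, so `.+` would capture only the last line of a multi-line paragraph, and a final paragraph with no trailing blank line would be skipped. Below is a rough sketch of an alternative that keeps the separator regex and returns the text between separators instead. The name `split_paragraphs_sketch` is hypothetical, and this is not necessarily how the library fixed it.

```python
import re


def split_paragraphs_sketch(text: str) -> list[tuple[str, int, int]]:
    """Hypothetical alternative: return the text between blank-line separators."""
    # Each separator span ends one paragraph; a zero-width sentinel at the
    # end of the text closes the final paragraph.
    bounds = [(m.start(), m.end()) for m in re.finditer(r"\n{2,}", text)]
    bounds.append((len(text), len(text)))

    paragraphs = []
    start = 0
    for sep_start, sep_end in bounds:
        chunk = text[start:sep_start]
        if chunk.strip():
            # Indices refer to the unstripped chunk in the original text.
            paragraphs.append((chunk.strip(), start, sep_start))
        start = sep_end
    return paragraphs


sample = "First line\nsecond line\n\nSecond paragraph"
paragraphs = split_paragraphs_sketch(sample)
for paragraph, start, end in paragraphs:
    print(repr(paragraph), start, end)
```

Whether the indices should point at the stripped or unstripped text is a design choice; the sketch reports the unstripped span so that `text[start:end].strip()` reproduces each paragraph.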
I also encountered the same problem
Kindly assign this to me, I would like to work on this.
@m-tabish of course, please go ahead and submit a PR. It'd be great if you could add test coverage here as well, so we don't unintentionally break it again.
Aren't you guys participating in Hacktoberfest? If you are, kindly add a tag here.
fixed in #896