semchunk
A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.
Hey, first off, I wanna say this is a pretty cool library! Thank you for the amazing work! I'm just curious whether there is an option to have overlapping chunks...
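The request is cut off before it describes the desired behavior. As a library-independent sketch, assume the text has already been split into a list of tokens or words. Overlapping windows can then be produced by taking `size` items at a time and advancing the window by `size - overlap`. `overlapping_windows` is a hypothetical helper, not part of semchunk's API.

```python
def overlapping_windows(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split `tokens` into windows of at most `size` items, where each window
    repeats the last `overlap` items of the previous one (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    windows: list[list[str]] = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        # Stop once a window reaches the end, so there is no trailing window made only of overlap.
        if start + size >= len(tokens):
            break
    return windows


words = "the quick brown fox jumps over the lazy dog".split()
for window in overlapping_windows(words, size=4, overlap=1):
    print(" ".join(window))
# the quick brown fox
# fox jumps over the
# the lazy dog
```

A fixed-size window ignores the semantic boundaries semchunk is built around. Overlap applied this way gives up some of that structure in exchange for context carried across chunk edges.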
I am trying to chunk a huge document, but it runs forever. Did I miss something in my code? [File here](https://drive.google.com/file/d/1Xnp5jJhjIWNA6R5u9w96L9WO_Hb61Jmh/view?usp=sharing)

```python
import semchunk
import pandas as pd
df =...
```
Hi there! Thanks for this neat library. I'm giving it a go. It would be great to have two variants of the `chunkerify` function that return a generator and async...
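Here is a minimal sketch of what such wrappers could look like. It assumes the chunker is any callable that maps one string to a list of chunk strings, which is how the callable returned by `chunkerify` is typically used on a single text. `iter_chunks`, `aiter_chunks` and `toy_chunker` are hypothetical names. `toy_chunker` only splits on blank lines, so the example runs without semchunk installed.

```python
import asyncio
from collections.abc import AsyncIterator, Callable, Iterable, Iterator

# Assumed shape of a chunker: one text in, a list of chunk strings out.
Chunker = Callable[[str], list[str]]


def iter_chunks(chunker: Chunker, texts: Iterable[str]) -> Iterator[str]:
    """Generator variant: yield chunks lazily, one text at a time."""
    for text in texts:
        yield from chunker(text)


async def aiter_chunks(chunker: Chunker, texts: Iterable[str]) -> AsyncIterator[str]:
    """Async variant: run the synchronous chunker in a worker thread so the
    event loop is not blocked while each text is being chunked."""
    for text in texts:
        chunks = await asyncio.to_thread(chunker, text)
        for chunk in chunks:
            yield chunk


def toy_chunker(text: str) -> list[str]:
    """Hypothetical stand-in chunker: split on blank lines."""
    return [part for part in text.split("\n\n") if part]


docs = ["a\n\nb", "c"]
print(list(iter_chunks(toy_chunker, docs)))  # ['a', 'b', 'c']


async def main() -> list[str]:
    return [chunk async for chunk in aiter_chunks(toy_chunker, docs)]


print(asyncio.run(main()))  # ['a', 'b', 'c']
```

Chunking is CPU-bound pure Python. `asyncio.to_thread` (Python 3.9+) keeps the event loop responsive, but it does not make the chunking itself any faster.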
I had fun learning about your library and thought I might contribute. When investigating the message I kept getting, which stated `Token indices sequence length is longer than the specified...
My markdown doc is structured as:

```md
# header1
## header2
Some text
## header2
Some more text
### Step 0: this is pre-planning step
* ⚠️ this is a...
```
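The issue text is truncated, so its actual request is unknown. One common pre-processing step for documents structured like this is to split on markdown headings first and then chunk each section separately, so that no chunk spans two headings. The sketch below assumes ATX-style headings (`#` to `######` followed by a space). `split_markdown_sections` is a hypothetical helper, not a semchunk function.

```python
import re

# An ATX heading: 1-6 '#' characters at the start of a line, then a space.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_markdown_sections(md: str) -> list[str]:
    """Split markdown into sections that each start at a heading (hypothetical helper)."""
    starts = [m.start() for m in HEADING.finditer(md)]
    # Keep any text that precedes the first heading as its own section.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(md)]
    sections = [md[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    # Drop sections that are empty after stripping whitespace.
    return [s for s in sections if s]


doc = "# header1\n## header2\nSome text\n## header2\nSome more text\n"
for section in split_markdown_sections(doc):
    print(repr(section))
# '# header1'
# '## header2\nSome text'
# '## header2\nSome more text'
```

This regex approach does not understand fenced code blocks, so a `#` line inside a code fence would also start a new section.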
While semantic splitting is extremely valuable, in certain frameworks like LangChain, LlamaIndex, and Docling you can often find yourself having the opposite problem when parsing markdown or HTML that...
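The opposite problem appears to be chunks that come out too small. One simple remedy is to greedily merge adjacent chunks as long as the result stays under a size budget. `merge_small_chunks` is a hypothetical helper. `size` defaults to character count but could be any length function, for example a token counter.

```python
from collections.abc import Callable


def merge_small_chunks(
    chunks: list[str],
    max_size: int,
    size: Callable[[str], int] = len,
    sep: str = "\n\n",
) -> list[str]:
    """Greedily concatenate adjacent chunks while the merged chunk stays within
    `max_size` as measured by `size`. Chunks already larger than `max_size`
    are kept as-is, never split (hypothetical helper)."""
    merged: list[str] = []
    for chunk in chunks:
        if merged and size(merged[-1] + sep + chunk) <= max_size:
            # The chunk still fits: append it to the current merged chunk.
            merged[-1] = merged[-1] + sep + chunk
        else:
            # The chunk does not fit, or nothing has been merged yet: start a new merged chunk.
            merged.append(chunk)
    return merged


print(merge_small_chunks(["# Title", "Intro.", "A much longer paragraph."], max_size=20))
# ['# Title\n\nIntro.', 'A much longer paragraph.']
```

Merging only ever combines neighbors, so document order is preserved.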