text-processing topic
Auto-CORPus
Auto-CORPus is a pipeline developed by a University of Nottingham and Imperial College London collaboration to standardize text and table data extracted from full-text publications. See Open Access publica...
split-markdown4gpt
A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows t...
humanreadable
humanreadable is a Python library to convert human-readable values to other units.
syntakts
Simple-to-use text parser and syntax highlighter for Kotlin Multiplatform
pawpaw
Text Processing & Segmentation Framework
Kudasai
Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies
flashtext2
The fastest FlashText library for Python
Valmiki_Ramayan_Dataset
Structured dataset of Valmiki Ramayana 📜 | Sanskrit Shlokas, Translations, & Explanations for AI & NLP 🚀 Contributions welcome!