Arthit Suriyawongkul
It looks like the current `urls.STRICT_DATE_REGEX` immediately takes the first one or two digits after a slash as the day of the month.

```
>>> from newspaper import Article
>>> url = "https://prachatai.com/journal/2021/04/92713"
>>>...
```
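The following is a minimal sketch of a stricter match that uses only Python's standard `re` module. The pattern name `STRICT_PATH_DATE` and the helper `url_date_parts` are hypothetical and are not part of newspaper. The pattern accepts a day segment only when it is a valid day (1 to 31) followed by `/` or the end of the URL, so an article ID such as `92713` is not read as a day.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern, not newspaper's STRICT_DATE_REGEX: /YYYY/MM[/DD].
# The optional day must be 1-31 and be followed by "/" or the end of the
# string, so a numeric article ID such as "/92713" is not read as a day.
STRICT_PATH_DATE = re.compile(
    r"/(?P<year>(?:19|20)\d{2})"
    r"/(?P<month>0?[1-9]|1[0-2])"
    r"(?:/(?P<day>0?[1-9]|[12]\d|3[01]))?"
    r"(?=/|$)"
)


def url_date_parts(url: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (year, month, day) from the URL path; day is None if absent."""
    match = STRICT_PATH_DATE.search(url)
    if match is None:
        return None
    return match.group("year"), match.group("month"), match.group("day")


print(url_date_parts("https://prachatai.com/journal/2021/04/92713"))
# ('2021', '04', None)
```

For a URL such as `https://example.com/news/2021/04/15/story.html`, the same helper returns day `15`. The sketch does not handle dates placed in query strings.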
- Replace comparisons with None: `== None` becomes `is None` and `!= None` becomes `is not None`
- Drop unnecessary comparisons where the meaning is obvious, following the Python convention that if...
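This small illustration shows why `is None` is preferred. The `==` operator dispatches to `__eq__`, which a class can override. The `is` operator compares identity and cannot be overridden. The `AlwaysEqual` class is contrived, for demonstration only.

The last lines show a caveat when dropping a comparison altogether: `if value:` also rejects falsy values such as `0` and `""`. It is therefore equivalent to `is not None` only when such values cannot occur.

```python
class AlwaysEqual:
    """Contrived class whose __eq__ reports equality with anything, even None."""

    def __eq__(self, other):
        return True


def is_missing(value):
    # Identity check: a custom __eq__ cannot change the result of `is`.
    return value is None


obj = AlwaysEqual()
print(obj == None)      # noqa: E711  -> True, because __eq__ is overridden
print(is_missing(obj))  # False: obj is not the None singleton

values = [None, 0, "", "x"]
print([v for v in values if v is not None])  # [0, '', 'x']
print([v for v in values if v])              # ['x']: truthiness also drops 0 and ""
```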
- Convert FullUsage.html to Markdown (USAGE.md) for easy access and editing, directly from within GitHub
  - See example at https://github.com/bact/RDRPOSTagger/blob/markdown-usage/USAGE.md
- Add badges to README.md, as well as a link...
Try this test set:

```python
from pythainlp.transliterate import romanize

test_cases = {
    None: "",
    "": "",
    "หมอก": "mok",
    "หาย": "hai",
    "แมว": "maeo",
    "เดือน": "duean",
    "ดำ": "dam",
    "ดู": "du",
    "บัว": "bua",
    ...
```
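One way to run a table of cases like this is a small helper that reports every mismatch instead of stopping at the first failure. `find_mismatches` is a hypothetical helper and is not part of PyThaiNLP. It takes the function under test as a parameter, so it can be exercised without PyThaiNLP installed.

```python
from typing import Callable, Dict, List, Optional, Tuple


def find_mismatches(
    romanize_fn: Callable[[Optional[str]], str],
    cases: Dict[Optional[str], str],
) -> List[Tuple[Optional[str], str, str]]:
    """Run every case and return (input, expected, actual) for each failure."""
    failures = []
    for text, expected in cases.items():
        actual = romanize_fn(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures
```

With PyThaiNLP installed, `find_mismatches(romanize, test_cases)` would return an empty list when every case passes.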
Currently, the files in `pythainlp/corpus` follow inconsistent naming patterns. I propose using consistent names to make the code easier to maintain (file names become somewhat predictable). Examples:

# Using `_` or `-` to separate words

- thaipos.py
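The sketch below shows how a consistent scheme could be checked automatically. It assumes, hypothetically, that `_` is chosen as the word separator and that names are all lowercase. The issue does not settle which convention to adopt, and `thai-pos.py` is an invented, non-conforming name used only for illustration.

```python
import re
from pathlib import Path

# Assumed convention (hypothetical): lowercase ASCII letters and digits,
# with "_" as the only word separator.
SNAKE_CASE_STEM = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def nonconforming(filenames):
    """Return the file names whose stem (name without extension) breaks the convention."""
    return [name for name in filenames if not SNAKE_CASE_STEM.match(Path(name).stem)]


print(nonconforming(["thaipos.py", "words_th.txt", "thai-pos.py"]))  # ['thai-pos.py']
```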
## Description

`pythainlp.util.collate()` produces a wrong ordering, because the current implementation ignores tone marks and symbols when ordering. Try this code:

```python
from pythainlp.util import collate

collate(["ก้วย", "ก๋วย", "ก่วย", "กวย",...
```
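Below is a minimal sketch of a tone-aware sort key. It assumes the usual Thai dictionary convention that words differing only in tone marks are ordered: no mark, mai ek (่), mai tho (้), mai tri (๊), mai chattawa (๋).

`tone_aware_key` is a hypothetical helper and is not PyThaiNLP's implementation. It first compares words with tone marks removed, then breaks ties by the sequence of tone marks.

The primary comparison uses plain code points, which only approximates dictionary order. The sketch does not handle leading vowels, other symbols, or the position of each tone mark.

```python
# Tone marks in assumed dictionary order: mai ek, mai tho, mai tri, mai chattawa.
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"


def tone_aware_key(word):
    """Sort key: primary = word without tone marks, secondary = its tone marks."""
    base = "".join(ch for ch in word if ch not in TONE_MARKS)
    # No mark ranks lowest because an empty tuple sorts before any non-empty one.
    tones = tuple(TONE_MARKS.index(ch) + 1 for ch in word if ch in TONE_MARKS)
    return (base, tones)


words = ["ก้วย", "ก๋วย", "ก่วย", "กวย"]
print(sorted(words, key=tone_aware_key))  # ['กวย', 'ก่วย', 'ก้วย', 'ก๋วย']
```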
This issue serves as a note on the sizes of the different language models PyThaiNLP currently uses.

- Some of them are included in the package
  - will be immediately available...
`words_th_thai2fit_201810.txt` is now almost two years behind `words_th.txt`; we should consider updating the dictionary and related models.
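One way to measure how far the older list lags is to compare the two files as sets. This sketch assumes both files are plain UTF-8 text with one word per line. `load_words` and `missing_words` are hypothetical helpers.

```python
def load_words(path):
    """Read a word list, assuming UTF-8 text with one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def missing_words(reference, outdated):
    """Words in `reference` but not in `outdated`, sorted for stable output."""
    return sorted(reference - outdated)
```

For example, `missing_words(load_words("words_th.txt"), load_words("words_th_thai2fit_201810.txt"))` would list the words in `words_th.txt` that the older file lacks. The paths are relative to wherever the corpus files are stored.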
I propose that `pythainlp.corpus.get_corpus_path()` should not automatically download a file when it cannot find it locally; it is better to let the user decide.

The current `get_corpus_path()` tries to download the corpus file if it does not yet exist locally: https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/corpus/core.py#L81

```python
if db.search(query.name == name):
    path = get_full_data_path(db.search(query.name == name)[0]["file"])
...
```
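Here is a sketch of the opt-in alternative. It looks up the corpus and returns the local path if the file exists. It downloads only when the caller explicitly supplies a download function.

`get_corpus_path_sketch` and its `registry`, `data_dir` and `download_fn` parameters are hypothetical. The plain-dict registry stands in for the database lookup and downloader used in `core.py`. None of this is the library's API.

```python
import os
from typing import Callable, Dict, Optional


def get_corpus_path_sketch(
    name: str,
    registry: Dict[str, Dict[str, str]],
    data_dir: str,
    download_fn: Optional[Callable[[str], None]] = None,
) -> Optional[str]:
    """Return the local path of corpus `name`, or None if it is not available.

    Nothing is downloaded unless the caller explicitly passes `download_fn`.
    """
    entry = registry.get(name)
    if entry is None:
        return None  # corpus not listed in the (hypothetical) local registry

    path = os.path.join(data_dir, entry["file"])
    if os.path.exists(path):
        return path

    if download_fn is not None:
        download_fn(name)  # opt-in download, chosen by the caller
        if os.path.exists(path):
            return path

    return None  # let the caller decide what to do next
```

Callers that want the old behavior would pass a download function explicitly. Everyone else gets `None` and can decide what to do.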
## Context

- The country list will only grow bigger if we want to keep track of censorship that may last longer than the site itself
- More probes...