Arthit Suriyawongkul issues

Results: 246 issues



Date regex should not assume the day of month from just the first (two) digits after /

It looks like the current `urls.STRICT_DATE_REGEX` immediately takes the first (two) digit(s) after a slash as the day of month.

```
>>> from newspaper import Article
>>> url = "https://prachatai.com/journal/2021/04/92713"
>>>...
```
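One way to avoid this is to accept a day only when it forms a complete path segment. The sketch below is a hypothetical standalone pattern, not newspaper's actual `STRICT_DATE_REGEX`; the names `DATE_IN_PATH` and `extract_path_date` are made up for illustration. It requires `/YYYY/MM` plus an optional `/DD` that must be followed by `/`, `?`, `#`, or the end of the URL, so `92713` is not read as day 9.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (not newspaper's STRICT_DATE_REGEX): the day is only
# accepted if it is a whole path segment, enforced by the lookahead at the end.
DATE_IN_PATH = re.compile(
    r"/(?P<year>(?:19|20)\d{2})"
    r"/(?P<month>0?[1-9]|1[0-2])"
    r"(?:/(?P<day>0?[1-9]|[12]\d|3[01]))?"
    r"(?=/|\?|#|$)"
)


def extract_path_date(url: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (year, month, day-or-None) found in the URL path, or None."""
    m = DATE_IN_PATH.search(url)
    if m is None:
        return None
    day = m.group("day")
    return int(m.group("year")), int(m.group("month")), int(day) if day else None


# The article ID "92713" is no longer mistaken for a day of month.
print(extract_path_date("https://prachatai.com/journal/2021/04/92713"))  # (2021, 4, None)
print(extract_path_date("https://example.com/news/2021/04/15/story"))    # (2021, 4, 15)
```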

Follow PEP 8 Python code conventions and formatting

- Replace comparisons with None: from `== None` to `is None`, and from `!= None` to `is not None`
- Drop unnecessary comparisons when obvious, following the Python convention that if...
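PEP 8 says comparisons to singletons such as `None` should use `is` or `is not`, never the equality operators, because `==` dispatches to a class's `__eq__`, which can be overridden. A minimal illustration with a contrived class (`AlwaysEqual` is made up for this example), plus PEP 8's related recommendation to rely on the falsiness of empty sequences rather than comparing their length:

```python
class AlwaysEqual:
    """Contrived class whose __eq__ reports equality with any object."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
equal_to_none = obj == None  # noqa: E711 -- the discouraged form; __eq__ says True
identical_to_none = obj is None  # identity check; __eq__ is never consulted

items = []
# Instead of `len(items) == 0`, rely on empty sequences being falsy.
is_empty = not items
```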

Convert FullUsage.html to Markdown

- Convert FullUsage.html to Markdown (USAGE.md) for easy access and editing, directly from within GitHub
- See example at https://github.com/bact/RDRPOSTagger/blob/markdown-usage/USAGE.md
- Add badges to README.md, as well as a link...

'royin' engine gives wrong romanization in many cases

Try this test set:

```python
from pythainlp.transliterate import romanize

test_cases = {
    None: "",
    "": "",
    "หมอก": "mok",
    "หาย": "hai",
    "แมว": "maeo",
    "เดือน": "duean",
    "ดำ": "dam",
    "ดู": "du",
    "บัว": "bua",
...
```
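To see at a glance which cases fail, a small harness can run any romanizer over such a dictionary and collect the mismatches. This is a sketch: `find_mismatches` is a hypothetical helper, and the romanizer is passed in as a parameter so the harness does not depend on PyThaiNLP being installed. Exceptions (for example on the `None` input) are recorded as failures instead of stopping the run.

```python
from typing import Callable, Dict, List, Optional, Tuple


def find_mismatches(
    romanize_fn: Callable[[Optional[str]], str],
    cases: Dict[Optional[str], str],
) -> List[Tuple[Optional[str], str, str]]:
    """Return (input, expected, actual) for every case the romanizer gets wrong."""
    mismatches = []
    for text, expected in cases.items():
        try:
            actual = romanize_fn(text)
        except Exception as exc:  # a crash counts as a wrong answer
            actual = f"<error: {type(exc).__name__}>"
        if actual != expected:
            mismatches.append((text, expected, actual))
    return mismatches


# With PyThaiNLP installed, one would pass the real engine, e.g.:
#   from pythainlp.transliterate import romanize
#   for text, expected, actual in find_mismatches(
#       lambda t: romanize(t, engine="royin"), test_cases
#   ):
#       print(repr(text), expected, actual)
```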

bug

Hacktoberfest

Naming convention for consistency: how to name files

Currently, the files in pythainlp/corpus are named inconsistently. I propose adopting a consistent naming scheme to make the code easier to maintain (file names become somewhat predictable). Examples:

# Using `_` or `-` to separate words
- thaipos.py
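Once a convention is agreed on, it can be checked automatically. The sketch below assumes, purely for illustration, that the project picks lowercase words joined by `_` (the issue has not decided this); `SNAKE_CASE_FILENAME` and `nonconforming_names` are hypothetical names. Dunder files such as `__init__.py` are skipped.

```python
import re
from typing import Iterable, List

# Hypothetical convention: lowercase ASCII words joined by "_", then a
# lowercase extension, e.g. "thai_pos.py" or "words_th.txt".
SNAKE_CASE_FILENAME = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*(?:\.[a-z0-9]+)+")


def nonconforming_names(names: Iterable[str]) -> List[str]:
    """Return the sorted names that break the convention, skipping dunder files."""
    return sorted(
        name
        for name in names
        if not name.startswith("__") and not SNAKE_CASE_FILENAME.fullmatch(name)
    )


# e.g. nonconforming_names(os.listdir("pythainlp/corpus"))
```

A regex like this catches `-` separators and uppercase letters, but it cannot tell that a name such as `thaipos.py` runs two words together; that still needs review by a person.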

question

Wrong ordering from collate()

## Description

`pythainlp.util.collate()` produces the wrong ordering, because the current implementation ignores tone marks and symbols when ordering. Try this code:

```python
from pythainlp.util import collate

collate(["ก้วย", "ก๋วย", "ก่วย", "กวย",...
```
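A common way to take tone marks into account is a two-level sort key: compare words first with tone marks removed, and only break ties by the tone marks (none < mai ek < mai tho < mai tri < mai chattawa). The sketch below is a simplified illustration, not PyThaiNLP's implementation; `thai_sort_key` and `collate_with_tones` are hypothetical names. It also moves a leading vowel (เ แ โ ใ ไ) after its consonant for the primary comparison, but it handles neither symbols such as ๆ and ฯ nor the finer points of dictionary vowel order.

```python
import re
from typing import Iterable, List, Tuple

# Thai tone marks in ascending rank: mai ek, mai tho, mai tri, mai chattawa.
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"
# A leading vowel (U+0E40..U+0E44) followed by a consonant (U+0E01..U+0E2E).
_LEADING_VOWEL = re.compile(r"([\u0e40-\u0e44])([\u0e01-\u0e2e])")


def thai_sort_key(word: str) -> Tuple[str, Tuple[int, ...]]:
    # Primary key: drop tone marks, then put each leading vowel after its consonant.
    base = "".join(ch for ch in word if ch not in TONE_MARKS)
    primary = _LEADING_VOWEL.sub(r"\2\1", base)
    # Secondary key: tone marks in order of appearance, ranked 1..4.
    secondary = tuple(TONE_MARKS.index(ch) + 1 for ch in word if ch in TONE_MARKS)
    return primary, secondary


def collate_with_tones(words: Iterable[str]) -> List[str]:
    return sorted(words, key=thai_sort_key)


print(collate_with_tones(["ก้วย", "ก๋วย", "ก่วย", "กวย", "ก๊วย"]))
# ['กวย', 'ก่วย', 'ก้วย', 'ก๊วย', 'ก๋วย']
```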

bug

help wanted

Hacktoberfest

Considerations for including language models in the default package or downloading them later

This issue serves as a note on the sizes of the different language models PyThaiNLP currently uses.
- Some of them are included in the package
  - will be immediately available...

corpus

thai2fit dictionary update

`words_th_thai2fit_201810.txt` is now almost two years behind `words_th.txt`; we should consider updating the dictionary and related models.
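Before updating, it may help to measure how far the two word lists have drifted apart. A minimal sketch, assuming both files are UTF-8 with one word per line; `read_wordlist` and `compare_wordlists` are hypothetical helpers, not part of PyThaiNLP.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def read_wordlist(path: Path) -> Set[str]:
    """Read a one-word-per-line UTF-8 file, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def compare_wordlists(
    old: Iterable[str], new: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (words only in new, words only in old), each sorted."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)


# e.g.
# added, removed = compare_wordlists(
#     read_wordlist(Path("words_th_thai2fit_201810.txt")),
#     read_wordlist(Path("words_th.txt")),
# )
# print(len(added), "words missing from the thai2fit list")
```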

enhancement

pythainlp.corpus.get_corpus_path() should not try to download the corpus automatically

I propose that `pythainlp.corpus.get_corpus_path()` should not automatically download a file when it cannot find it locally; it is better to let users decide for themselves. The current `get_corpus_path()` tries to download the corpus file if it does not yet exist locally: https://github.com/PyThaiNLP/pythainlp/blob/831a9fcfd24e069b6e929283b3abdc161a9a5608/pythainlp/corpus/core.py#L81

```python
if db.search(query.name == name):
    path = get_full_data_path(db.search(query.name == name)[0]["file"])
...
```
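The proposed behavior separates lookup from download: the lookup only reports whether the file is present, and fetching it becomes an explicit step the caller opts into. A sketch under simplifying assumptions: `get_local_corpus_path` is a hypothetical function, and a plain name-to-filename mapping stands in for PyThaiNLP's actual catalog database.

```python
from pathlib import Path
from typing import Mapping, Optional


def get_local_corpus_path(
    name: str, catalog: Mapping[str, str], data_dir: Path
) -> Optional[str]:
    """Return the local path of corpus `name`, or None; never touches the network."""
    filename = catalog.get(name)
    if filename is None:
        return None  # unknown corpus name
    path = data_dir / filename
    return str(path) if path.exists() else None


# The caller, not the lookup, decides whether to download:
# path = get_local_corpus_path("some_corpus", catalog, data_dir)
# if path is None:
#     ...  # ask the user, or call an explicit downloader such as
#          # pythainlp.corpus.download(...)
```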

bug

corpus

Proposal: Frequency tier for each URL

## Context
- The country list tends to only grow bigger, if we want to keep track of censorship that may last longer than the site itself
- More probes...
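One way a per-URL frequency tier could work is to map each tier to a re-test interval and only schedule URLs whose interval has elapsed. This is a hypothetical sketch: the tier names, the intervals, and `due_urls` are made up for illustration and are not part of the proposal text.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical tiers and re-test intervals, for illustration only.
TIER_INTERVALS: Dict[str, timedelta] = {
    "high": timedelta(days=1),
    "medium": timedelta(days=7),
    "low": timedelta(days=30),
}


def due_urls(entries: List[Tuple[str, str, datetime]], now: datetime) -> List[str]:
    """Return URLs (url, tier, last_tested) whose tier interval has elapsed."""
    return [url for url, tier, last in entries if now - last >= TIER_INTERVALS[tier]]


now = datetime(2021, 5, 1)
entries = [
    ("https://a.example/", "high", datetime(2021, 4, 29)),   # 2 days ago: due
    ("https://b.example/", "low", datetime(2021, 4, 20)),    # 11 days ago: not due
    ("https://c.example/", "medium", datetime(2021, 4, 24)), # 7 days ago: due
]
print(due_urls(entries, now))  # ['https://a.example/', 'https://c.example/']
```

With tiers like these, rarely changing or low-priority URLs cost fewer probe measurements as the list keeps growing.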

discuss