
Ignore words with non-latin characters

Open alystair opened this issue 6 years ago • 2 comments

Obviously, Spanish and other languages written purely in Latin characters would have to be ignored manually by the user, but shouldn't Russian and other languages that use non-Latin characters be ignored automatically unless that language's dictionary is in use?

alystair avatar Apr 03 '20 09:04 alystair

I was able to get around it with this:

  "cSpell.includeRegExpList": [
    "\\b[a-zA-Z0-9.]+\\b"
  ],
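
A note on escaping in the setting above: in a JSON file, the word-boundary token \b needs a doubled backslash, because JSON itself reads \b as a backspace character. The sketch below is only an illustration in plain JavaScript; it assumes the setting string reaches the RegExp constructor unchanged, and the sample text and variable names are made up for the example.

```javascript
// The two spellings as they would appear inside a JSON settings file.
const jsonSingle = String.raw`"\b[a-zA-Z0-9.]+\b"`;
const jsonDoubled = String.raw`"\\b[a-zA-Z0-9.]+\\b"`;

// JSON.parse turns \b into U+0008 (backspace) and \\b into backslash + b.
const single = JSON.parse(jsonSingle);
const doubled = JSON.parse(jsonDoubled);

// Hypothetical sample text mixing Latin and Cyrillic words.
const text = "hello привет world";

// With the single backslash the pattern requires literal backspace characters.
const withSingle = text.match(new RegExp(single, "g"));
// With the doubled backslash the pattern uses real word boundaries.
const withDoubled = text.match(new RegExp(doubled, "g"));

console.log(single.charCodeAt(0)); // 8, the backspace code point
console.log(withSingle);           // null
console.log(withDoubled);          // [ 'hello', 'world' ]
```

With the doubled form only the Latin-character words are selected, which is the effect this workaround is after.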

johnml1135 avatar Feb 12 '24 19:02 johnml1135

It is necessary to explicitly ignore character sets. By default, the spell checker checks all text.

It is possible to tell the spell checker to ignore a character set using the ignoreRegExpList or only include text that matches expressions in includeRegExpList.

The spell checker uses JavaScript's built-in regexp engine. To use Unicode property matching, the u flag must be added to the expression.

Script names cannot be used on their own inside \p{...}; they must be prefixed with Script_Extensions= (or Script=). See the MDN page "Unicode character class escape: \p{...}, \P{...}". It is always best to try out expressions at regex101 first.
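
To make the two requirements concrete, here is a small sketch in plain JavaScript (run in Node or a browser console). It only exercises the regexp engine, not the spell checker itself, and the sample string and variable names are invented for the illustration.

```javascript
// One or more characters whose Script_Extensions include Cyrillic.
const cyrillic = /[\p{Script_Extensions=Cyrillic}]+/gu;

// Hypothetical sample text.
const sample = "Hello привет world";

// The Cyrillic run that the pattern picks out.
const found = sample.match(cyrillic);
// The text with the Cyrillic run removed.
const remaining = sample.replace(cyrillic, "");

// Without the u flag, \p{...} is not a property escape, so nothing matches.
const withoutFlag = new RegExp("\\p{Script_Extensions=Cyrillic}").test("привет");

// A bare script name (no Script= or Script_Extensions= prefix) is a syntax error.
let bareNameError = null;
try {
  new RegExp("\\p{Cyrillic}", "u");
} catch (err) {
  bareNameError = err;
}

console.log(found);                                 // [ 'привет' ]
console.log(withoutFlag);                           // false
console.log(bareNameError instanceof SyntaxError);  // true
```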

Using directive within a document

// cspell:ignoreRegExp /[\p{Script_Extensions=Cyrillic}]+/gu

VS Code Settings

.vscode/settings.json

  "cSpell.ignoreRegExpList": ["/[\\p{Script_Extensions=Cyrillic}]+/gu"]

Using CSpell config

cspell.json

{
  "ignoreRegExpList": ["/[\\p{Script_Extensions=Cyrillic}]+/gu"]
}

cspell.config.yaml

ignoreRegExpList:
  - '/[\p{Script_Extensions=Cyrillic}]+/gu'

List of Character sets

Useful reference: Unicode Scripts

List

  • Common
  • Arabic
  • Armenian
  • Bengali
  • Bopomofo
  • Braille
  • Buhid
  • Canadian_Aboriginal
  • Cherokee
  • Cyrillic
  • Devanagari
  • Ethiopic
  • Georgian
  • Greek
  • Gujarati
  • Gurmukhi
  • Han
  • Hangul
  • Hanunoo
  • Hebrew
  • Hiragana
  • Inherited
  • Kannada
  • Katakana
  • Khmer
  • Lao
  • Latin
  • Limbu
  • Malayalam
  • Mongolian
  • Myanmar
  • Ogham
  • Oriya
  • Runic
  • Sinhala
  • Syriac
  • Tagalog
  • Tagbanwa
  • Tai_Le
  • Tamil
  • Telugu
  • Thaana
  • Thai
  • Tibetan
  • Yi
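
Several scripts from the list can be combined into a single character class. The helper below is a hypothetical sketch (buildIgnorePattern is not part of CSpell); it only assembles the pattern string in the same /.../gu form used earlier in this comment and then checks it against the JavaScript regexp engine with made-up sample text.

```javascript
// Hypothetical helper: build one ignoreRegExpList entry covering several scripts.
function buildIgnorePattern(scripts) {
  const classes = scripts
    .map((name) => `\\p{Script_Extensions=${name}}`)
    .join("");
  return `/[${classes}]+/gu`;
}

const entry = buildIgnorePattern(["Cyrillic", "Greek", "Han"]);

// JSON.stringify doubles the backslashes, as a JSON config file requires.
const asJson = JSON.stringify({ ignoreRegExpList: [entry] }, null, 2);

// Sanity check: split the /body/flags string and compile it.
const lastSlash = entry.lastIndexOf("/");
const regex = new RegExp(entry.slice(1, lastSlash), entry.slice(lastSlash + 1));

// Hypothetical sample: only the Latin words should remain.
const leftover = "alpha αβγ привет 漢字 omega".replace(regex, "").split(/\s+/);

console.log(entry);
console.log(asJson);
console.log(leftover);
```

Script names have to be spelled exactly as the engine expects them (underscores included); a misspelled name makes the RegExp constructor throw, so compiling the pattern once is a cheap way to catch typos.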

Jason3S avatar Feb 14 '24 18:02 Jason3S