Issue in running python extract_vocab.py
Error while loading word embedding glove
Logs:
Loading from original dataset
Loading data from data/train_tok.jsonl
Loading data from data/train_tok.tables.jsonl
Loading data from data/dev_tok.jsonl
Loading data from data/dev_tok.tables.jsonl
Loading data from data/test_tok.jsonl
Loading data from data/test_tok.tables.jsonl
Loading word embedding from glove/glove.42B.300d.txt
Traceback (most recent call last):
  File "extract_vocab.py", line 23, in <module>
    use_small=USE_SMALL)
  File "C:\Users\SQLNet\sqlnet\utils.py", line 274, in load_word_emb
    for idx, line in enumerate(inf):
  File "C:\Users\miniconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2438: character maps to <undefined>
Execution starts after the following change in utils.py at line 273, which opens the GloVe file as UTF-8 instead of the Windows default cp1252: with open(file_name,encoding="utf8") as inf:
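To illustrate why the explicit encoding helps, here is a minimal sketch that reproduces the same kind of failure on a tiny GloVe-style file and then reads it correctly as UTF-8. The function load_word_emb_sketch and the sample file are hypothetical and much simpler than the real load_word_emb in sqlnet/utils.py; only the open(..., encoding="utf8") call mirrors the change described above.

```python
import os
import tempfile


def load_word_emb_sketch(file_name, encoding="utf8"):
    """Hypothetical, simplified loader: maps each token to a list of floats."""
    emb = {}
    # Passing the encoding explicitly avoids relying on the platform default
    # (cp1252 on many Windows installations).
    with open(file_name, encoding=encoding) as inf:
        for line in inf:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue
            emb[parts[0]] = [float(x) for x in parts[1:]]
    return emb


# Build a tiny GloVe-style file whose second token is U+201D, which is
# encoded in UTF-8 as the bytes E2 80 9D. Byte 0x9d is undefined in cp1252.
sample = "the 0.1 0.2\n\u201d 0.3 0.4\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf8"))

# Reading with cp1252 raises the same kind of error as in the traceback.
try:
    load_word_emb_sketch(path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading with UTF-8 succeeds.
emb = load_word_emb_sketch(path, encoding="utf8")
os.remove(path)
```

If editing utils.py is not an option, another route that may work is enabling Python's UTF-8 mode (for example by setting the environment variable PYTHONUTF8=1 on Python 3.7 or later), but that has not been tried in this thread.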
Check your folder structure for data. Is your train_tok.jsonl directly under the data folder, or at data/data/train_tok.jsonl?
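A quick way to check the layout is a small script like the sketch below. The file names are taken from the log output in the question; the helper missing_data_files and the two candidate directories are only illustrative assumptions, not part of SQLNet.

```python
import os

# File names as they appear in the "Loading data from ..." log lines.
EXPECTED = [
    "train_tok.jsonl", "train_tok.tables.jsonl",
    "dev_tok.jsonl", "dev_tok.tables.jsonl",
    "test_tok.jsonl", "test_tok.tables.jsonl",
]


def missing_data_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir."""
    return [
        name for name in EXPECTED
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    # Compare the expected location with the accidentally nested one.
    for candidate in ("data", os.path.join("data", "data")):
        print(candidate, "missing:", missing_data_files(candidate))
```

Run it from the directory where extract_vocab.py is started. An empty list for data means the files are where the script looks for them.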
Thanks for editing, @DevalNaik.