PPLM

Are the bags of words case-sensitive?

Open · yanan1116 opened this issue 4 years ago · 1 comment

Hello, I noticed that some words in the bags of words are capitalized while others are lowercase. The cased and uncased forms of a word have different token IDs in the GPT tokenizer's vocabulary.

What is the appropriate way to handle these words? Thanks.


yanan1116 · Jan 11 '22 20:01

There doesn't seem to be a better solution than including all the variants (e.g. both the cased and uncased forms) in the bag of words.

kizunasunhy · Sep 23 '22 03:09
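The workaround suggested in the reply (put every casing of each word into the bag) can be automated instead of edited by hand. The sketch below assumes the bag of words is simply a list of word strings (for example, read from a text file with one word per line). The function names `case_variants` and `expand_bag_of_words` are hypothetical helpers, not part of PPLM. The set of casings generated (as written, lowercase, Capitalized, UPPERCASE) is also an assumption; mixed-case forms like "iPhone" are only kept as written.

```python
from typing import Iterable, List


def case_variants(word: str) -> List[str]:
    """Hypothetical helper: distinct casings of `word` (as written, lower, Capitalized, UPPER)."""
    word = word.strip()
    candidates = [
        word,
        word.lower(),
        word[:1].upper() + word[1:].lower(),  # "army" -> "Army"
        word.upper(),
    ]
    variants: List[str] = []
    for w in candidates:
        # Skip empty strings and casings that collapse to one already kept.
        if w and w not in variants:
            variants.append(w)
    return variants


def expand_bag_of_words(words: Iterable[str]) -> List[str]:
    """Hypothetical helper: expand every entry to all its casings, order-preserving, no duplicates."""
    seen = set()
    expanded: List[str] = []
    for word in words:
        for variant in case_variants(word):
            if variant not in seen:
                seen.add(variant)
                expanded.append(variant)
    return expanded


if __name__ == "__main__":
    print(expand_bag_of_words(["war", "Army"]))
```

Because each casing can map to a different token ID (and sometimes to a different number of tokens), it is worth checking the expanded list against the actual tokenizer. For example, with Hugging Face `transformers`, `GPT2Tokenizer.from_pretrained("gpt2").encode(" " + w)` shows the IDs for `w` as it would appear mid-sentence after a space.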