StackOverflow
Producing file formats of my data set
Hi @jacoxu, thank you for the great code. First of all, do you have any Python code that I could use to prepare the following two files from my own data set?
- vocab_withIdx.dic
- vocab_emb_Word2vec_48.vec
Looking at your raw title text files and vocab_withIdx.dic, I do not understand how the vocabulary file was prepared. Did you perform any text preprocessing before converting the titles into a vocabulary with indexes? I would be very thankful for your help.
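To make the question concrete, here is a minimal sketch of how a vocabulary-with-index file might be built from raw titles. Everything in it is an assumption rather than the repository's confirmed method: the helper names `build_vocab` and `write_vocab` are hypothetical, the preprocessing is only lowercasing and whitespace tokenization, indexes start at 1 in order of descending frequency, and the output layout is one `word<TAB>index` pair per line. The real vocab_withIdx.dic may differ on any of these points.

```python
from collections import Counter


def build_vocab(titles, min_count=1):
    """Map each word to an integer index (assumed scheme, 1-based)."""
    counts = Counter()
    for title in titles:
        # Assumed preprocessing: lowercase, then split on whitespace.
        counts.update(title.lower().split())
    # Most frequent words first; ties broken alphabetically so the
    # result is deterministic.
    words = sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )
    return {word: idx for idx, word in enumerate(words, start=1)}


def write_vocab(vocab, path):
    """Write one 'word<TAB>index' pair per line (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for word, idx in sorted(vocab.items(), key=lambda kv: kv[1]):
            f.write(f"{word}\t{idx}\n")


if __name__ == "__main__":
    sample_titles = ["How to parse JSON", "parse XML in Python"]
    vocab = build_vocab(sample_titles)
    write_vocab(vocab, "vocab_withIdx.dic")
```

The embedding file would additionally need word vectors trained on the same tokenized text, for example with a Word2vec implementation. The `48` in the file name is presumably the vector dimension, but that is also a guess.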