Question about testing on new data
Hi, I'm trying to run ZOE on a new dataset and have the following questions:
-
In main.py, should I comment out runner.elmo_processor.load_cached_embeddings("target.min.embedding.pickle", "wikilinks.min.embedding.pickle")? If yes, could you show me how these two files are generated and what the format of their raw versions is? Currently, running on new data is extremely slow (30 sentences processed after one night). Any idea how I can speed things up?
-
Are there any other files or data I need to generate to test on a new dataset? (Maybe vocab_test.txt?)
Thank you!
-
The slowdown comes from non-cached Wikipedia titles: ZOE runs multiple ELMo inferences to generate each title's representation, which is especially slow on CPUs. I can provide a huge SQLite file (~72GB) that contains all the Wikipedia titles; do you want me to share it? With that file, you can use this function https://github.com/CogComp/zoe/blob/master/zoe_utils.py#L39 instead of load_cached_embeddings. I also recommend caching your test set, i.e. storing which candidates are found for each instance, so that you can tune your type inference at low cost. To do this, I suggest storing the results in a map and pickling that map.
-
Everything should work fine as long as your type mapping (inference) part works. The previous point only speeds things up and has no impact on the results.
Thank you. Please share it with me! Really appreciate it!
I uploaded the file "elmo_cache_correct.db" to Google Drive: https://drive.google.com/drive/u/1/folders/1fD6WfCEPQICGPhxqlwuVmf8uOot-jQq8?ths=true. Sorry for the delay; it's a huge file to upload.
To use it, refer to the function linked above (https://github.com/CogComp/zoe/blob/master/zoe_utils.py#L39) and set server_mode=False.
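For the test-set caching suggested earlier (store the candidates found for each instance in a map and pickle that map), a minimal sketch could look like the following. It is not code from the ZOE repository: the helper names, the file name candidates.pickle, the instance key format, and the stand-in compute function are all hypothetical, and the real candidate generation step would need to be plugged in where the stand-in is used.

```python
import os
import pickle


def load_candidate_cache(path):
    """Return the pickled {instance_key: candidates} map, or an empty dict if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_candidate_cache(cache, path):
    """Write the whole map back to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def get_candidates(key, cache, compute_fn):
    """Return cached candidates for one instance, computing them only on a miss."""
    if key not in cache:
        cache[key] = compute_fn(key)
    return cache[key]


if __name__ == "__main__":
    # Hypothetical stand-in for the slow candidate generation step.
    def slow_candidate_lookup(instance_key):
        return ["candidate_for_" + instance_key]

    cache_path = "candidates.pickle"  # assumed file name
    cache = load_candidate_cache(cache_path)
    for instance_key in ["sentence-0:mention-0", "sentence-1:mention-0"]:
        get_candidates(instance_key, cache, slow_candidate_lookup)
    save_candidate_cache(cache, cache_path)
```

On a second run the map is loaded from disk, so only instances that were not seen before trigger the expensive step, and the type inference can be tuned repeatedly against the same stored candidates.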
Thank you. Downloading it now; I'll get back to you if there are any further problems!