George Zerveas

38 comments by George Zerveas

Hi @MXueguang, thank you very much for your reply and for sharing your nice code! I am trying to do a simple evaluation of coCondenser: I use a coCondenser...

Thank you very much for the prompt reply! I see, `Luyu/co-condenser-marco` is probably the model pre-trained on the MS MARCO collection through MLM and the contrastive objective, and `-retriever` is...

Thank you! I was using the official MS MARCO `collection.tsv` and wasn't aware that RocketQA and (co)Condenser used a corpus with a title field. Using this new corpus, I was...
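With a title field, each passage input has to combine two pieces of text. The sketch below shows one hypothetical way to do that at the token-id level: `[CLS] title [SEP] text [SEP]`. It assumes a BERT-style WordPiece vocabulary (ids 101 and 102 are `[CLS]` and `[SEP]` in `bert-base-uncased`), and the helper name, the 128-token default, and the "truncate the text first" policy are illustrative choices, not necessarily what the coCondenser or RocketQA code does. With Hugging Face BERT tokenizers, encoding the title and text as a sentence pair produces the same layout.

```python
from typing import List

# Special-token ids in the bert-base-uncased vocabulary (assumption: a
# BERT-style tokenizer; other vocabularies use different ids).
CLS_ID = 101
SEP_ID = 102


def build_passage_input(
    title_ids: List[int], text_ids: List[int], max_len: int = 128
) -> List[int]:
    """Hypothetical helper: return [CLS] title [SEP] text [SEP], at most max_len ids."""
    # Reserve room for the three special tokens.
    budget = max_len - 3
    # Keep the title whole if possible; the passage text absorbs truncation.
    title_ids = title_ids[:budget]
    text_ids = text_ids[: budget - len(title_ids)]
    return [CLS_ID] + title_ids + [SEP_ID] + text_ids + [SEP_ID]


if __name__ == "__main__":
    # Toy token ids standing in for real tokenizer output.
    print(build_passage_input([2023, 2003], [1037, 3231, 6254], max_len=7))
```

Keeping the title intact and truncating only the passage body is a judgment call; with short MS MARCO titles it rarely matters.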

Thank you very much for confirming this. I tested tokenizing with the separator token in between, and performance indeed jumped to 0.3813 MRR@10, which I find amazing (this improvement is very...
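For reference, MRR@10 scores each query by the reciprocal rank of its first relevant passage within the top 10 results (0 if none appears) and averages over queries. Below is a minimal sketch; the function name and the dict-based inputs are hypothetical, and averaging over every query that has relevance judgments (so unretrieved queries count as 0) is an assumed convention.

```python
from typing import Dict, List, Set


def mrr_at_k(
    ranked: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10
) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k."""
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        # Queries missing from the run simply contribute 0.
        for rank, pid in enumerate(ranked.get(qid, [])[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(qrels)


if __name__ == "__main__":
    run = {"q1": ["p3", "p1"], "q2": ["p9"]}
    judged = {"q1": {"p1"}, "q2": {"p2"}}
    print(mrr_at_k(run, judged))
```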

Hi Ian, first of all, I have never considered a binary dataset for this work, but who knows, it might work :) Can you please explain what is the meaning...

Hi Ian, sorry for the long delay in responding; I have been caught up in many different things. Thank you for sharing the very interesting observations with respect to binary data!...

Yes, you can consider the number of epochs a "hyperparameter". Once you find out what it should be for each dataset, based on the original validation split, you use this...
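Treating the epoch count as a hyperparameter can be sketched as follows: record a validation metric after every epoch of a tuning run, take the epoch at which it was best, and then reuse that fixed number of epochs in later runs. The helper below is hypothetical and only covers the selection step; the training loop itself is out of scope. Ties resolve to the earliest epoch.

```python
from typing import Sequence


def best_num_epochs(
    val_metric_per_epoch: Sequence[float], higher_is_better: bool = False
) -> int:
    """Return the 1-based epoch count at which the validation metric was best."""
    if not val_metric_per_epoch:
        raise ValueError("need validation results for at least one epoch")
    scores = list(val_metric_per_epoch)
    best = max(scores) if higher_is_better else min(scores)
    # list.index returns the first occurrence, i.e. the earliest best epoch.
    return scores.index(best) + 1


if __name__ == "__main__":
    # Validation loss per epoch from a (hypothetical) tuning run.
    print(best_num_epochs([0.92, 0.51, 0.57, 0.55]))
```

Fixing the epoch count in advance, rather than picking the best test-set epoch afterwards, keeps the test split out of model selection.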

No, as I wrote above, they should correspond to the predesignated number of epochs, with the hope that this is close to the maximum performance anyway.

Sure, these tables with hyperparameters are from the [KDD paper](https://dl.acm.org/doi/10.1145/3447548.3467401): ![image](https://user-images.githubusercontent.com/12188368/161867376-410d12df-26ff-43d4-af5b-0d37b0f83e57.png) ![image](https://user-images.githubusercontent.com/12188368/161867500-4228f5bf-389a-481f-918a-b96a3f578c8e.png) Regarding the learning rate, as far as I remember it was always set to 0.001 (the main reason...

Hi, as written in the README, these values are MSE, so you will have to take the square root to obtain RMSE. Also, as I note in the README, you should consult the...
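Converting reported MSE values to RMSE is just an element-wise square root. A minimal sketch follows; the dataset names and numbers in the usage line are placeholders, not results from the README.

```python
import math
from typing import Dict


def mse_to_rmse(mse_by_dataset: Dict[str, float]) -> Dict[str, float]:
    """Convert mean squared errors to root mean squared errors."""
    for name, mse in mse_by_dataset.items():
        if mse < 0:
            raise ValueError(f"MSE cannot be negative ({name}: {mse})")
    return {name: math.sqrt(mse) for name, mse in mse_by_dataset.items()}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(mse_to_rmse({"dataset_a": 4.0, "dataset_b": 2.25}))
```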