Shikib Mehri
In my case, I solved this issue (albeit in a different setting) by setting the `block_size` to 512. If the `block_size` is unspecified, it seems that the tokenizer's maximum size...
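To make the failure mode concrete, here is a minimal sketch (all names are hypothetical, not the actual script's code) of why leaving `block_size` unset can be a problem: when no value is given, some scripts fall back to the tokenizer's maximum length, which can be effectively unbounded, whereas an explicit `block_size=512` yields fixed-size chunks.

```python
def chunk_tokens(token_ids, block_size=None, tokenizer_max_len=1_000_000):
    """Split a token-id sequence into fixed-size blocks.

    If block_size is None, fall back to the tokenizer's maximum length,
    which for some tokenizers is effectively unbounded -- this is the
    situation the explicit block_size=512 avoids.
    """
    if block_size is None:
        block_size = tokenizer_max_len
    return [token_ids[i:i + block_size]
            for i in range(0, len(token_ids), block_size)]


ids = list(range(1024))
print(len(chunk_tokens(ids, block_size=512)))  # 2 blocks of 512
print(len(chunk_tokens(ids)))                  # 1 oversized block
```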
Thank you for bringing this discrepancy to our attention. We use the data downloading scripts provided by the authors of [https://arxiv.org/pdf/2009.13570.pdf] to obtain all of the intent prediction datasets, including HWU, in order...
Apologies for the extremely late reply. I'm not an official collaborator on this repo, so I did not get a notification about your issue. To answer your question, yes [PAD]...
Thank you for bringing this to our attention. I have taken steps to address this. Hopefully the results will be fixed soon. **Note:** this only affects our baseline submissions, and...
Hello, and apologies for the long delay in dealing with this issue. It seems that you are trying to reproduce our result with BERT on the DSTC8 dataset. I pulled the...
Apologies for the delay in addressing this issue. I don't fully understand your question: are you saying that you're achieving a JGA higher than 0.49 using our few-shot setup?...
Apologies for the long delay in addressing this issue. Our hyperparameters are in this script: https://github.com/alexa/dialoglue/blob/master/trippy/DO.example.advanced Our 58 result is only achieved with `--mlm_pre` and `--mlm_during`.
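For readers following along, a hedged sketch of the intended invocation (the exact arguments live in `DO.example.advanced`; confirm the flags against your copy of that script before running):

```
# Run the TripPy example script from the dialoglue repo, making sure the
# MLM pre-training and during-training flags are enabled -- the reported
# 58 result depends on both.
cd trippy
bash DO.example.advanced    # should pass --mlm_pre and --mlm_during
```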
Thank you for raising this issue, and apologies about the difficulties. I am not sure if I have all of the original scripts available still, but if the suggestions below...
I'm able to reproduce the issue with the average scores on my end. I don't have time right now to dig deeply into why the released data is producing different...
As for the second part of your question, the unfortunate answer is that the FED metric is extremely sensitive/brittle. Small differences in pre-processing, model initialization, calculation of the LM-likelihood, etc....