[ALBERT] What are the parameter settings for training data generation?
Hi,
What parameter settings did you use to generate the training data for your released models?
For example, what is the dupe_factor, and did you use whole word masking? The only settings I could find in the paper are max_seq_len, mask_probability, n_gram_mask, and shorter_seq_prob.
Thanks