
chat/dialogues.py: is mask_user_labels buggy?

Open wanglongxingtianxia opened this issue 2 years ago • 1 comment

def mask_user_labels(tokenizer, dialogue_template, labels):
    """Masks the user turns of a dialogue from the loss"""
    user_token_id = tokenizer.convert_tokens_to_ids(dialogue_template.user_token)
    assistant_token_id = tokenizer.convert_tokens_to_ids(dialogue_template.assistant_token)
    for idx, label_id in enumerate(labels):

The labels parameter is passed as a two-dimensional list, so each label_id in the loop is a whole list rather than a single token id, which causes the user masking to fail.

wanglongxingtianxia, Aug 22 '23 06:08

Hi. mask_user_labels is designed to take labels as a flat list of token ids, so you should not pass a 2D list to it. It is better to apply it to your examples one at a time, or you can modify the function to suit your needs.

ArmelRandy, Aug 28 '23 09:08
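
For anyone landing here with batched labels, the "one example at a time" advice could look roughly like the sketch below. It is not the code from chat/dialogues.py: mask_user_turns and mask_user_turns_batch are hypothetical helpers, IGNORE_INDEX = -100 is assumed to be the value the loss ignores, and it is assumed that a user turn runs from the user token up to (but not including) the next assistant token. The token ids are passed in directly so the example runs without a tokenizer.

```python
from typing import List

# Assumption: -100 is the label value the loss function ignores.
IGNORE_INDEX = -100


def mask_user_turns(labels: List[int], user_token_id: int, assistant_token_id: int) -> List[int]:
    """Return a copy of a flat (1D) label list with user turns set to IGNORE_INDEX.

    Hypothetical helper: a user turn is assumed to start at the user token and
    end just before the next assistant token.
    """
    masked = list(labels)
    in_user_turn = False
    for idx, label_id in enumerate(masked):
        if label_id == user_token_id:
            in_user_turn = True
        elif label_id == assistant_token_id:
            in_user_turn = False
        if in_user_turn:
            masked[idx] = IGNORE_INDEX
    return masked


def mask_user_turns_batch(
    batch_labels: List[List[int]], user_token_id: int, assistant_token_id: int
) -> List[List[int]]:
    """Apply the 1D masking to each example of a 2D batch, one at a time."""
    return [mask_user_turns(row, user_token_id, assistant_token_id) for row in batch_labels]


# Toy ids for illustration only: 1 stands for the user token, 2 for the assistant token.
batch = [
    [1, 10, 11, 2, 20, 21],
    [1, 12, 2, 22, 1, 13],
]
masked_batch = mask_user_turns_batch(batch, user_token_id=1, assistant_token_id=2)
```

With the real function, the same idea would be a loop over the rows of the batch that calls mask_user_labels(tokenizer, dialogue_template, row) on each one, since the function expects a flat list of token ids.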