Shengyang Sun

Results 2 issues of Shengyang Sun

**Describe the bug** In `GPTSFTChatDataset`, if the length of the first prompt exceeds `max_seq_length`, all subsequent turns are truncated, and the `loss_mask` for the example becomes all `False`. https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_chat_dataset.py#L359 This...
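
The failure mode described in this issue can be illustrated with a minimal sketch. This is not the NeMo implementation: `build_loss_mask` and `has_trainable_tokens` are hypothetical helpers, and the sketch assumes the mask is `True` only on response tokens and that truncation simply keeps the first `max_seq_length` positions.

```python
from typing import List


def build_loss_mask(
    turn_lengths: List[int],
    is_response: List[bool],
    max_seq_length: int,
) -> List[bool]:
    """Hypothetical helper (not NeMo's API): build a per-token loss mask.

    Each turn contributes `length` entries, True for response turns and
    False for prompt turns. The mask is then cut to `max_seq_length`.
    """
    mask: List[bool] = []
    for length, resp in zip(turn_lengths, is_response):
        mask.extend([resp] * length)
    # Assumed truncation policy: keep only the first max_seq_length tokens.
    return mask[:max_seq_length]


def has_trainable_tokens(mask: List[bool]) -> bool:
    """Return True if at least one position contributes to the loss."""
    return any(mask)


# The first prompt (600 tokens) alone exceeds max_seq_length (512),
# so every response turn is cut off and the mask is all False.
long_prompt_mask = build_loss_mask(
    turn_lengths=[600, 50, 30, 40],
    is_response=[False, True, False, True],
    max_seq_length=512,
)

# With a shorter first prompt, some response tokens survive truncation.
short_prompt_mask = build_loss_mask(
    turn_lengths=[100, 50, 30, 40],
    is_response=[False, True, False, True],
    max_seq_length=512,
)
```

A check such as `has_trainable_tokens` could be used to detect examples like `long_prompt_mask` before training, since an example whose mask is all `False` contributes no loss.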

bug
stale

# What does this PR do ? The original implementation of the prompt template in `gpt_sft_chat_dataset.py` uses a fixed system token, `System`. This PR enables reading the system token...

NLP
Run CICD