FastChat

Add preprocess exception handler for trainer.

Open likejazz opened this issue 2 years ago • 1 comments

Add preprocess exception handling for three types of errors:

  • The size of source is 0.
  • The role chatgpt or bing is not in roles.
  • The order of human and assistant is incorrect.

likejazz avatar Apr 28 '23 11:04 likejazz

Thanks for the contribution. Instead of ignoring these warnings, can we use functions like this to clean up the dataset before training? This ensures the quality of the training data.

https://github.com/lm-sys/FastChat/blob/73ea04dec7832de68783a68a424d374c85e3a29d/fastchat/data/split_long_conversation.py#L78-L94
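
A minimal sketch of this kind of cleanup pass, not the code at the link above. It assumes ShareGPT-style records in which each record has a "conversations" list and each turn has a "from" field, and that valid turns strictly alternate between "human" and "gpt". The names EXPECTED_ROLES, is_valid_conversation, and drop_invalid_records are hypothetical.

```python
from typing import Any, Dict, List

# Assumed role labels; a valid conversation alternates human, gpt, human, ...
EXPECTED_ROLES = ("human", "gpt")


def is_valid_conversation(turns: List[Dict[str, Any]]) -> bool:
    """Return True only if the conversation passes all three checks."""
    # Check 1: the conversation must not be empty.
    if not turns:
        return False
    for i, turn in enumerate(turns):
        # Checks 2 and 3: an unknown role (e.g. "chatgpt", "bing") or a
        # role that appears out of order fails the same comparison.
        if turn.get("from") != EXPECTED_ROLES[i % 2]:
            return False
    return True


def drop_invalid_records(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the records whose conversation is valid."""
    return [
        r for r in records
        if is_valid_conversation(r.get("conversations", []))
    ]
```

Running a filter like this once over the dataset file means the trainer's preprocessing step never sees the three failure cases, so it needs no exception handling for them.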

merrymercy avatar Apr 29 '23 12:04 merrymercy

@merrymercy OK, your guide is more useful than this patch. I'll close this issue.

likejazz avatar May 01 '23 07:05 likejazz