Language distribution of ShareGPT 70K conversation dataset for FastChat T5
What are all the languages present in the ShareGPT 70,000 conversation dataset which was used to fine-tune FastChat-T5?
The ReadMe file points to data_cleaning.md which was used to get data from ShareGPT. Within data_cleaning.md seems like sharegpt_clean_lang.json contains the list of languages in consideration and some languages are skipped.
how can i finetune with bounds of datasets?
What are all the languages present in the ShareGPT 70,000 conversation dataset which was used to fine-tune FastChat-T5?
The ReadMe file points to
data_cleaning.mdwhich was used to get data from ShareGPT. Withindata_cleaning.mdseems likesharegpt_clean_lang.jsoncontains the list of languages in consideration and some languages are skipped.
Hi I have the same question about the language distribution, do you have any idea?