
Feature request: select tasks by language

Open · avidale opened this issue 2 years ago · 2 comments

Currently, the package doesn't allow choosing the language. I think many people who develop models for specific languages (or sets of languages) would like to access task data for a given language, so implementing this functionality could be a great help.

avidale · Jul 20 '23 17:07

Hi, thanks for your suggestion! Currently, you can use the dataframe and check whether a language appears in the task names. But that's not enough: some datasets store the language in a dedicated column that the preprocessing removes. So it's not great, I agree. Proper language handling is on my roadmap.

sileod · Jul 20 '23 18:07
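For illustration, the name-based workaround might look roughly like the sketch below. It is dependency-free and assumes a plain list of hypothetical task names rather than the real tasksource dataframe, whose names and columns may differ; with the dataframe, the same predicate could be applied to whichever column holds task names. Names are split into tokens so that a marker like `fr` matches `xnli/fr` but not, say, `free_text`.

```python
import re

# Hypothetical task names for illustration only; not taken from tasksource.
TASK_NAMES = [
    "glue/mnli",
    "xnli/fr",
    "xnli/de",
    "paws-x/fr",
    "french_book_reviews",
    "germeval",
]


def filter_by_language(names, markers):
    """Return the names that contain any language marker as a whole token.

    Names are lowercased and split on '/', '-', '_' and '.', so a marker
    only matches a complete token, not a substring of a longer word.
    """
    wanted = {m.lower() for m in markers}
    selected = []
    for name in names:
        tokens = set(re.split(r"[/\-_.]", name.lower()))
        if tokens & wanted:
            selected.append(name)
    return selected


print(filter_by_language(TASK_NAMES, ["fr", "french"]))
print(filter_by_language(TASK_NAMES, ["de", "german"]))
```

The second call illustrates the limitation raised above: `germeval` is (hypothetically) a German task, but nothing in its name says so, so a name-based filter silently misses it.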

Yes, adding language IDs to the dataframe would be a great first step. Another possible enhancement would be to make the file recast.py localizable, so that users could provide prompt templates in their chosen language instead of the default (English).

avidale · Jul 20 '23 18:07
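A rough sketch of the two proposals combined: an explicit language code per task, and per-language prompt templates with a fallback to English. Everything here is hypothetical: the `TASKS` metadata, the `lang`/`type` fields, the template text, and the helpers `select_tasks` and `render_prompt` are illustrative assumptions, not the actual structure of the tasksource dataframe or of recast.py.

```python
from string import Template

# Hypothetical task metadata with an explicit ISO 639-1 code per task,
# i.e. the kind of language column proposed for the dataframe.
TASKS = [
    {"name": "glue/mnli", "lang": "en", "type": "nli"},
    {"name": "xnli/fr", "lang": "fr", "type": "nli"},
    {"name": "germeval", "lang": "de", "type": "classification"},
]

# Hypothetical per-language prompt templates, keyed by language, then task type.
TEMPLATES = {
    "en": {
        "nli": Template(
            "Premise: $premise\nHypothesis: $hypothesis\n"
            "Does the premise entail the hypothesis?"
        ),
    },
    "fr": {
        "nli": Template(
            "Prémisse : $premise\nHypothèse : $hypothesis\n"
            "La prémisse implique-t-elle l'hypothèse ?"
        ),
    },
}


def select_tasks(lang):
    """Return task names whose explicit language code equals `lang`."""
    return [t["name"] for t in TASKS if t["lang"] == lang]


def render_prompt(task_type, fields, lang, default_lang="en"):
    """Fill the template for `lang`, falling back to `default_lang` if absent."""
    template = TEMPLATES.get(lang, {}).get(task_type)
    if template is None:
        template = TEMPLATES[default_lang][task_type]
    return template.substitute(fields)


print(select_tasks("de"))
print(render_prompt("nli", {"premise": "A", "hypothesis": "B"}, lang="fr"))
```

With an explicit language field, `germeval` is found even though its name carries no language marker, and a language without its own templates (for example `"de"` for NLI here) falls back to the English prompt instead of failing.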