New Programming Language Datasets

Open johnmn3 opened this issue 2 years ago • 0 comments

I'd like to add a dataset of Clojure/ClojureScript code so that Open Assistant can help Clojurists with coding. I have a dataset of Clojure questions and answers. Some are better examples than others, but they all compile and run, and together they show many different ways of solving a particular task. Because Clojure is a Lisp, it is also quite easy for us to generate code programmatically, so we could potentially produce a large number of additional examples.

I would like to shape the dataset so that it is as easy as possible for Open Assistant to ingest and use, so I'm looking for guidance on how to structure it. Other language communities could probably benefit from the same guidance, so I'm wondering whether there could be a template we could follow for adding programming instructions for a given language.
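
To make the question concrete, here is a rough sketch of one possible layout: a JSONL file with one question/answer pair per line. The field names (`language`, `instruction`, `response`) and the helper functions are placeholders I made up for illustration, not an existing Open Assistant schema, and the sample record is illustrative rather than taken from the actual dataset.

```python
import json

# Hypothetical record layout: the field names below are placeholders,
# not a schema defined by Open Assistant.
def make_record(language, question, answer):
    """Build one question/answer example as a plain dict."""
    return {"language": language, "instruction": question, "response": answer}

def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Illustrative sample only; not an entry from the real dataset.
records = [
    make_record(
        "clojure",
        "Write a function that sums the even numbers in a collection.",
        "(defn sum-evens [coll]\n  (reduce + (filter even? coll)))",
    )
]

jsonl_text = to_jsonl(records)
print(jsonl_text)
```

A per-record `language` field like this would let other language communities reuse the same template, but I'm happy to restructure to whatever format works best.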

Let me know how you would like me to proceed.

Also, amazing job on this project. I'm really thankful for it, and I know many others are as well.

Apr 17 '23 16:04 johnmn3