
question about dataset

Open superway117 opened this issue 1 year ago • 4 comments

About this point: "Utilizing a dataset engine capable of automatically generating 195 million pairs of code snippets and their descriptions"

  1. Where can I find this dataset?
  2. What is the dataset engine? Thanks.

superway117 avatar Jul 03 '24 06:07 superway117

Sorry, we have only released the pre-trained model so far. You can find the description of the dataset engine in Section 3.1 of our paper.

Hustcw avatar Jul 03 '24 06:07 Hustcw

I want to reproduce your work on the dataset. I would appreciate it if you could share some demo data from the dataset, or the script used to build it.

superway117 avatar Jul 04 '24 15:07 superway117

The compilation pipeline is complicated and not ready to be open-sourced. I can provide some demo data and the scripts that query the LLM for explanations once I have some free time :) Sorry about that.

Hustcw avatar Jul 05 '24 12:07 Hustcw