
[MiniLLM] Processed RoBERTa Corpus dataset download

Open AKaubay opened this issue 1 year ago • 1 comments

I am unable to download only the processed RoBERTa Corpus. Downloading the full processed_data.tar is also interrupted repeatedly, with an error indicating dead links, possibly caused by an incomplete or corrupt file structure in the compressed archive. I also tried downloading it from my personal computer over a 10 Mbps connection and hit the same problem. (Screenshots attached: error 1, error 2)

AKaubay avatar Apr 26 '24 09:04 AKaubay

It works fine in our environment. Did you start the download by running the following commands?

DLINK=$(echo -n "aHR0cHM6Ly9jb252ZXJzYXRpb25odWIuYmxvYi5jb3JlLndpbmRvd3MubmV0L2JlaXQtc2hhcmUtcHVibGljL01pbmlMTE0vcHJvY2Vzc2VkX2RhdGEudGFyP3N2PTIwMjMtMDEtMDMmc3Q9MjAyNC0wNC0xMFQxMyUzQTExJTNBNDRaJnNlPTIwNTAtMDQtMTFUMTMlM0ExMSUzQTAwWiZzcj1jJnNwPXImc2lnPTRjWEpJalZSWkhJQldxSGpQZ0RuJTJGMDFvY3pwRFdYaXBtUENVazNaOHZiUSUzRA==" | base64 --decode)
wget -O processed_data.tar "$DLINK"
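
If the transfer keeps getting interrupted, a resumable download followed by an archive check may help. The sketch below is only an illustration, not part of the MiniLLM instructions: the helper names `download_resumable` and `verify_tar` and the retry values are assumptions, while `-c`, `--tries`, and `--waitretry` are standard wget options and `tar -tf` is the standard way to list an archive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names and retry values are illustrative assumptions).

# Resume a partial download instead of restarting from byte zero.
# -c keeps the bytes already on disk; --tries/--waitretry retry after failures.
download_resumable() {
    local url="$1" out="$2"
    wget -c --tries=20 --waitretry=10 -O "$out" "$url"
}

# Return 0 if the archive can be listed end to end, non-zero otherwise.
# A truncated or corrupt tar usually makes the listing fail.
verify_tar() {
    tar -tf "$1" > /dev/null 2>&1
}

# Example usage, assuming DLINK has been set as in the commands above:
#   download_resumable "$DLINK" processed_data.tar
#   verify_tar processed_data.tar && echo "archive OK" || echo "archive damaged"
```

Re-running `download_resumable` after an interruption continues from the existing partial file, provided the server supports ranged requests.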

t1101675 avatar May 10 '24 15:05 t1101675

https://github.com/microsoft/LMOps/blob/main/minillm/README.md

The links have been updated.

donglixp avatar Sep 14 '24 16:09 donglixp