Keven Scharaswak
For RPi 4 (running Umbrel too) adjust the above code that the OP shared to this:
```
#! /usr/bin/env -S bash -ex
GO_VERSION="${1}"
if [ -z "${GO_VERSION}" ]; then
  GO_VERSION="1.16.5"...
```
@mauriceweber Did you ever provide instructions for putting the WET file through the cc_net pipeline?
That is correct. If you look hard on the internet, you can still find it. Then you can edit the Python script to bring the local file into the fold....
I can verify that `len(inputs_by_snapsh)` is 0. Edit: I seem to have either the listings or the S3 bucket info wrong. Is it possible to get an example of...
Thank you so much for the response. Is the s5cmd command supposed to point to my own S3 bucket or someone else's? I created a bucket but it is blank...
> There is no data that needs to be pulled from an external S3 bucket, only your own where you have the ccnet output stored -- it is also only...
I wrote a basic Python script that downloads all the ccnet data from your links above in parallel. It saturated my connection and server to get...
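For anyone wanting to do something similar, here is a minimal sketch of a parallel downloader. It is not the script mentioned above: it assumes you already have a plain list of HTTPS URLs, and the names `download_all`, `fetch_to_file`, the `.part` suffix, and the default worker count are all illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen
import shutil


def fetch_to_file(url, dest):
    """Default fetcher: stream one URL to disk without loading it into memory."""
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def download_all(urls, out_dir, max_workers=8, fetch=fetch_to_file):
    """Download every URL into out_dir using a thread pool.

    Returns (downloaded_paths, failures), where failures maps url -> error text.
    Files are named after the last path component of each URL, so URLs that
    share a basename would collide (an assumption of this sketch).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    done, failed = [], {}

    def job(url):
        dest = out_dir / Path(urlparse(url).path).name
        if dest.exists():
            # Already downloaded in an earlier run; skip it.
            return dest
        tmp = dest.with_name(dest.name + ".part")
        fetch(url, tmp)
        # Only complete files get the final name.
        tmp.rename(dest)
        return dest

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                done.append(fut.result())
            except Exception as exc:
                # Record the failure and keep going with the other files.
                failed[url] = str(exc)
    return sorted(done), failed
```

Call it as `download_all(list_of_urls, "ccnet_data")` and re-run it with the same arguments to retry anything listed in the returned failures. Threads are enough here because the work is network-bound; lower `max_workers` if you would rather not saturate your link.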
The -l is for the language. This was for an older version of CC Net. The original project has been archived, but you can remove the "-l en" part and...
I found that the dataset isn't ready for 20240320 yet, so I went back one snapshot and am trying again.
Changing the date to 20240301 and removing the beam_runner parameter seems to have worked.