albert
Default Tutorial Not Working - Can't download MRPC data
When running the "prepare for training" tutorial code in Google Colab, I get the following error:
```
**** Model output directory: gs://albert_glue_tutorial/albert-tfhub/models/MRPC *****
Cloning into 'download_glue_repo'...
remote: Enumerating objects: 24, done.
remote: Total 24 (delta 0), reused 0 (delta 0), pack-reused 24
Unpacking objects: 100% (24/24), done.
Processing MRPC...
Traceback (most recent call last):
  File "download_glue_repo/download_glue_data.py", line 150, in <module>
    sys.exit(main(sys.argv[1:]))
  File "download_glue_repo/download_glue_data.py", line 142, in main
    format_mrpc(args.data_dir, args.path_to_mrpc)
  File "download_glue_repo/download_glue_data.py", line 65, in format_mrpc
    URLLIB.urlretrieve(MRPC_TRAIN, mrpc_train_file)
NameError: name 'URLLIB' is not defined
***** Task data directory: glue_data *****
```
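The `NameError` means `download_glue_data.py` calls `URLLIB.urlretrieve(...)` without ever binding the name `URLLIB`. The sketch below shows the kind of module alias that call appears to expect. It assumes (not confirmed by the script) that `URLLIB` is meant to point at the module providing `urlretrieve`, which is `urllib.request` on Python 3.

```python
# Assumption: URLLIB is intended as an alias for whichever module provides
# urlretrieve. On Python 3 that is urllib.request; on Python 2 it was urllib.
try:
    import urllib.request as URLLIB  # Python 3
except ImportError:
    import urllib as URLLIB  # Python 2 fallback (never reached on Python 3)

# Once the alias is bound, a call shaped like the one in the traceback,
# URLLIB.urlretrieve(MRPC_TRAIN, mrpc_train_file), resolves instead of
# raising NameError.
```

This only explains the error. The thread's actual fix is to clone a corrected copy of the script rather than patch it by hand.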
I've followed the instructions as written in the Colab: I set up storage, filled in the parameters, set the runtime to TPU, and then clicked "Run all". How can I download the GLUE data needed for MRPC?
It looks like this issue has already been reported here.
I fixed the issue with the following steps:
- Click "Show Code" on the code cell where the parameters (Bucket, Task, and Albert_Model) are filled in.
- If you've already run the script once, delete the `download_glue_repo` folder. You can do this by adding the line `!rm -rf download_glue_repo` right after the `# Download glue data.` comment.
- Instead of cloning the broken repo, clone the fixed repo, which can be found here and was mentioned here. To do this, change the `!git clone` line to `!git clone https://gist.github.com/fef1601580f269eca73bf26a198595f3.git download_glue_repo`.
- Rerun everything. This time, the dataset should download correctly.