Koen van der Veen
Thanks for your quick response! I used an AWS g3.xlarge. I tried multiple times, but I consistently get results around 70.3.
Thanks, I'd love to find out where the difference originates. I downloaded the repo again to make sure I did not make any changes and ran again, but reached...
Hi! I ran the experiments again on the fix_randomization branch, but the results did not change (still around 70%). Did you re-download the data before running the experiments?...
OK, I finally found the source of the difference: I had used a newer version of the Deep Learning AMI in AWS. I ran the experiments with v10 now and got...
1. Are you referring to the missing commands and filer.toml? If you are missing additional information, can you let me know what exactly? I tried to stick more closely to...
Ah, nvm about 1), I just noticed your commit. Let me get back to you with the error. It might be tomorrow instead of today.
Ran it again, the error now is: ``` CacheRemoteObjectToLocalCluster name:"helmprojectstoragehelm" bucket:"helm" path:"/train-12.jsonl": rpc error: code = Unknown desc = volume server 172.18.0.4:8080 fetchAndWrite /train-12.jsonl: rpc error: code = Unknown desc...
When I run `volume.list`, I get ``` Topology volumeSizeLimit:29999 MB hdd(volume:35/500 active:14 free:465 remote:0) DataCenter DefaultDataCenter hdd(volume:35/500 active:14 free:465 remote:0) Rack DefaultRack hdd(volume:35/500 active:14 free:465 remote:0) DataNode 172.18.0.4:8080 hdd(volume:35/500 active:14...
I did some additional testing and made 2 observations:
- If I use `-concurrent=1`, it does not fail, though it is very slow.
- I tried to use the 3.58_large_disk...
I think using the large_disk image + volumesizelimit=40000 + master.volumePreallocate also works (I need to test again to make sure I didn't just get lucky). Just large disk + large volumesizelimit does...