Stefan Schweter
Hi @alexeib , sorry for bothering you again, but could you take a look at the missing `lm_head`? I could also share the pretrained checkpoint if necessary :hugs:
@alexeib do you have any hint on how to solve it? I would really like to convert that checkpoint with the Transformers library and test it on downstream tasks. Any help is highly...
I did some debugging, and one minor fix would be: ```python if hasattr(self.encoder, "target_model") and self.encoder.target_model is None: ``` instead of: ```python if self.encoder.target_model is None: ``` What do you...
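A minimal sketch of why the `hasattr` guard matters (the encoder classes here are hypothetical stand-ins, not the actual fairseq model classes): accessing `target_model` on an encoder that never defines the attribute raises an `AttributeError`, while the guarded check short-circuits safely.

```python
class EncoderWithoutTarget:
    """Hypothetical encoder that never sets a `target_model` attribute."""
    pass


class EncoderWithTarget:
    """Hypothetical encoder where `target_model` exists but is unset."""
    def __init__(self):
        self.target_model = None


def target_model_is_unset(encoder):
    # Guarded check: only inspect `target_model` if the attribute exists.
    # Without `hasattr`, EncoderWithoutTarget would raise AttributeError.
    return hasattr(encoder, "target_model") and encoder.target_model is None


print(target_model_is_unset(EncoderWithoutTarget()))  # False (no exception raised)
print(target_model_is_unset(EncoderWithTarget()))     # True
```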
@kbhartiya What version of TensorFlow are you using 🤔 I'm using 1.12rc0 with the latest Keras 2.2.4 version, and training starts without problems.
Hi @NLPAN , are you referring to this issue: https://github.com/clab/dynet/issues/266 :thinking: I think I found it here: https://github.com/clab/dynet/blob/master/CMakeLists.txt#L92 (`native` compiler flag). Please let me know if that works!
Hi @PhilipMay , thanks for that hint! The corpus looks really interesting, and: > This preprocessing is filtering duplicates only inside the same dump. This step took approx. 50,000 CPU hours...
Hi @PhilipMay , just one question: I've downloaded the HEAD and MIDDLE archives (using the URLs provided in `gc4_corpus_head_urls.txt` and `gc4_corpus_middle_urls.txt`). However, a `du -sh` shows "only" 418GB in total....
The number of files is correct (I checked both *.txt files and the links on the website). I will now check the `Content-Length` header of the provided files, e.g.: ```bash...
With some bash magic: ```bash for url in $(cat gc4_corpus_middle_urls.txt) do filename=$(echo $url | cut -d "/" -f 8) disk_size=$(stat -c "%s" $filename) download_size=$(curl --silent -I $url | grep "Content-Length:"...
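The same check can be sketched in Python (the filenames and sizes below are placeholders, not real archive data; a full script would read the URL lists, `stat` the downloaded files, and issue HEAD requests for the `Content-Length` values):

```python
def find_size_mismatches(disk_sizes, download_sizes):
    """Return filenames whose on-disk size differs from the advertised size.

    disk_sizes / download_sizes: dicts mapping filename -> size in bytes,
    e.g. collected via os.path.getsize() and the Content-Length header.
    """
    return sorted(
        name
        for name, expected in download_sizes.items()
        if disk_sizes.get(name) != expected
    )


# Placeholder data only, for illustration:
download_sizes = {"part-0000.tar.gz": 1_048_576, "part-0001.tar.gz": 2_097_152}
disk_sizes = {"part-0000.tar.gz": 1_048_576, "part-0001.tar.gz": 1_500_000}

print(find_size_mismatches(disk_sizes, download_sizes))  # ['part-0001.tar.gz']
```

Any filename reported here would indicate a truncated or failed download worth re-fetching.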
Then I calculated the number of downloaded bytes: `448598516042` -> which is pretty close to 450GB then 😅 More precisely: `194227285957` (HEAD) + `254371230085` (MIDDLE) = `448598516042` in total. So...
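The ~418GB vs. ~450GB gap is most likely just a units difference: `du -sh` reports binary gibibytes (GiB, 2^30 bytes), while the ~450GB figure is decimal gigabytes (10^9 bytes). A quick sanity check on the byte counts above:

```python
head = 194_227_285_957    # total bytes of the HEAD archives
middle = 254_371_230_085  # total bytes of the MIDDLE archives
total = head + middle

print(total)          # 448598516042
print(total / 10**9)  # ~448.6 decimal GB -> "pretty close to 450GB"
print(total / 2**30)  # ~417.8 GiB -> matches the `du -sh` output of 418GB
```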