Antoine Chaffin

Results: 95 comments by Antoine Chaffin

Yeah, I understand that this kind of detail is not what you have in mind when writing the paper; that's why I made this suggestion. I'm not really too worried...

No, the model definition is not the same as HF's; you basically start from a randomly initialised network.

I think it should be possible to translate the fairseq version to an HF one (since I guess that is what they did for the other regular versions) but I...

Very cool, thank you very much! I'll test it on Monday and try to reproduce the paper results. Converting model weights always seems a bit tricky but doable, do...

> And here's the [Colab notebook with explanations](https://colab.research.google.com/drive/1LLJewY92LXdeug5m_ceMUHdlqrRQwSQJ?usp=sharing) that I used for conversion. Thank you very much, I'll bookmark it so I know how to do this kind of translation...

> I observed this phenomenon in various other Seq2Seq models (i.e., a gibberish artifact at the end of the sequence), and it's easy to get rid of them, so I didn't...

> Don't think so. As previously stated, most Seq2Seq models have this behavior. I also observed it in my TTS research. A high-quality TTS model I trained always...

I did some additional experiments, which confirmed that the issue is in the compression process. I computed the similarity between the original vectors and the decompression of their compressed version (`np.diag(embs[:100] @...
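The round-trip check described above can be sketched as follows; the embedding sizes and the int8 quantisation used as a stand-in for the real compression step are illustrative assumptions, not the actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
embs = rng.normal(size=(100, 128)).astype(np.float32)
embs /= np.linalg.norm(embs, axis=1, keepdims=True)  # unit-normalise rows

# hypothetical lossy round-trip: quantise to int8 and decompress back
scale = np.abs(embs).max()
decompressed = (np.round(embs / scale * 127) / 127 * scale).astype(np.float32)
decompressed /= np.linalg.norm(decompressed, axis=1, keepdims=True)

# cosine similarity between each original vector and its reconstruction;
# the diagonal of the product matrix pairs vector i with reconstruction i
sims = np.diag(embs[:100] @ decompressed[:100].T)
print(sims.mean())  # a mean far below 1.0 points at a lossy compression step
```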

Sorry for the delay. Here are the modifications I made:
- Using np.quantile instead of the torch function in collection_indexer.py
```
# bucket_cutoffs = heldout_avg_residual.float().quantile(bucket_cutoffs_quantiles)
# bucket_weights = heldout_avg_residual.float().quantile(bucket_weights_quantiles)...
```
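The swap described above can be sketched roughly like this; the residual size and the quantile grids are placeholder assumptions, not the values collection_indexer.py actually derives:

```python
import numpy as np

# stand-in for heldout_avg_residual (size is illustrative only)
heldout_avg_residual = np.abs(np.random.randn(100_000)).astype(np.float32)

# hypothetical quantile grids for 4 residual buckets: 3 interior cutoffs
# and one representative weight per bucket
bucket_cutoffs_quantiles = np.linspace(0.0, 1.0, 4 + 1)[1:-1]
bucket_weights_quantiles = np.linspace(0.0, 1.0, 2 * 4 + 1)[1::2]

# np.quantile accepts arrays of any size, whereas torch.quantile
# errors out once the input tensor exceeds an internal size limit
bucket_cutoffs = np.quantile(heldout_avg_residual, bucket_cutoffs_quantiles)
bucket_weights = np.quantile(heldout_avg_residual, bucket_weights_quantiles)
```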

Hello, The problem is that the linear weights do not have the same name. What I did was simply export the linear layer, rename it, and load it into the BaseColBERT model...