Lars Hillebrand
Hi, when I convert a normal dictionary like

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}
```

to an attribute dictionary via

```python
from addict import Dict
attribute_dict = ...
```
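For context, a minimal sketch of the conversion in question, assuming the standard `addict` package (the dictionary contents are just the ones from the truncated snippet):

```python
from addict import Dict

some_dict = {'a': 1, 'b': 2, 'c': 3}

# addict's Dict accepts a plain mapping and exposes its keys as attributes
attribute_dict = Dict(some_dict)

print(attribute_dict.a)     # 1 -- attribute-style access
print(attribute_dict['b'])  # 2 -- item-style access still works
```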
Hi, since you are the author of the blog entry, I am tagging you, @patrickvonplaten, in this issue. I read the blog entry on the Reformer model ([https://huggingface.co/blog/reformer](https://huggingface.co/blog/reformer)) with great interest. In Section...
## 🚀 Feature

Metric states seem to be limited to `torch.Tensor` or `List[torch.Tensor]`. In my use case I want to store a dictionary as state. My dataset comprises samples who...
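For reference, a minimal sketch of the limitation being described, assuming the `torchmetrics` `Metric` API (the metric class and its update logic are hypothetical):

```python
import torch
from torchmetrics import Metric

class PerSampleSum(Metric):
    def __init__(self):
        super().__init__()
        # add_state only accepts a torch.Tensor or an (initially empty) list
        # of tensors as `default` -- a plain dict is rejected, which is the
        # limitation this feature request is about.
        self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("per_sample", default=[], dist_reduce_fx="cat")

    def update(self, values: torch.Tensor) -> None:
        self.total += values.sum()
        self.per_sample.append(values)

    def compute(self) -> torch.Tensor:
        return self.total
```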
Hi, I have a quick question with respect to the relative shift operation:

```python
def _rel_shift(self, x, zero_triu=False):
    zero_pad = torch.zeros((x.size(0), 1, *x.size()[2:]), device=x.device, dtype=x.dtype)
    x_padded = torch.cat([zero_pad, x], dim=1)
    ...
```
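To make the question self-contained, here is a standalone sketch of the pad-and-reshape trick the snippet uses (as in the Transformer-XL reference implementation; the square demo tensor is illustrative, and no `zero_triu` masking is applied):

```python
import torch

def rel_shift(x: torch.Tensor) -> torch.Tensor:
    # Prepend a zero column along dim 1: (qlen, klen, ...) -> (qlen, klen + 1, ...)
    zero_pad = torch.zeros((x.size(0), 1, *x.size()[2:]), device=x.device, dtype=x.dtype)
    x_padded = torch.cat([zero_pad, x], dim=1)
    # Reinterpreting the buffer as (klen + 1, qlen, ...) and dropping the first
    # row realigns the entries: in the square case, the last row stays put and
    # each earlier row is shifted one step further left, with leftover values
    # in the trailing slots (the `zero_triu` branch masks those out).
    x_padded = x_padded.view(x.size(1) + 1, x.size(0), *x.size()[2:])
    return x_padded[1:].view_as(x)

# Tiny demo on a 3x3 tensor:
x = torch.arange(9.0).view(3, 3)
print(rel_shift(x))
# tensor([[2., 0., 3.],
#         [4., 5., 0.],
#         [6., 7., 8.]])
```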
I actually encountered a similar scenario. The standard Hugging Face [bert-base-cased](https://huggingface.co/bert-base-cased/blob/main/config.json) model trained with 16-bit mixed precision (using pytorch-lightning), with a vocab size of 100K and a sequence length of 1024...
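For concreteness, a sketch of roughly how such a model could be instantiated, assuming the `transformers` `BertConfig` API (only the 100K vocabulary and 1024 sequence length come from the comment above; the remaining values are bert-base defaults):

```python
from transformers import BertConfig, BertForMaskedLM

# bert-base-cased architecture with the enlarged vocabulary and
# extended position range mentioned above
config = BertConfig(
    vocab_size=100_000,
    max_position_embeddings=1024,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
)
model = BertForMaskedLM(config)
print(f"{model.num_parameters():,} parameters")
```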
# 🐛 Bug

I am currently experimenting with different scaled dot product attention implementations to evaluate training speed and GPU memory consumption. I compared all methods by running the following `train.py`...
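As background, a minimal sketch of two such implementations, assuming PyTorch 2.x (the tensor shapes are illustrative; the actual `train.py` is not reproduced in the preview):

```python
import math
import torch
import torch.nn.functional as F

def manual_sdpa(q, k, v):
    # Naive scaled dot product attention: materializes the full
    # (seq_len x seq_len) score matrix, which dominates GPU memory
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 1024, 64)  # (batch, heads, seq_len, head_dim)

out_manual = manual_sdpa(q, k, v)
# Fused kernel (FlashAttention / memory-efficient attention where available)
out_fused = F.scaled_dot_product_attention(q, k, v)

print((out_manual - out_fused).abs().max())  # numerical difference only
```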
With:

```
# !pip install fluidml[examples]
!git clone https://github.com/fluidml/fluidml.git
%cd fluidml/
!git checkout graph-visualization
!pip install .[examples]
%cd fluidml/examples/pytorch_transformer_seq2seq_translation
```

it's already possible to load all the imports. However, multiprocessing...