Henry Bigelow
Hi @dizwe, apologies for not making this clearer in the README, but this project is still under construction and unfortunately there is too much that is changing at the...
Hi, the model is not ready yet. I'm currently training it with the VQ-VAE bottleneck, it's only at 5k steps so far, and I've noticed various collapse and other...
Hi Rasipuram, I am pretty new to this sub-field myself, so unfortunately I don't know many repos besides the WaveNet ones, which aren't designed to extract features. I'll keep it...
Hi Zifan, I'm sorry I can't be of more help here, but I never succeeded in training this model. I tried training it for 10 days on a TPU (full...
Hi Pranay, Thanks for your interest. Unfortunately, progress is on hold at the moment. I just started a new job, and this was a side project. I do intend to...
Hi Sam, Thanks for letting me know. I'm not surprised actually, since it's a bit of an old model. It's my first pull request, so I needed the practice! Cheers,...
I'm not familiar with Ninja, but I was able to build `causal_conv1d` from source. First, note that these two commands should produce matching CUDA versions:

```bash
python3 -c 'import torch;...
```
Hi @FloMru, would [this issue](https://github.com/state-spaces/mamba/issues/55#issuecomment-1858638484) help you? From what I understand, torch and `causal_conv1d` must both be built against the same version of CUDA.
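A minimal sketch of how that version comparison could be automated. It assumes the two versions that matter are `torch.version.cuda` (the CUDA version PyTorch was built with, `None` for CPU-only builds) and the release reported by the `nvcc` on your `PATH`; the latter may not be the compiler the build actually uses if `CUDA_HOME` points elsewhere. Requiring major.minor to agree is a conservative choice on my part, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string like '12.1' or '12.1.105' into (12, 1)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'release X.Y' field from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda: Optional[str], nvcc_cuda: Optional[str]) -> bool:
    """True only if both versions are known and agree on major.minor."""
    if torch_cuda is None or nvcc_cuda is None:
        return False
    return major_minor(torch_cuda) == major_minor(nvcc_cuda)


def main() -> None:
    # CUDA version PyTorch was compiled against (None if torch is CPU-only or missing).
    try:
        import torch
        torch_cuda = torch.version.cuda
    except ImportError:
        torch_cuda = None

    # CUDA version of the nvcc found on PATH (None if nvcc is not installed).
    nvcc_cuda = None
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        nvcc_cuda = parse_nvcc_release(out)

    print(f"torch built with CUDA: {torch_cuda}")
    print(f"nvcc on PATH reports:  {nvcc_cuda}")
    print("match" if cuda_versions_match(torch_cuda, nvcc_cuda) else "MISMATCH (or unknown)")


if __name__ == "__main__":
    main()
```

If this reports a mismatch, the usual remedies are installing a torch wheel built for your toolkit's CUDA version or pointing the build at a toolkit that matches torch.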
Hi Karami, The experiment I ran is in [mamba-recall](https://github.com/hrbigelow/mamba-recall). Hopefully that can get you the answers you need. It's been a while, so I don't remember just now, but if I...
Hi Mahdi, That's a very good point. Yes, indeed, I trained only on the recall-token prediction. I interpreted the phrase "trained on the induction head task" to mean actually...
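The question of which positions contribute to the training loss can be made concrete with a small sketch contrasting a loss that scores only the recall token against one that scores every next-token prediction. It assumes the recall token sits at the last position of each sequence and uses plain Python lists in place of a real framework; the function names are hypothetical and this is not the code from mamba-recall.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target token at one position."""
    return -log_softmax(logits)[target]


def sequence_loss(all_logits: List[List[float]], targets: List[int], recall_only: bool) -> float:
    """Average cross-entropy over the scored positions.

    recall_only=True scores just the final position (assumed to be the recall token);
    recall_only=False scores every next-token prediction in the sequence.
    """
    positions = [len(targets) - 1] if recall_only else range(len(targets))
    losses = [cross_entropy(all_logits[i], targets[i]) for i in positions]
    return sum(losses) / len(losses)
```

With `recall_only=True`, a model gets no gradient signal from the earlier tokens in the sequence, so it can score well on the recall query while remaining poor at ordinary next-token prediction.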