Bryan
Hi Alex, this is very impressive work! My use case is for an environment where
- each env requires its own process
- on a 16-core machine, maximum sample...
Hi, great work! I'm curious how much compute (number of CPU cores, number and type of GPUs) it takes to run the Breakout example. Thanks!
Quick question: given the attention mask `jnp.tril(jnp.ones(window_size, window_size * 2), window_size)` this means that in this implementation, for a given head & window, the `i`th query ends up attending not...
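To make the mask in this question concrete, here is a minimal sketch that reproduces the `tril` rule in plain Python: an entry at row `i`, column `j` is kept when `j <= i + k`. It assumes the intended mask shape is `(window_size, window_size * 2)` and uses a hypothetical `window_size = 3`; the helper name `tril_mask` is made up for illustration and is not part of the implementation being discussed.

```python
def tril_mask(rows, cols, k):
    """Plain-Python stand-in for tril(ones((rows, cols)), k).

    Keeps entry (i, j) when j <= i + k, which is the rule that
    lower-triangular masking with diagonal offset k applies.
    """
    return [[1 if j <= i + k else 0 for j in range(cols)] for i in range(rows)]


window_size = 3  # hypothetical small value, chosen only for readability

# Rows are queries in one window; columns are the 2 * window_size key slots.
mask = tril_mask(window_size, 2 * window_size, window_size)

for i, row in enumerate(mask):
    print(f"query {i}: {row} -> {sum(row)} visible keys")
```

Under this rule, query `i` can see `i + window_size + 1` of the `2 * window_size` key slots, so the number of visible keys grows with the query's position inside the window rather than staying fixed at `window_size`.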
### Current Behavior
Silly bug: adding an emoji to the notes in `wandb.init(notes="emojihere")` causes a 500 error.

### Expected Behavior
_No response_

### Steps To Reproduce
_No response_

### Screenshots
_No response_

### Environment...