YassineYousfi
GradScaler has an argument for enabling/disabling the scaler. When disabled, ``scaler.step()`` simply invokes ``optimizer.step()``, and the other methods are no-ops. I thought this made the code a bit cleaner by...
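A minimal sketch of why an enable/disable flag keeps the training loop free of branches. `ToyScaler`, `ToyOptimizer`, and `train_step` are hypothetical stand-ins written for illustration only; they are not PyTorch's `GradScaler` or optimizer classes, and the real scaler also handles inf/NaN checks and dynamic scale updates that are omitted here.

```python
class ToyOptimizer:
    """Hypothetical one-parameter SGD optimizer (illustration only)."""

    def __init__(self, param, lr=0.5):
        self.param = param
        self.grad = 0.0
        self.lr = lr

    def step(self):
        self.param -= self.lr * self.grad


class ToyScaler:
    """Hypothetical stand-in for a gradient scaler with an `enabled` flag."""

    def __init__(self, enabled=True, factor=1024.0):
        self.enabled = enabled
        self.factor = factor

    def scale(self, value):
        # Disabled: return the value unchanged.
        return value * self.factor if self.enabled else value

    def step(self, optimizer):
        # Enabled: undo the scaling before stepping.
        # Disabled: simply invoke optimizer.step().
        if self.enabled:
            optimizer.grad /= self.factor
        optimizer.step()

    def update(self):
        # A real scaler would adjust its factor here; the toy keeps it fixed,
        # so this is a no-op in both modes.
        pass


def train_step(scaler, optimizer, grad):
    # The same three calls run whether or not scaling is enabled,
    # so the caller needs no `if use_amp:` branches.
    optimizer.grad = scaler.scale(grad)
    scaler.step(optimizer)
    scaler.update()
    return optimizer.param


if __name__ == "__main__":
    print(train_step(ToyScaler(enabled=False), ToyOptimizer(1.0), 0.5))
    print(train_step(ToyScaler(enabled=True), ToyOptimizer(1.0), 0.5))
```

In this toy setup both calls produce the same parameter update, which is the property that lets a single code path serve both the scaled and unscaled cases.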
In the manual implementation of causal self-attention, the causal mask is registered as a buffer, which causes DDP to broadcast it at every step. Excluding it from the broadcast gives...
great work @agrimgupta92! When can we expect the code release? Thanks!
Currently the code only supports bs=1 with input_pos being one dimensional. This fixes input_pos shape in the comments.
### Describe the bug
openpilot unavailable - locationd temporary error after a u-turn.

### Provide a route where the issue occurs
09ed4c7e7b4937fb/00000290--440bddbb31/14

### openpilot version
d239c7f3252e2bbe6d94356acbc831ed53e23569

### Additional info
_No...