Andy Lo

5 comments by Andy Lo

Suggestion for better docs: The order in which things are concatenated together in the residual stack isn't always clear. Example: https://github.com/neelnanda-io/TransformerLens/blob/829084a53836c5b8b388aa37a5ffce73b6371712/transformer_lens/ActivationCache.py#L1026-L1039 Specifically "... decomposition of the residual stream into **embed,...

Doesn't it get optimised away by the compiler anyway? (Haven't actually checked though) Plus pointwise operations are bandwidth-limited anyway, so adding/removing a few flops shouldn't make a difference.
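To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. The hardware figures (`PEAK_FLOPS`, `BANDWIDTH`) are made-up round numbers rather than measurements of any real device, and the model assumes an fp32 pointwise op that reads one tensor and writes one tensor.

```python
# Hypothetical machine: 10 TFLOP/s of compute and 1 TB/s of memory bandwidth.
# These are illustrative assumptions, not specs of a real accelerator.
PEAK_FLOPS = 10e12   # floating-point operations per second
BANDWIDTH = 1e12     # bytes per second

# fp32 pointwise op: read 4 bytes and write 4 bytes per element.
BYTES_PER_ELEMENT = 8


def time_per_element(flops_per_element: int) -> float:
    """Simple roofline estimate: the slower of compute and memory traffic wins."""
    compute_time = flops_per_element / PEAK_FLOPS
    memory_time = BYTES_PER_ELEMENT / BANDWIDTH
    return max(compute_time, memory_time)


def is_memory_bound(flops_per_element: int) -> bool:
    """True when moving the data takes longer than doing the arithmetic."""
    return BYTES_PER_ELEMENT / BANDWIDTH >= flops_per_element / PEAK_FLOPS


cheap = time_per_element(1)    # e.g. a single multiply per element
costly = time_per_element(4)   # a few extra flops per element
```

Under these assumed numbers both variants are memory-bound, so the estimated time per element is identical. Profiling on the actual hardware would still be needed to confirm it.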

A temporary workaround is to save to a temp directory and copy the saved content to the remote file system, though this wouldn't work so easily with the checkpoint manager...
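A minimal sketch of that workaround, using only the Python standard library. The helper name `save_via_temp_dir` and the `save_fn` callback are hypothetical and not part of any library discussed here, and the sketch assumes the remote file system is reachable as an ordinary mounted path. An object store would need its own upload step in place of `shutil.copytree`.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def save_via_temp_dir(save_fn: Callable[[Path], None], destination: Path) -> None:
    """Run save_fn against a local temp directory, then copy the result over.

    save_fn is a stand-in for whatever routine writes the files; it only
    ever sees a local path. destination is assumed to be a mounted path.
    """
    with tempfile.TemporaryDirectory() as tmp:
        local_dir = Path(tmp)
        save_fn(local_dir)
        # dirs_exist_ok (Python 3.8+) lets repeated saves reuse the same target.
        shutil.copytree(local_dir, destination, dirs_exist_ok=True)
```

The temporary directory is removed when the `with` block exits, so only the copy at `destination` remains.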

Regarding the overall design of logit processors in V1, any reason why the grammar processor needs to live in the main (scheduler) process? The overall goal of V1 seems to...