tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
Hello, thank you so much for your hard work! Is there any code to reproduce the experiments for Table 1? Thanks!
We also need to bump some dependencies for this since gemma does not exist in the version of transformers we were requiring.
Drop Python 3.9 support too while we're at it (enabling `|` for union types) _Originally posted by @norabelrose in https://github.com/AlignmentResearch/tuned-lens/issues/125#issuecomment-1968239228_
Currently, if you try to create a prediction trajectory from a model and lens loaded in `bfloat16`, you get an error. ``` 294 traj_log_probs.append( --> 295 logits.log_softmax(dim=-1).squeeze().detach().cpu().numpy() 296 ) 298 # Add model...
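The traceback above is truncated, so the exact error message is not shown. One likely cause is that NumPy has no native `bfloat16` dtype, so `Tensor.numpy()` fails on `bfloat16` tensors. A minimal sketch of a possible workaround, assuming an upcast to `float32` is acceptable; the helper name `log_probs_to_numpy` is hypothetical and not part of tuned-lens:

```python
import torch


def log_probs_to_numpy(logits: torch.Tensor):
    """Hypothetical helper: logits -> NumPy array of log-probabilities."""
    # Upcast first: NumPy cannot represent bfloat16, but float32 converts fine.
    log_probs = logits.float().log_softmax(dim=-1)
    return log_probs.squeeze().detach().cpu().numpy()


# Stand-in logits in bfloat16 with shape (batch, sequence, vocab).
logits = torch.randn(1, 4, 10, dtype=torch.bfloat16)
arr = log_probs_to_numpy(logits)
```

Upcasting before the softmax rather than after also avoids computing the log-probabilities at reduced precision.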
Hello! Thanks for sharing this amazing work! I am trying to train the lens over a new dataset [HF Dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) (note that the original "the pile" dataset was removed from...
Before the white-box -> tuned-lens rename, the CLI was like this: ```white-box lens train ``` and ```white-box cbe extract ``` But after the rename we have ```tuned-lens train ``` without the...
**Describe the bug** Checkpointing crashes when `--zero` is set, with the error `RuntimeError: Tensors must be CUDA and dense` being thrown inside the method `consolidate_state_dict()` **Expected behavior** Shouldn't crash **Screenshots**
In the paper there is a nice visualization of prediction depth. Prediction depth is defined in the paper as the first layer where the most likely token is equal to...
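A rough sketch of how such a depth could be computed for one token position. The definition above is cut off, so the comparison target here, the final layer's most likely token, is an assumption rather than the paper's confirmed wording. The function name `prediction_depth` and its input format are also hypothetical.

```python
def prediction_depth(top_tokens_per_layer: list[int]) -> int:
    """Index of the first layer whose top token matches the final layer's.

    Assumption: the comparison target is the final layer's top token.
    `top_tokens_per_layer[i]` is the most likely token id at layer i.
    """
    if not top_tokens_per_layer:
        raise ValueError("need at least one layer")
    final_token = top_tokens_per_layer[-1]
    # The last layer always matches itself, so the loop always returns.
    for layer, token in enumerate(top_tokens_per_layer):
        if token == final_token:
            return layer
    raise AssertionError("unreachable")


# Top token ids across six layers for a single position.
depth = prediction_depth([5, 3, 7, 3, 7, 7])
```

Note that under this reading an early layer can match the final token and later layers can still disagree, as layer 3 does in the example.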
This feature will be removed in #63. It would be nice to reimplement this so that we can not only see which tokens have a high probability at each layer...