DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
Hey there :wave: Just wanted to let you know that [your app on Streamlit Cloud deployed from this repo](https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/jbloomaus/decisiontransformerinterpretability/main/app.py) has gone over its resource limits. Access to the app is...
Hey very quick little issue. In the current version of main, `cuda` cannot be disabled for `run_ppo.py`. If one looks at `utils.py` we have: ``` parser.add_argument( "--cuda", action="store_true", default=True, help="if...
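To illustrate the problem described in the issue above: with `action="store_true"` and `default=True`, the parsed value is `True` whether or not the flag is passed, so there is no way to turn it off from the command line. A minimal sketch of one possible fix follows, assuming Python 3.9+ so that `argparse.BooleanOptionalAction` is available. The `build_parser` function, the `--cuda-broken` flag name, and the help text are hypothetical and only for illustration; they are not taken from the repo's `utils.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser contrasting the reported pattern with one possible fix."""
    parser = argparse.ArgumentParser()

    # Pattern reported in the issue (renamed here so both can coexist):
    # store_true sets the value to True when the flag is present, and the
    # default is already True, so the value can never become False.
    parser.add_argument("--cuda-broken", action="store_true", default=True)

    # One possible fix: BooleanOptionalAction (Python 3.9+) registers both
    # --cuda and --no-cuda, so the default of True can be overridden.
    parser.add_argument(
        "--cuda",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="enable CUDA (hypothetical help text)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-cuda"])
    print(args.cuda, args.cuda_broken)
```

With this sketch, `python run_ppo.py --no-cuda` style invocations would disable CUDA while keeping the current default behaviour. An alternative with the same effect on older Python versions is a separate `--no-cuda` flag using `action="store_false"` and `dest="cuda"`.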
Merged the old PR in and added a new test. Ensured acceptance and unit tests pass. Added additional test as requested in #54 to test that the correct number of...
On default settings, I selected the AVE analysis, and got the following: ``` AssertionError: This app has encountered an error. The original error message is redacted to prevent data leaks....
- [x] 1. Dot product between each output action. - [x] 2. Dot product between each input action. - [ ] 3. Dot product between each time embedding. - [x]...
Basic concept is that we can sample from which heads we actually compute randomly in order to see which matter. Shapley values are usually computed over all subsets of heads....
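The sampling idea above can be sketched with a standard Monte Carlo estimator: instead of enumerating every subset of heads, draw random orderings of the heads and average each head's marginal contribution as it is switched on. This is a generic permutation-sampling estimator, not necessarily the scheme intended in the issue. The names `sampled_shapley` and `toy_value` are hypothetical, and `toy_value` is a made-up stand-in for whatever metric the real model would produce with only a given set of heads active.

```python
import random
from typing import Callable, FrozenSet, List


def sampled_shapley(
    n_heads: int,
    value_fn: Callable[[FrozenSet[int]], float],
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate per-head Shapley values by sampling random head orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n_heads
    for _ in range(n_samples):
        order = list(range(n_heads))
        rng.shuffle(order)
        active: set = set()
        prev = value_fn(frozenset(active))
        for head in order:
            # Marginal contribution of this head given the heads already active.
            active.add(head)
            cur = value_fn(frozenset(active))
            totals[head] += cur - prev
            prev = cur
    return [t / n_samples for t in totals]


def toy_value(active: FrozenSet[int]) -> float:
    """Hypothetical metric: additive head effects plus one pairwise interaction."""
    weights = [1.0, 0.0, 0.0, 0.5]
    score = sum(weights[h] for h in active)
    if 0 in active and 2 in active:
        score += 2.0  # head 2 only matters when head 0 is also active
    return score


if __name__ == "__main__":
    for head, phi in enumerate(sampled_shapley(4, toy_value)):
        print(f"head {head}: {phi:.3f}")
```

Each sampled ordering costs `n_heads + 1` evaluations of the metric, so the total cost scales with `n_samples * n_heads` rather than with the `2 ** n_heads` subsets an exact computation would need. In a real setting `value_fn` would run the model with the inactive heads ablated, which is where most of the cost lies.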
QK - [ ] State to Action. OV - [ ] Fix head selection (default to all) - [ ] Find a way to automatically find axes in the OV circuit...
- [ ] Use t-lens naming scheme - [ ] Enable arbitrary combinations of heads and MLPs
- [x] Psychological eval - [ ] Activation Patching for instruction and RTG. -> try to explain - [ ] Work out how to tackle targets (patching same object multiple...