elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
Updates:
- [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.4.0...v6.0.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 23.7.0 → 25.11.0](https://github.com/psf/black-pre-commit-mirror/compare/23.7.0...25.11.0)
- [github.com/astral-sh/ruff-pre-commit: v0.0.278 → v0.14.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.0.278...v0.14.6)
- [github.com/codespell-project/codespell: v2.2.5 → v2.4.1](https://github.com/codespell-project/codespell/compare/v2.2.5...v2.4.1)
Reproduced on my local setup and on Colab:

```py
!pip install git+https://github.com/EleutherAI/elk/
import elk
```

```
----> 2 import elk
/usr/local/lib/python3.10/dist-packages/elk/__init__.py in
----> 1 from .evaluation import Eval
      2 from...
```
Resolves issue #295.
https://github.com/RohitRathore1/elk/blob/84e99a36a5050881d85f1510a2486ce46ac1f942/tests/test_smoke_eval.py#L19C1-L20C35
Ensembling from mid to last layer
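A minimal sketch of what mid-to-last-layer ensembling could look like, using numpy as a stand-in. The `(num_layers, num_examples)` credence layout and the unweighted mean are assumptions for illustration, not elk's actual implementation:

```python
import numpy as np

def ensemble_mid_to_last(layer_credences: np.ndarray) -> np.ndarray:
    """Average per-example credences over the mid..last layers.

    layer_credences: shape (num_layers, num_examples), one credence per
    example from each layer's reporter (hypothetical layout).
    """
    num_layers = layer_credences.shape[0]
    mid = num_layers // 2
    # Simple unweighted mean over the second half of the layer stack.
    return layer_credences[mid:].mean(axis=0)

# Example: 8 layers, 4 examples, random credences in [0, 1).
rng = np.random.default_rng(0)
creds = rng.random((8, 4))
ensembled = ensemble_mid_to_last(creds)
```

A weighted mean (e.g. favoring later layers) would slot in the same way; the mean over the back half is just the simplest choice.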
Previously, the visualization code only allowed `auroc_estimate` to be visualized. This PR adds support for other metrics, letting users pass an optional `--metric` argument to `elk plot` or `elk sweep`.
Solves NOT-291. This is quite a complex change: it trains a reporter model per prompt, then evaluates it both on each individual prompt as well as...
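The train-per-prompt, evaluate-across-prompts idea can be sketched as follows. Everything here is hypothetical scaffolding: a difference-of-means probe stands in for the reporter, and the synthetic per-prompt activations stand in for real hidden states:

```python
import numpy as np

def fit_probe(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Difference-of-class-means direction as a stand-in 'reporter'."""
    return x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)

def accuracy(w: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    preds = (x @ w > 0).astype(int)
    return float((preds == y).mean())

rng = np.random.default_rng(0)
num_prompts, n, d = 3, 64, 8

# One hypothetical (activations, labels) dataset per prompt template,
# with the two classes shifted apart so probes have something to find.
data = []
for _ in range(num_prompts):
    y = rng.integers(0, 2, n)
    x = rng.normal(size=(n, d)) + 2.0 * (2 * y - 1)[:, None]
    data.append((x, y))

# Train one reporter per prompt, then evaluate every reporter on every
# prompt: transfer[i, j] = accuracy of prompt-i's reporter on prompt j.
probes = [fit_probe(x, y) for x, y in data]
transfer = np.array([[accuracy(w, x, y) for x, y in data] for w in probes])
```

The diagonal of `transfer` is in-distribution performance; off-diagonal entries show how well a reporter generalizes to prompts it was not trained on.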
Fixes NOT-273.
Adds `LdaFitter` for supervised LDA reporters
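For reference, the textbook two-class Fisher LDA direction looks like the sketch below. This is a generic illustration of supervised LDA on activations, not the actual `LdaFitter` API; the ridge term and synthetic data are assumptions:

```python
import numpy as np

def fit_lda_direction(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fisher LDA direction w ∝ Σ_w^{-1} (μ1 − μ0) for binary labels.

    Textbook two-class LDA, a stand-in for what a supervised LDA
    reporter might fit; not elk's actual LdaFitter.
    """
    x0, x1 = x[y == 0], x[y == 1]
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    # Pooled within-class covariance, with a small ridge for stability.
    cov = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    cov += 1e-3 * np.eye(cov.shape[0])
    w = np.linalg.solve(cov, mu1 - mu0)
    return w / np.linalg.norm(w)

# Synthetic class-separated "activations" to exercise the fitter.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
x = rng.normal(size=(200, 5)) + 1.5 * (2 * y - 1)[:, None]
w = fit_lda_direction(x, y)
acc = float(((x @ w > 0) == y).mean())
```

Unlike the unsupervised probing elk is built around, LDA uses the labels directly, so it gives a supervised ceiling to compare reporters against.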