
AlignmentResearch/lp_sae


SAE Lens

PyPI License: MIT build Deploy Docs codecov

SAELens exists to help researchers:

  • Train sparse autoencoders.
  • Analyse sparse autoencoders and carry out mechanistic interpretability research.
  • Generate insights that make it easier to create safe and aligned AI systems.

Please refer to the documentation for information on how to:

  • Download and analyse pre-trained sparse autoencoders.
  • Train your own sparse autoencoders.
  • Generate feature dashboards with the SAE-Vis Library.
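
For readers new to the term, the sketch below shows what a sparse autoencoder computes on a single activation vector: a ReLU encoder, a linear decoder, and a loss that combines reconstruction error with an L1 sparsity penalty. It is a generic illustration in plain Python, not SAELens code. Every name in it (`sae_forward`, `sae_loss`, `l1_coeff`, the toy weights) is hypothetical and chosen only for this example, and the L1 penalty is one common choice rather than necessarily the one used in this repository.

```python
# Generic sparse-autoencoder sketch (illustrative only; NOT the SAELens API).
# All function names, parameters and weights here are hypothetical.

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vector]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Return (features, reconstruction) for one activation vector x."""
    pre_activation = [a + b for a, b in zip(matvec(w_enc, x), b_enc)]
    features = relu(pre_activation)  # sparse, non-negative feature activations
    reconstruction = [a + b for a, b in zip(matvec(w_dec, features), b_dec)]
    return features, reconstruction


def sae_loss(x, features, reconstruction, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1


# Toy example: a 2-dimensional input and 3 dictionary features.
x = [1.0, -2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # shape 3 x 2
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]]    # shape 2 x 3
b_dec = [0.0, 0.0]

features, reconstruction = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
loss = sae_loss(x, features, reconstruction, l1_coeff=0.1)
```

In this toy setup only two of the three features fire, and the input is reconstructed exactly, so the loss consists entirely of the sparsity penalty. Training adjusts the weights to trade reconstruction quality against the number and size of active features.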

SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to safeguard humanity from risks posed by artificial intelligence.

This library is maintained by Joseph Bloom and David Chanin.

Loading Pre-trained SAEs

Pre-trained SAEs for various models can be imported via SAE Lens. See this page in the readme for a list of all SAEs.

Tutorials

Join the Slack!

Feel free to join the Open Source Mechanistic Interpretability Slack for support!

Citation

Please cite the package as follows:

@misc{bloom2024saetrainingcodebase,
   title = {SAELens},
   author = {Joseph Bloom and David Chanin},
   year = {2024},
   howpublished = {\url{https://github.com/jbloomAus/SAELens}},
}

About

Training Sparse Autoencoders on DRC networks
