Here are 109 public repositories matching this topic...
Hardened AI Assurance reference platform
-
Updated
Jan 23, 2023 - Python
A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town.
-
Updated
Sep 16, 2021
📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments.
-
Updated
Apr 21, 2024 - Dockerfile
Repository for the LWDA'24 presentation on 'Psychometric Profiling of GPT Models for Bias Exploration', featuring conference materials including the poster, paper, slides, and references.
-
Updated
Sep 23, 2024 - TeX
Short story about artificial general intelligence (originally an English homework assignment).
-
Updated
Dec 13, 2018
This project contains a proof of concept demonstrating how contemporary artificial-intelligence models could be misused to influence public perception, highlighting the need for robust defenses against such threats to the safety of our political systems. Entry for the OpenAI Preparedness Challenge.
-
Updated
Jan 14, 2024
Knowledge representation model for educational retrieval-augmented generation systems
-
Updated
Jul 3, 2024
In-depth evaluation of the ETHICS utilitarianism task dataset. An assessment of approaches to improved interpretability (SHAP, Bayesian transformers).
-
Updated
Jun 3, 2021 - Jupyter Notebook
A solution to the Gandalf AI challenge from Lakera (https://gandalf.lakera.ai/). The README documents the prompts used to reveal the secret passwords across the various levels of Gandalf, with each prompt tested multiple times for consistency.
-
Updated
May 18, 2024
Analysis of the survey "Towards best practices in AGI safety and governance: A survey of expert opinion"
-
Updated
May 11, 2023 - Jupyter Notebook
A prettified page for MIT's AI Risk Database
-
Updated
Aug 24, 2024 - HTML
R code for "Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety"
-
Updated
Feb 9, 2024 - R
Materials on reading lists, events, and, more generally, the format of MIRIxPrague.
-
Updated
Apr 2, 2018