Here are 109 public repositories matching this topic...
Hardened AI Assurance reference platform
-
Updated
Jan 23, 2023 - Python
A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town.
-
Updated
Sep 16, 2021
📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments.
-
Updated
Apr 21, 2024 - Dockerfile
Repository for the LWDA'24 presentation on 'Psychometric Profiling of GPT Models for Bias Exploration', featuring conference materials including the poster, paper, slides, and references.
-
Updated
Sep 23, 2024 - TeX
Short story about artificial general intelligence (originally an English homework assignment).
-
Updated
Dec 13, 2018
This project contains a proof of concept demonstrating how contemporary artificial-intelligence models could be misused to influence public perception, highlighting the need for robust defenses against such threats to the safety of our political systems. Entry for the OpenAI Preparedness Challenge.
-
Updated
Jan 14, 2024
Knowledge representation model for educational retrieval-augmented generation systems
-
Updated
Jul 3, 2024
In-depth evaluation of the ETHICS utilitarianism task dataset. An assessment of approaches to improved interpretability (SHAP, Bayesian transformers).
-
Updated
Jun 3, 2021 - Jupyter Notebook
A solution to the Gandalf AI challenge from Lakera (https://gandalf.lakera.ai/). The README documents the prompts used to reveal the secret passwords across the various levels of Gandalf, with each prompt tested multiple times for consistency.
-
Updated
May 18, 2024
Analysis of the survey "Towards best practices in AGI safety and governance: A survey of expert opinion"
-
Updated
May 11, 2023 - Jupyter Notebook
A prettified page for MIT's AI Risk Database
-
Updated
Aug 24, 2024 - HTML
R code for "Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety"
-
Updated
Feb 9, 2024 - R
Materials on reading lists, events, and, more generally, the format of MIRIxPrague.
-
Updated
Apr 2, 2018