From 819cfef2d8d14841fde397aa086f489ac8791959 Mon Sep 17 00:00:00 2001
From: Yao Huang <76527397+Aries-iai@users.noreply.github.com>
Date: Tue, 9 Jul 2024 21:49:47 +0800
Subject: [PATCH 1/3] Update README.md

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index cb92722..d2ee68a 100644
--- a/README.md
+++ b/README.md
@@ -23,9 +23,8 @@ A Comprehensive Study
 
 ![framework](docs/structure/framework.jpg)
 
-**MultiTrust** is a comprehensive benchmark designed to assess and enhance the trustworthiness of MLLMs across five key dimensions: truthfulness, safety, robustness, fairness, and privacy. It integrates a rigorous evaluation strategy involving 32 diverse tasks and self-curated datasets to expose new trustworthiness challenges.
+- **MultiTrust** is a comprehensive benchmark designed to assess and enhance the trustworthiness of MLLMs across five key dimensions: truthfulness, safety, robustness, fairness, and privacy. It integrates a rigorous evaluation strategy involving 32 diverse tasks to expose new trustworthiness challenges.
 
----
 
 ## 🚀 News
 * **`2024.07.07`** 🌟 We released the latest results for [GPT-4o](https://openai.com/index/hello-gpt-4o/), [Claude-3.5](https://www.anthropic.com/news/claude-3-5-sonnet), and [Phi-3](https://ollama.com/library/phi3) on our [project website](https://multi-trust.github.io/) !

From 7dbed3ed321571c0cb570b5fa09efde626b84e5c Mon Sep 17 00:00:00 2001
From: Yao Huang <76527397+Aries-iai@users.noreply.github.com>
Date: Tue, 9 Jul 2024 21:50:14 +0800
Subject: [PATCH 2/3] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d2ee68a..93d0204 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ A Comprehensive Study
 
 ![framework](docs/structure/framework.jpg)
 
-- **MultiTrust** is a comprehensive benchmark designed to assess and enhance the trustworthiness of MLLMs across five key dimensions: truthfulness, safety, robustness, fairness, and privacy. It integrates a rigorous evaluation strategy involving 32 diverse tasks to expose new trustworthiness challenges.
+> **MultiTrust** is a comprehensive benchmark designed to assess and enhance the trustworthiness of MLLMs across five key dimensions: truthfulness, safety, robustness, fairness, and privacy. It integrates a rigorous evaluation strategy involving 32 diverse tasks to expose new trustworthiness challenges.
 
 
 ## 🚀 News

From 75a5ef9ee93a2a537eb1e572841550b18e274aba Mon Sep 17 00:00:00 2001
From: Yao Huang <76527397+Aries-iai@users.noreply.github.com>
Date: Thu, 11 Jul 2024 17:45:00 +0800
Subject: [PATCH 3/3] Update README.md

---
 data4multitrust/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/data4multitrust/README.md b/data4multitrust/README.md
index e07ccb7..9f1256e 100644
--- a/data4multitrust/README.md
+++ b/data4multitrust/README.md
@@ -4,7 +4,7 @@ Here is the instructions to prepare the dataset to reproduce results in [MultiTr
 
 ## Download Data
 
-Install related datasets into this directory from this [link](https://drive.google.com/drive/folders/1Fh6tidH1W2aU3SbKVggg6cxWqT021rE0?usp=drive_link) and rename the this directory as `data`.
+Please fill in this [form](https://docs.google.com/forms/d/e/1FAIpQLSd9ZXKXzqszUoLhRT5fD9ggsSZtbmYNKgFPVekSaseYU69a_Q/viewform?usp=sf_link) to obtain the download link of MultiTrust dataset. Then, you could install related datasets into this directory and rename the this directory as `data`.
 
 
 
@@ -14,4 +14,4 @@ Please note that only a part of datasets are released for now, because we are ho
 Here, to support the usage of our platform and the reproduction of our results, we make the data for some tasks public, including: T.1 (Basic World Understanding), T.7 (Visual Misleading QA), S.3 (Toxicity Content Generation), S.4 (Plain Typographic Jailbreaking), R.1 (VQA for Artistic Style Images), R.6 (Textual Adversarial Attack), F.6 (Profession Prediction), F.7 (Preference Selection in QA), P.3 (InfoFlow Expectation) and P.4 (PII Query with Visual Cues).
 
 ## Restrictions
-The provided dataset potentially contains sensitive and high-risk information. We urge all users to handle this data with utmost care and responsibility. Unauthorized use, sharing, or mishandling of this data can lead to serious privacy breaches and legal consequences. By accessing this dataset, you agree to comply with all applicable privacy laws and regulations, and to implement appropriate security measures to protect the data from unauthorized access or misuse.
\ No newline at end of file
+The provided dataset potentially contains sensitive and high-risk information. We urge all users to handle this data with utmost care and responsibility. Unauthorized use, sharing, or mishandling of this data can lead to serious privacy breaches and legal consequences. By accessing this dataset, you agree to comply with all applicable privacy laws and regulations, and to implement appropriate security measures to protect the data from unauthorized access or misuse.