This repository contains the CF-TriviaQA dataset for the paper Hallucination Augmented Recitations for Language Models.
CF-TriviaQA is a counterfactual open book QA dataset generated from the TriviaQA dataset using the Hallucination Augmented Recitations (HAR) approach. The purpose of this dataset is to improve attribution in Large Language Models (LLMs) by providing high-quality, attributable, and counterfactual examples.
- Size: 16,853 examples
- Source: Generated from TriviaQA using HAR
- Format: JSONL (JSON Lines)
Each entry in the dataset is a JSON object with the following structure:
```json
{
  "question_text": "String containing the question from TriviaQA",
  "paragraph_text": "String containing the generated counterfactual document",
  "annotation": {
    "answer": [
      {
        "paragraph_reference": {
          "string": "String containing the generated counterfactual answer"
        }
      }
    ]
  },
  "question_id": "String identifier for the question"
}
```
- `question_text`: The original question from TriviaQA.
- `paragraph_text`: The counterfactual document generated by HAR.
- `annotation.answer[0].paragraph_reference.string`: The counterfactual answer generated by HAR.
- `question_id`: A unique identifier for each question-answer pair.
The dataset is stored in a JSONL file named `har_dataset.jsonl`, where each line represents a single example.
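The file can be read with the standard library alone. The sketch below loads the JSONL file and pulls the counterfactual answer out of the nested `annotation` structure; the helper names (`load_cf_triviaqa`, `get_answer`) are illustrative, not part of the dataset release.

```python
import json

def load_cf_triviaqa(path):
    """Load CF-TriviaQA examples from a JSONL file (one JSON object per line)."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples

def get_answer(example):
    """Return the counterfactual answer nested under annotation.answer[0]."""
    return example["annotation"]["answer"][0]["paragraph_reference"]["string"]
```

For example, `examples = load_cf_triviaqa("har_dataset.jsonl")` followed by `get_answer(examples[0])` yields the first example's counterfactual answer.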
- Counterfactual: All examples are counterfactual, conflicting with the original TriviaQA answers.
- High Attribution: Answers are grounded in the generated documents.
- Diverse Counterfactuals: Includes simple counterfactuals, temporal questions, and ambiguous questions.
- Recitation Generation: Using PaLM 2-L to generate multiple document-answer pairs for each TriviaQA question.
- Factuality Filtering: Removing factual generations to ensure counterfactuality.
- Attribution Filtering: Ensuring generated answers are grounded in the generated documents.
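The three stages above can be sketched as a simple generate-then-filter loop. Everything named here is a hypothetical placeholder: `generate_recitations`, `is_factual`, and `is_attributable` stand in for the paper's PaLM 2-L generator and its NLI-based filtering models, which are not reproduced in this repository.

```python
def har_filter(question, generate_recitations, is_factual, is_attributable):
    """Hedged sketch of the HAR pipeline: generate candidate document-answer
    pairs for a question, then keep only counterfactual, grounded ones."""
    kept = []
    for document, answer in generate_recitations(question):
        # Factuality filtering: drop generations whose answer is factually
        # correct, so only counterfactual pairs survive.
        if is_factual(question, answer):
            continue
        # Attribution filtering: keep only answers grounded in their document.
        if not is_attributable(document, answer):
            continue
        kept.append({"paragraph_text": document, "answer": answer})
    return kept
```

The design point is that both filters are applied per generation, so a single TriviaQA question can contribute several counterfactual examples or none at all.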
The dataset has been evaluated for:
- Attribution score: 0.87
- Counterfactuality score: 0.68
(Based on NLI-based evaluation using a T5-11B model)
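An NLI-style metric of this kind reduces to a fraction of entailed examples. The sketch below shows the general shape under that assumption; `nli_entails` is a hypothetical predicate standing in for the T5-11B NLI model, and the flat example dicts are simplified relative to the dataset's nested schema.

```python
def attribution_score(examples, nli_entails):
    """Fraction of examples whose answer is entailed by its own document.
    `nli_entails(premise=..., hypothesis=...)` is a placeholder for an NLI model."""
    hits = sum(
        1 for ex in examples
        if nli_entails(premise=ex["paragraph_text"], hypothesis=ex["answer"])
    )
    return hits / len(examples)
```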
Models finetuned with CF-TriviaQA show significant improvements in out-of-domain QA tasks, demonstrating enhanced text grounding capabilities.
This dataset is released under the Apache 2.0 License.
If you use this dataset in your research, please cite:
```bibtex
@misc{köksal2023hallucinationaugmentedrecitationslanguage,
  title={Hallucination Augmented Recitations for Language Models},
  author={Abdullatif Köksal and Renat Aksitov and Chung-Ching Chang},
  year={2023},
  eprint={2311.07424},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2311.07424},
}
```
For questions or issues related to the dataset, please open an issue in this repository or contact the authors via the information provided in the paper.