This repository contains the official implementation of our CIKM'2024 paper "Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering" by Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, Shaochen Zhong, Kaixiong Zhou, Ninghao Liu.
RAE is a novel framework for editing knowledge in large language models (LLMs) for multi-hop question answering tasks. It employs mutual information maximization for fact retrieval and a self-optimizing technique to prune redundant data.
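As a rough intuition for the retrieval step, a candidate fact can be scored by a pointwise-mutual-information-style quantity under an off-the-shelf LM: how much more probable the fact becomes once the question (and the facts retrieved so far) are in context. The sketch below only illustrates that idea; it is not the repository's implementation, and the function names, greedy loop, and GPT-2 scoring are our assumptions.

```python
# Illustrative only: a PMI-style fact scorer and greedy chain retriever.
# Not the paper's implementation; names and logic here are assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sequence_logprob(prompt: str, continuation: str) -> float:
    """Log-probability of `continuation` given `prompt` under the LM."""
    prompt_ids = tokenizer.encode(prompt)
    cont_ids = tokenizer.encode(continuation)
    with torch.no_grad():
        logits = model(torch.tensor([prompt_ids + cont_ids])).logits
    log_probs = torch.log_softmax(logits[0], dim=-1)
    # logits at position i predict the token at position i + 1
    return sum(log_probs[pos - 1, tok].item()
               for pos, tok in enumerate(cont_ids, start=len(prompt_ids)))

EMPTY = tokenizer.bos_token  # stands in for an empty context

def fact_score(question: str, chain: list[str], fact: str) -> float:
    """PMI-style gain: log p(fact | question, chain) - log p(fact)."""
    context = question + " " + " ".join(chain)
    return (sequence_logprob(context, " " + fact)
            - sequence_logprob(EMPTY, " " + fact))

def retrieve_chain(question: str, facts: list[str], hops: int = 2) -> list[str]:
    """Greedily append the highest-scoring fact at each hop."""
    chain: list[str] = []
    for _ in range(hops):
        chain.append(max(facts, key=lambda f: fact_score(question, chain, f)))
    return chain
```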
You can download the edited knowledge graphs from these links: KG_MQUAKE-CF-3k and KG_MQUAKE-T. Put them into `./data/`.
To build your edited KG, run:

    python edit_KG.py
Note: You need to first download the original Wikidata KG from here. This Wikidata KG is based on the Wikidata5m project.
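Conceptually, building the edited KG means overlaying the dataset's edits on the original Wikidata triples. The sketch below assumes a tab-separated triple file and an edit format with `subject`, `relation`, and `new_object` keys; both are illustrative, not the actual schema used by edit_KG.py.

```python
# Illustrative only: overlay counterfactual edits on a base KG.
# The file format and edit keys below are assumptions, not edit_KG.py's schema.

def load_triples(path: str) -> dict:
    """Read tab-separated (head, relation, tail) triples, one per line."""
    kg = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            head, relation, tail = line.rstrip("\n").split("\t")
            kg[(head, relation)] = tail
    return kg

def apply_edits(kg: dict, edits: list) -> dict:
    """Each edit replaces the tail of one (head, relation) pair."""
    for e in edits:
        kg[(e["subject"], e["relation"])] = e["new_object"]
    return kg
```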
Please refer to `requirements.txt` for the list of dependencies.
Ensure you have prepared the edited KG before running:
    python main.py --model gpt2 --mode beam --dataset MQuAKE-CF-3k
    python main.py --model gpt2 --mode beam --dataset MQuAKE-T
Key options for `main.py`:

- `Natural`: When enabled, transforms a triple into a human-readable natural-language statement, which benefits LLM modeling and improves retrieval success. Enabled by default.
- `Template`: When enabled, builds "question + fact chain" in-context examples to help LLMs understand the task. Examples are extracted from MQUAKE-CF, which contains 9k examples distinct from the test cases.
- `Template_number`: Number of templates used to extract relevant facts for fact-chain retrieval. Default is 3.
- `entropy_template_number`: Number of templates used for knowledge pruning. Default is 6.
- `correctConflict`: Designed specifically for the MQUAKE-CF-3k dataset to handle editing conflicts, where both the unedited and the edited version of a fact are needed to answer different questions. You can learn more about this issue from DeepEdit. Enabled by default, but not necessary for other datasets.
If you find this work helpful, please cite our paper:
    @article{shi2024retrieval,
      title={Retrieval-enhanced knowledge editing for multi-hop question answering in language models},
      author={Shi, Yucheng and Tan, Qiaoyu and Wu, Xuansheng and Zhong, Shaochen and Zhou, Kaixiong and Liu, Ninghao},
      journal={arXiv preprint arXiv:2403.19631},
      year={2024}
    }