This repository contains the code to run the evaluation and analyses for MoreHopQA: More Than Multi-hop Reasoning.
We also provide the dataset on Hugging Face. For more details, please also see our paper.
We propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers. Our dataset is created by utilizing three existing multi-hop datasets: HotpotQA, 2Wiki-MultihopQA, and MuSiQue. Instead of relying solely on factual reasoning, we enhance the existing multi-hop questions by adding another layer of questioning.
Our dataset is created through a semi-automated process, resulting in a dataset with 1118 samples that have undergone human verification.
For each sample, we share our 6 evaluation cases, including the new question, the original question, all the necessary subquestions, and a composite question from the second entity to the final answer (case 3 below).
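For a quick look at the data, the dataset can be loaded with the Hugging Face datasets library. The snippet below is a minimal sketch; the dataset identifier and split name are assumptions, so check the Hugging Face page for the exact values.

```python
# Load MoreHopQA from the Hugging Face Hub and inspect one sample.
# The dataset ID and split name below are assumptions; see the
# Hugging Face page for the exact values.
from datasets import load_dataset

dataset = load_dataset("alab-nii/morehopqa", split="train")
print(len(dataset))       # expected: 1118 human-verified samples
print(dataset[0].keys())  # the question variants described above
```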
First, create the conda environment and activate it:
conda env create -f conda_env.yml
conda activate genhop
If running on CUDA 11, install PyTorch 2 for CUDA 11:
pip3 install --upgrade --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
To verify the installation, start a Python 3 interpreter and check that
import torch
torch.cuda.is_available()
returns True.
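As an additional sanity check, the following snippet reports which GPU PyTorch will use (a minimal sketch; it only prints device information):

```python
# Report whether CUDA is visible to PyTorch and, if so, the device name.
import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available")
```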
To evaluate answers via NER, it is necessary to install the spaCy model:
python3 -m spacy download en_core_web_sm
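For reference, an NER-based answer check roughly follows the pattern below. This is a minimal sketch, not the repository's actual evaluation code; the function name and example strings are illustrative.

```python
# Sketch of an NER-based comparison between a predicted answer and the
# gold answer, using the spaCy model installed above.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_entities(text):
    # Return the lowercased named-entity strings found in the text.
    return {ent.text.lower() for ent in nlp(text).ents}

predicted = "The film was released in 1994 in Los Angeles."
gold = "1994"

# Simple check: does any entity in the prediction contain the gold answer?
match = any(gold.lower() in ent for ent in extract_entities(predicted))
print(match)  # True for this example
```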
Additionally, to run models from OpenAI, set the OpenAI API key:
export OPENAI_API_KEY=*api_key*
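A quick way to confirm the key is visible to Python (this only checks the environment variable; it does not call the API):

```python
import os

# The OpenAI client reads OPENAI_API_KEY from the environment by default;
# this just verifies that it has been exported in the current shell.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```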
To run on macOS, it might be necessary to install non-MKL versions of NumPy and pandas:
conda install nomkl
then
conda install numpy pandas
followed by
conda remove mkl mkl-service
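Afterwards, a quick import check confirms that the non-MKL builds load correctly:

```python
# Verify that NumPy and pandas import cleanly with the non-MKL builds.
import numpy as np
import pandas as pd

print(np.__version__, pd.__version__)
```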
To evaluate all models from the paper, run
run_evaluation.sh
To reproduce our result tables, we provide the summarize_results.ipynb notebook.
The MoreHopQA dataset is licensed under CC BY 4.0.
If you find this dataset helpful, please consider citing our paper:
@misc{schnitzler2024morehopqa,
title={MoreHopQA: More Than Multi-hop Reasoning},
author={Julian Schnitzler and Xanh Ho and Jiahao Huang and Florian Boudin and Saku Sugawara and Akiko Aizawa},
year={2024},
eprint={2406.13397},
archivePrefix={arXiv}
}