The evaluation code for the paper "MoreHopQA: More Than Multi-hop Reasoning"

Alab-NII/morehopqa

MoreHopQA: More Than Multi-hop Reasoning

This repository contains the code to run the evaluation and analyses for MoreHopQA: More Than Multi-hop Reasoning.
We also provide the dataset on Hugging Face. For more details, please see our paper.


The dataset

We propose a new multi-hop dataset, MoreHopQA, which shifts from extractive to generative answers. The dataset is built from three existing multi-hop datasets: HotpotQA, 2Wiki-MultihopQA, and MuSiQue. Instead of relying solely on factual reasoning, we extend the existing multi-hop questions with an additional layer of questioning.

The dataset is created through a semi-automated process and contains 1118 human-verified samples.

For each sample, we share our 6 evaluation cases, including the new question, the original question, all the necessary subquestions, and a composite question from the second entity to the final answer (case 3 below).


Setup

First, create the conda environment and activate it:

conda env create -f conda_env.yml
conda activate genhop

If running on CUDA 11, install PyTorch 2 for CUDA 11:

pip3 install --upgrade --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

To verify the installation, start a Python 3 interpreter and check that

import torch
torch.cuda.is_available()

returns True.
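
The same check can be wrapped in a small script that also reports when PyTorch is not installed at all. This is a minimal sketch, not part of the repository; the function name `cuda_status` and its return values are hypothetical and chosen only for illustration.

```python
import importlib.util


def cuda_status() -> str:
    """Return 'missing', 'cpu-only', or 'cuda' for the local PyTorch install.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # find_spec avoids an ImportError when PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "missing"
    import torch

    # Same call as in the manual check above.
    return "cuda" if torch.cuda.is_available() else "cpu-only"


if __name__ == "__main__":
    print(cuda_status())
```

A result of `cpu-only` on a CUDA 11 machine suggests that the PyTorch reinstall step above may be needed.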

To evaluate answers via NER, install the spaCy model:

python3 -m spacy download en_core_web_sm

Additionally, to run models from OpenAI, set the OpenAI API key:

export OPENAI_API_KEY=*api_key*
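
Before starting a long evaluation run, it can help to confirm that both of the steps above took effect. The following is a rough sketch under the assumption that the spaCy model is installed as a regular Python package (which is how `spacy download` installs it); the function name `preflight` is hypothetical and not part of this repository.

```python
import importlib.util
import os


def preflight(env=os.environ) -> list[str]:
    """Collect human-readable setup problems (hypothetical helper)."""
    problems = []
    # `spacy download en_core_web_sm` installs the model as an importable package.
    if importlib.util.find_spec("en_core_web_sm") is None:
        problems.append("spaCy model en_core_web_sm is not installed")
    # The key is only required when evaluating OpenAI models.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
```

An empty list means both prerequisites appear to be in place; a missing API key can be ignored if no OpenAI models are evaluated.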

macOS

To run on macOS, it might be necessary to install no-mkl versions of numpy and pandas.

conda install nomkl

then

conda install numpy pandas

followed by

conda remove mkl mkl-service

Run

To evaluate all models from the paper, run

run_evaluation.sh

To reproduce our result tables, we provide the summarize_results.ipynb notebook.


License

The MoreHopQA dataset is licensed under CC BY 4.0.

If you find this dataset helpful, please consider citing our paper:

@misc{schnitzler2024morehopqa,
      title={MoreHopQA: More Than Multi-hop Reasoning}, 
      author={Julian Schnitzler and Xanh Ho and Jiahao Huang and Florian Boudin and Saku Sugawara and Akiko Aizawa},
      year={2024},
      eprint={2406.13397},
      archivePrefix={arXiv}
}
