
FAITH

Description

This code is for the FAITH method proposed in our WWW'24 full paper "Faithful Temporal Question Answering over Heterogeneous Sources".

Please visit the following repo to access the code for the TIQ benchmark construction: LINK.

Figure: Overview of the FAITH pipeline, illustrating the process for answering q3 (“Queen’s record company when recording Bohemian Rhapsody?”) and q1 (“Record company of Queen in 1975?”). To answer q3, two intermediate questions q31 and q32 are generated and run recursively through the entire temporal QA system.

For more details see our paper: Faithful Temporal Question Answering over Heterogeneous Sources and visit our project website: https://faith.mpi-inf.mpg.de.

If you use this code, please cite:

@article{jia2024faithful,
  title={Faithful Temporal Question Answering over Heterogeneous Sources},
  author={Jia, Zhen and Christmann, Philipp and Weikum, Gerhard},
  journal={WWW},
  year={2024}
}

Environment setup

We recommend installation via conda and provide the corresponding environment file in environment.yml:

  git clone https://github.com/zhenjia2017/FAITH.git
  cd FAITH/
  conda env create --file environment.yml
  conda activate faith
  pip install -e .

Alternatively, you can install the requirements via pip, using the requirements.txt file.
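
For instance, a minimal pip-based setup could look like this (shown only for illustration; adapt to your environment):

  git clone https://github.com/zhenjia2017/FAITH.git
  cd FAITH/
  pip install -r requirements.txt
  pip install -e .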

Dependencies

FAITH makes use of CLOCQ for retrieving facts from Wikidata. CLOCQ can be conveniently integrated via the publicly available API, using the client from the CLOCQ repo.
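
For illustration, querying the public CLOCQ API through that client could look roughly like this; the class name, host, and method below follow the CLOCQ repo's documentation and should be treated as assumptions to verify there:

  # Sketch only: interface names follow the CLOCQ repo and may change.
  from clocq.interface.CLOCQInterfaceClient import CLOCQInterfaceClient

  clocq = CLOCQInterfaceClient(host="https://clocq.mpi-inf.mpg.de/api", port="443")
  print(clocq.get_label("Q15862"))  # e.g. the Wikidata item for Queen (the band)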

FAITH makes use of SUTIME for annotating explicit dates in questions. You can install python-sutime via the following commands.

  pip install sutime
  mvn dependency:copy-dependencies -DoutputDirectory=./jars -f $(python3 -c 'from importlib import util; import pathlib; print(pathlib.Path(util.find_spec("sutime").origin).parent / "pom.xml")')

Then run the following script to start SUTIME as a backend service:

  bash scripts/start_sutime_server.sh
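
To sanity-check the installation, you can also call SUTIME directly from Python; a minimal sketch using the standard python-sutime interface:

  import json
  from sutime import SUTime

  # mark_time_ranges/include_range also group spans such as "from 1970 to 1975"
  sutime = SUTime(mark_time_ranges=True, include_range=True)
  print(json.dumps(sutime.parse("Record company of Queen in 1975?"), indent=2))
  # expected: a list of annotations with TIMEX values, e.g. "1975"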

Data

You need the following data:

  • wikipedia_wikidata_mappings.pickle
  • wikipedia_mappings.pickle
  • wikidata_mappings.pickle
  • types.pickle
  • stopwords.txt
  • labels.pickle
  • augmented_wikidata_mappings.pickle

We provide the trained models and data for reproducing the results. You can download them from here (unzip and put the contents in the "_data" folder; total data size is around 20 GB). The data folder structure is as follows:

_data
├── tiq (or timequestions)
│   ├── intermediate_question
│   ├── tsf_annotation
│   ├── iques_model.bin
│   ├── tsf_model.bin
│   ├── faith
│   │   ├── sbert_model.bin
│   │   ├── seq2seq_ha
│   │   └── explaignn
│   │       ├── gnn-answering-ignn-100-05-05.bin
│   │       └── gnn-pruning-ignn-100-00-10.bin
│   └── unfaith
│       ├── sbert_model.bin
│       └── explaignn
│           ├── gnn-answering-ignn-100-05-05.bin
│           └── gnn-pruning-ignn-100-00-10.bin
├── wikipedia_wikidata_mappings.pickle
├── wikipedia_mappings.pickle
├── wikidata_mappings.pickle
├── types.pickle
├── stopwords.txt
├── labels.pickle
└── augmented_wikidata_mappings.pickle
  • tiq/intermediate_question: intermediate questions generated by GPT, used as training data for fine-tuning the BART model on the TIQ benchmark
  • tiq/tsf_annotation: TSF training data generated via distant supervision for fine-tuning the BART model on the TIQ benchmark
  • tiq/iques_model.bin: fine-tuned BART model for generating intermediate questions in the TQU stage
  • tiq/tsf_model.bin: fine-tuned BART model for generating TSFs in the TQU stage
  • tiq/faith/sbert_model.bin: fine-tuned BERT model for scoring evidence in the FER stage
  • tiq/faith/seq2seq_ha: fine-tuned BART model for heterogeneous answering in the EHA stage
  • tiq/faith/explaignn/gnn-pruning-ignn-100-00-10.bin: graph reduction model in the EHA stage
  • tiq/faith/explaignn/gnn-answering-ignn-100-05-05.bin: answer inference model in the EHA stage
  • tiq/unfaith/sbert_model.bin: fine-tuned BERT model for scoring evidence in the FER stage under the UNFAITH setting
  • tiq/unfaith/explaignn/gnn-pruning-ignn-100-00-10.bin: graph reduction model in the EHA stage under the UNFAITH setting
  • tiq/unfaith/explaignn/gnn-answering-ignn-100-05-05.bin: answer inference model in the EHA stage under the UNFAITH setting
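
The .pickle files are standard Python pickles; a minimal loading sketch, assuming the default "_data" location:

  import pickle

  # load one of the mapping files listed above
  with open("_data/labels.pickle", "rb") as f:
      labels = pickle.load(f)
  print(type(labels))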

Reproduce paper results

Main results

Please run the following script to reproduce the main results of FAITH (Table 3 in the WWW 2024 paper):

  bash scripts/pipeline.sh --evaluate <PATH_TO_CONFIG>

For example,

  bash scripts/pipeline.sh --evaluate config/evaluate.yml

If you want to reproduce the results of UNFAITH, set "faith_or_unfaith" to "unfaith" in the config file.
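
A minimal excerpt of such a config (the key name is taken from this README; all other keys in config/evaluate.yml are omitted here):

  # config/evaluate.yml (excerpt)
  faith_or_unfaith: unfaith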

Training

There are three stages in FAITH: Temporal Question Understanding (TQU), Faithful Evidence Retrieval (FER), and Explainable Heterogeneous Answering (EHA).

TQU

In the TQU stage, there are two seq2seq models (a loading sketch follows the list) for:

  • (i) generating TSFs
  • (ii) generating intermediate questions
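
As an illustration of how such a fine-tuned seq2seq model could be loaded and queried with Hugging Face transformers (the base checkpoint and generation settings below are assumptions, not the repo's exact code):

  from transformers import BartForConditionalGeneration, BartTokenizer

  # hypothetical: substitute the fine-tuned checkpoint (e.g. tsf_model.bin,
  # converted to Hugging Face format) for the base model used here
  tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
  model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

  inputs = tokenizer("Queen's record company when recording Bohemian Rhapsody?", return_tensors="pt")
  output_ids = model.generate(**inputs, max_length=64)
  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))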

We already provide the annotated TSFs and intermediate questions for the TIQ and TimeQuestions benchmarks as training data. If you would like to generate annotated TSFs for other datasets, please follow the instructions in the document.

For generating intermediate questions as training data via GPT, please follow the instructions in the document.

FER

In the FER stage, we train a BERT-based re-ranker to score evidences, which then serve as the input for answering.
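
For illustration only, scoring question-evidence pairs with a cross-encoder from the sentence-transformers library could look like this; the checkpoint is a placeholder, and the repo's actual model (sbert_model.bin) may use a different class:

  # Illustrative re-ranking sketch, not the repo's exact scoring code.
  from sentence_transformers import CrossEncoder

  model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder checkpoint
  question = "Record company of Queen in 1975?"
  evidences = ["Queen signed to EMI Records in the early 1970s.", "Queen formed in London in 1970."]
  scores = model.predict([(question, ev) for ev in evidences])  # higher = more relevant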

EHA

In the EHA stage, there are two GNN models for:

  • (i) graph reduction
  • (ii) answer inference

We apply the two GNN models in sequence during answering (a hypothetical outline follows the list):

  • (i) graph reduction: the evidence set is pruned from 100 down to 20, shrinking the graph accordingly.
  • (ii) answer inference: the answer is inferred from the remaining 20 evidences, and the top-5 supporting evidences are output.
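
As a purely hypothetical outline of this two-stage flow (the stubs stand in for the repo's actual GNN models and APIs):

  # Hypothetical stubs illustrating the 100 -> 20 -> 5 evidence flow above.
  def gnn_graph_reduction(evidences, k=20):
      # real model: a GNN scores evidences and keeps the top-k
      return evidences[:k]

  def gnn_answer_inference(evidences, top_k=5):
      # real model: a GNN infers the answer and its top-k supporting evidences
      return "answer-placeholder", evidences[:top_k]

  evidences = [f"evidence {i}" for i in range(100)]  # ~100 evidences from FER
  answer, support = gnn_answer_inference(gnn_graph_reduction(evidences))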

For training the models individually, please run the following script:

  bash scripts/pipeline.sh --train_<stage name> <PATH_TO_CONFIG> [<SOURCES_STR>]

For example, to train the answer inference model:

  bash scripts/pipeline.sh --train_ha config/train_ha_answer_inference.yml

Note that you need two config files, one for training each of the two GNN models.