GiLOT: Interpreting Generative Language Models via Optimal Transport

Abstract

As large language models (LLMs) surge with the rise of generative AI, algorithms that explain LLMs are highly desired. Existing feature attribution methods that are adequate for discriminative language models like BERT often fail to deliver faithful explanations for LLMs, primarily due to two issues: (1) for every specific prediction, the LLM outputs a probability distribution over the vocabulary, a large number of tokens with unequal semantic distance; (2) as an autoregressive language model, the LLM handles input tokens while generating a sequence of probability distributions over various tokens. To address these two challenges, this work proposes GiLOT, which leverages Optimal Transport to measure the distributional change of all possible generated sequences upon the absence of every input token, while taking into account the tokens' similarity, so as to faithfully estimate feature attribution for LLMs. We have carried out extensive experiments on the Llama families and their fine-tuned derivatives across various scales to validate the effectiveness of GiLOT for estimating input attributions. The results show that GiLOT outperforms existing solutions on a number of faithfulness metrics under fair comparison settings.

paper link
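
To build intuition for the idea in the abstract, here is a minimal toy sketch. It is not the code in this repository: it scores a single next-token distribution rather than whole generated sequences. The three-token vocabulary, the 2-D embeddings, the probabilities, and the helper names `sinkhorn_cost` and `euclidean` are all made up for illustration. A small entropy-regularised Sinkhorn solver stands in for an exact optimal transport solver. The point it tries to show is that an optimal transport distance treats probability mass moving to a semantically close token as a smaller change than mass moving to a distant one.

```python
import math

def sinkhorn_cost(p, q, cost, reg=0.1, n_iters=200):
    """Approximate OT cost between discrete distributions p and q
    using entropy-regularised Sinkhorn iterations (illustrative only)."""
    n, m = len(p), len(q)
    # Gibbs kernel derived from the ground cost matrix.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport cost of the resulting plan P[i][j] = u[i] * K[i][j] * v[j].
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vocabulary with made-up embeddings: "good" and "great" are close.
vocab = ["good", "great", "bad"]
emb = {"good": (1.0, 0.0), "great": (0.9, 0.1), "bad": (-1.0, 0.0)}
cost = [[euclidean(emb[a], emb[b]) for b in vocab] for a in vocab]

# Made-up next-token distributions: full input vs. one input token removed.
full = [0.7, 0.2, 0.1]
without = {
    "token_A": [0.2, 0.7, 0.1],  # mass shifts from "good" to "great"
    "token_B": [0.2, 0.2, 0.6],  # mass shifts from "good" to "bad"
}

# Attribution score of an input token = OT cost caused by removing it.
scores = {tok: sinkhorn_cost(full, dist, cost) for tok, dist in without.items()}
for tok, score in scores.items():
    print(f"{tok}: {score:.3f}")
```

Both perturbed distributions differ from `full` by the same total variation (0.5), so a similarity-blind measure would rank the two tokens equally. With the embedding-based cost, `token_B` should receive the larger score because its removal pushes probability toward a semantically distant token.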

Getting Started

There are two main ways to begin:

For a simple demonstration, please use the demo.ipynb file. Please note that the visualization may not display correctly on GitHub. Refer to the figure below for the expected result. (This demo will be continuously enhanced.)
To reproduce the results from our paper, please refer to the scripts in the reproduce/llama_variants directory.

We welcome you to open issues and engage in discussions with us.

Citing

If you find this project useful in your research, please consider citing:

@inproceedings{
   li2024gilot,
   title={Gi{LOT}: Interpreting Generative Language Models via Optimal Transport},
   author={Xuhong Li and Jiamin Chen and Yekun Chai and Haoyi Xiong},
   booktitle={Forty-first International Conference on Machine Learning},
   year={2024},
   url={https://openreview.net/forum?id=qKL25sGjxL}
}

