MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
Humans possess a remarkable ability to assign novel interpretations to linguistic expressions, enabling them to learn new words and understand community-specific connotations. However, Large Language Models (LLMs) have a knowledge cutoff and are costly to finetune repeatedly. Therefore, it is crucial for LLMs to learn novel interpretations in-context. In this paper, we systematically analyse the ability of LLMs to acquire novel interpretations using in-context learning. To facilitate our study, we introduce MAGNIFICo, an evaluation suite implemented within a text-to-SQL semantic parsing framework that incorporates diverse tokens and prompt settings to simulate real-world complexity. Experimental results on MAGNIFICo demonstrate that LLMs exhibit a surprisingly robust capacity for comprehending novel interpretations from natural language descriptions as well as from discussions within long conversations. Nevertheless, our findings also highlight the need for further improvements, particularly when interpreting unfamiliar words or when composing multiple novel interpretations simultaneously in the same example. Additionally, our analysis uncovers the semantic predispositions in LLMs and reveals the impact of recency bias for information presented in long contexts.
- Compatible with Python 3
- Dependencies can be installed using MAGNIFICo/requirements.txt
Install VirtualEnv using the following (optional):
$ [sudo] pip install virtualenv
Create and activate your virtual environment (optional):
$ virtualenv -p python3 venv
$ source venv/bin/activate
Install all the required packages:
At MAGNIFICo/:
$ pip install -r requirements.txt
Download the Spider database for evaluation from the official Spider website (https://yale-lily.github.io/spider). Place the extracted database folder inside MAGNIFICo/spider/.
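For reference, the layout after extraction should look roughly like the sketch below (the database/ folder and per-database subfolders follow the standard Spider release; concert_singer is just one example, so adjust if your download differs):

MAGNIFICo/
└── spider/
    └── database/
        ├── concert_singer/
        │   └── concert_singer.sqlite
        └── ...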
All the data we created can be found in MAGNIFICo/magnifico_data.
The full set of available command-line arguments can be seen in the main.py file. Here, we illustrate running experiments for GPT-4 and LLaMA-2 under specific experimental settings; follow the same methodology to run any experiment with any model.
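To list all the arguments directly from the command line (assuming main.py uses argparse, in which case the default help flag works):

$ python main.py --help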
Running GPT-4 to evaluate the 'plausible' and 'nonsense' form settings with the 'natural language descriptions' prompt type across all interpretations:
At MAGNIFICo:
$ python main.py -model_type chat -model gpt-4 -batch_size 1 -settings plausible,nonsense -prompt_types instr -instr_positions end -interpretations all
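Note that GPT-4 runs require OpenAI API access. Assuming the script reads the standard OPENAI_API_KEY environment variable (check main.py for the exact mechanism used), export your key before running:

$ export OPENAI_API_KEY=<your_openai_api_key>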
To run LLaMA-2, first set up and install Hugging Face's Text Generation Inference (TGI) locally.
Start the TGI server in one terminal window:
$ CUDA_VISIBLE_DEVICES=0,1 HUGGING_FACE_HUB_TOKEN=<hf_token> text-generation-launcher --model-id meta-llama/Llama-2-70b-hf --huggingface-hub-cache <cache_dir> --num-shard 2 --max-input-length 3500 --max-total-tokens 4096 --master-port 29500 --port 8080
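Before launching the evaluation, you can verify the server is up by sending a quick request to TGI's standard /generate endpoint on the port configured above (the prompt here is arbitrary):

$ curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"SELECT","parameters":{"max_new_tokens":8}}' -H 'Content-Type: application/json'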
Then at MAGNIFICo:
$ python main.py -model_type tgi -model llama-2-70b -batch_size 1 -combi
If you use our data or code, please cite our work:
@inproceedings{patel-etal-2023-magnifico,
title = "{MAGNIFIC}o: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations",
author = "Patel, Arkil and
Bhattamishra, Satwik and
Reddy, Siva and
Bahdanau, Dzmitry",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.134",
doi = "10.18653/v1/2023.emnlp-main.134",
pages = "2167--2189",
abstract = "Humans possess a remarkable ability to assign novel interpretations to linguistic expressions, enabling them to learn new words and understand community-specific connotations. However, Large Language Models (LLMs) have a knowledge cutoff and are costly to finetune repeatedly. Therefore, it is crucial for LLMs to learn novel interpretations in-context. In this paper, we systematically analyse the ability of LLMs to acquire novel interpretations using in-context learning. To facilitate our study, we introduce MAGNIFICo, an evaluation suite implemented within a text-to-SQL semantic parsing framework that incorporates diverse tokens and prompt settings to simulate real-world complexity. Experimental results on MAGNIFICo demonstrate that LLMs exhibit a surprisingly robust capacity for comprehending novel interpretations from natural language descriptions as well as from discussions within long conversations. Nevertheless, our findings also highlight the need for further improvements, particularly when interpreting unfamiliar words or when composing multiple novel interpretations simultaneously in the same example. Additionally, our analysis uncovers the semantic predispositions in LLMs and reveals the impact of recency bias for information presented in long contexts.",
}
For any clarification, comments, or suggestions, please contact Arkil.