[ICML 2024] One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Authors: Ruochen Wang*, Sohyun An*, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

Presented by TurningPoint AI - UCLA - UMD - PSU - KAIST

This repository contains the official implementation of MoP, accepted at ICML 2024.
Mixture-of-Prompts (MoP) is the first prompt optimization framework that extends the Mixture-of-Experts paradigm to prompt optimization. Unlike traditional methods that optimize a single prompt, MoP optimizes a mixture of prompts (experts) with different specialties, each tailored to handle only the queries belonging to its partition of the problem space. In addition, MoP jointly optimizes the instruction and in-context demonstration fields of each expert's prompt. Designed as a plug-and-play solution, MoP is compatible with various prompt generation algorithms, including APE, InstructZero, etc. Our extensive empirical study on 55 tasks across three major benchmarks, one of the largest of its kind, shows that MoP achieves an average win rate of 81% over comparable baselines. Dividing the problem space into sub-regions also benefits performance on out-of-distribution data.
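The routing idea above can be pictured in a few lines: embed an incoming query and dispatch it to the expert whose demo-cluster centroid is nearest in embedding space. The function and variable names below are illustrative sketches, not the repository's actual API:

```python
import numpy as np

def route_to_expert(query_emb: np.ndarray, centroids: np.ndarray) -> int:
    """Return the index of the expert whose cluster centroid is
    closest to the query embedding (Euclidean distance)."""
    dists = np.linalg.norm(centroids - query_emb, axis=1)
    return int(np.argmin(dists))

# Toy example: three experts with 2-D centroids.
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
query_emb = np.array([4.5, 4.8])
expert_id = route_to_expert(query_emb, centroids)  # nearest: expert 1
# The selected expert's instruction + in-context demos form the final prompt.
expert_prompts = ["<expert-0 prompt>", "<expert-1 prompt>", "<expert-2 prompt>"]
final_prompt = expert_prompts[expert_id]
```

In practice the embeddings come from a pretrained encoder, but the dispatch step itself is just a nearest-centroid lookup.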
1. Set up the environment

All installation commands are stored in scripts/install_env.sh. If you also wish to run InstructZero (iz), run the additional installation script scripts/install_env_iz.sh.
2. OpenAI account
Our experiments are conducted with the gpt-3.5-turbo-instruct model from OpenAI. To use the OpenAI API, set your OpenAI key in the scripts/run.sh file.
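For example, the key can be supplied through the standard OPENAI_API_KEY environment variable before launching the run script (check run.sh itself for the exact variable or field it reads; the key value below is a placeholder):

```shell
# Placeholder key; replace with your real OpenAI API key.
export OPENAI_API_KEY="sk-your-key-here"
# Sanity check: confirm the variable is set without printing the secret.
echo "Key length: ${#OPENAI_API_KEY}"
```

Avoid committing the key to version control; exporting it in your shell session keeps it out of the repository.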
3. Download benchmarks
Our experiments are conducted on the Instruction Induction benchmark, the SuperNatural Instructions benchmark, and Big-Bench Hard. You can download our preprocessed data files HERE and place them under mop/benchmarks/ (for example, mop/benchmarks/superni).
Run the following scripts to verify your setup:
bash run-mop-fast.sh
bash run-ape-kcen-fast.sh
bash run-iz-kcen-fast.sh
To launch a full run on a benchmark:

bash run.sh --method mop --benchmark ii --datasets 0-24
Below are the key fields inside run.sh:

--method specifies the prompt optimization algorithm.

--benchmark is the evaluation benchmark, including ii (Instruction Induction), sni (SuperNatural Instructions), and bbh (Big-Bench Hard).

--datasets is the task id. You can run on multiple datasets sequentially using a comma-separated format (e.g. 1,2,3,4). These benchmarks comprise hundreds of tasks, and it is expensive to evaluate on all of them. We filter around 10-20 tasks from each benchmark, focusing on the challenging ones such as math/coding problems and those that cannot be resolved with a simple prompt (more details in the Appendix).

--n_experts is the maximum number of experts. The actual number of experts is decided by the Kauto algorithm during the demo assignment phase.
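The automatic choice of the number of experts can be pictured with a generic elbow-style heuristic on the demo embeddings: grow K until adding another cluster no longer reduces the within-cluster distance much. This is an illustrative stand-in under that assumption, not the paper's exact Kauto criterion:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means on demo embeddings X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2),
            axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters untouched
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def auto_k(X, k_max, tol=0.1):
    """Elbow-style choice of K: stop once the relative drop in total
    within-cluster distance falls below `tol` (or hits zero)."""
    best_k, prev = 1, None
    for k in range(1, min(k_max, len(X)) + 1):
        labels, centroids = kmeans(X, k)
        inertia = sum(
            np.linalg.norm(X[labels == j] - c, axis=1).sum()
            for j, c in enumerate(centroids))
        if prev is not None and (prev == 0 or (prev - inertia) / prev < tol):
            break
        best_k, prev = k, inertia
    return best_k

# Toy demo embeddings: two tight groups, so roughly two experts emerge.
demos = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
labels, _ = kmeans(demos, 2)   # the two groups land in different clusters
n_experts = auto_k(demos, k_max=5)
```

The cap --n_experts plays the role of k_max here; the cluster centroids then double as the routing targets at inference time.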
bash run.sh --method mop --benchmark sni --datasets 0-19
bash run.sh --method mop --benchmark bbh --datasets 0-9
To run the baselines, simply change the --method field of the above commands:
bash run.sh --method ape
bash run.sh --method ape-random
bash run.sh --method ape-kcen
bash run.sh --method iz
bash run.sh --method iz-random
bash run.sh --method iz-kcen
If you find this work useful, please cite:

@inproceedings{wang2024one,
title={One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts},
author={Wang, Ruochen and An, Sohyun and Cheng, Minhao and Zhou, Tianyi and Hwang, Sung Ju and Hsieh, Cho-Jui},
booktitle={International Conference on Machine Learning},
year={2024}
}