Scenario-Wise Rec, an open-source benchmark for Multi-Scenario/Multi-Domain Recommendation.
Dataset introduction
Dataset | Domain | # Interactions | # Users | # Items |
---|---|---|---|---|
MovieLens | Domain 0 | 210,747 | 1,325 | 3,429 |
MovieLens | Domain 1 | 395,556 | 2,096 | 3,508 |
MovieLens | Domain 2 | 393,906 | 2,619 | 3,595 |
KuaiRand | Domain 0 | 2,407,352 | 961 | 1,596,491 |
KuaiRand | Domain 1 | 7,760,237 | 991 | 2,741,383 |
KuaiRand | Domain 2 | 895,385 | 171 | 332,210 |
KuaiRand | Domain 3 | 402,366 | 832 | 547,908 |
KuaiRand | Domain 4 | 183,403 | 832 | 43,106 |
Ali-CCP | Domain 0 | 32,236,951 | 89,283 | 465,870 |
Ali-CCP | Domain 1 | 639,897 | 2,561 | 188,610 |
Ali-CCP | Domain 2 | 52,439,671 | 150,471 | 467,122 |
Amazon | Domain 0 | 198,502 | 22,363 | 12,101 |
Amazon | Domain 1 | 278,677 | 39,387 | 23,033 |
Amazon | Domain 2 | 346,355 | 38,609 | 18,534 |
Douban | Domain 0 | 227,251 | 2,212 | 95,872 |
Douban | Domain 1 | 179,847 | 1,820 | 79,878 |
Douban | Domain 2 | 1,278,401 | 2,712 | 34,893 |
Mind | Domain 0 | 26,057,579 | 737,687 | 8,086 |
Mind | Domain 1 | 11,206,494 | 678,268 | 1,797 |
Mind | Domain 2 | 10,237,589 | 696,918 | 8,284 |
Mind | Domain 3 | 9,226,382 | 656,970 | 1,804 |
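Each per-domain row above is a plain count over that domain's interaction log. As a rough illustration, the stdlib-only sketch below computes the same three columns from a list of records; the `(user_id, item_id, domain_indicator)` record layout and the function name `domain_stats` are hypothetical and are not part of the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (not the package's format): (user_id, item_id, domain_indicator).
Record = Tuple[int, int, int]


def domain_stats(records: Iterable[Record]) -> Dict[int, Dict[str, int]]:
    """Count interactions, distinct users, and distinct items per domain."""
    interactions = defaultdict(int)
    users = defaultdict(set)
    items = defaultdict(set)
    for user_id, item_id, domain in records:
        interactions[domain] += 1
        users[domain].add(user_id)
        items[domain].add(item_id)
    # Domains are returned in ascending order of their indicator.
    return {
        d: {"interactions": interactions[d], "users": len(users[d]), "items": len(items[d])}
        for d in sorted(interactions)
    }


toy_log = [(1, 10, 0), (1, 11, 0), (2, 10, 0), (2, 12, 1), (3, 12, 1)]
for domain, stats in domain_stats(toy_log).items():
    print(f"Domain {domain}: {stats}")
```

Because a user or item can appear in several domains (user 2 in the toy log), per-domain user and item counts need not add up to a dataset-level total.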
Model introduction
Model | model_name | Link |
---|---|---|
Shared Bottom | SharedBottom | Link |
MMOE | MMOE | Link |
PLE | PLE | Link |
SAR-Net | sarnet | Link |
STAR | star | Link |
M2M | m2m | Link |
AdaSparse | adasparse | Link |
AdaptDHM | adaptdhm | Link |
EPNet | epnet | Link |
PPNet | ppnet | Link |
WARNING: The package is still under active development. If you run into usage problems, feel free to open an issue.
First, clone the repo:
git clone https://github.com/Xiaopengli1/Scenario-Wise-Rec.git
Then enter the project directory:
cd Scenario-Wise-Rec
Then install the package with pip:
pip install .
We provide running scripts in /scripts, and sample datasets in /scripts/data. You can test a model directly with:
python run_ali_ccp_ctr_ranking_multi_domain.py --model [model_name]
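To smoke-test several models on the sample data in one go, the command above can be wrapped in a loop. The sketch below is a hypothetical helper, not a script shipped with the repo: it reuses the script name and the `--model` flag exactly as written in the command above, takes model names from the model_name column of the model table, and assumes it is launched from /scripts.

```python
import subprocess
import sys
from typing import List

# Values from the model_name column of the model table.
MODEL_NAMES = ["SharedBottom", "MMOE", "PLE", "sarnet", "star",
               "m2m", "adasparse", "adaptdhm", "epnet", "ppnet"]

# Script name and flag copied from the sample command (assumes cwd is /scripts).
SCRIPT = "run_ali_ccp_ctr_ranking_multi_domain.py"


def build_commands(models: List[str]) -> List[List[str]]:
    """Return one argv list per model, using the current Python interpreter."""
    return [[sys.executable, SCRIPT, "--model", name] for name in models]


def run_all(models: List[str]) -> List[str]:
    """Run each command in turn and return the models whose run exited non-zero."""
    failed = []
    for cmd in build_commands(models):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[-1])
    return failed
```

Calling `run_all(MODEL_NAMES)` runs every model once, sequentially, and returns the names of any that failed.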
To download and test on the full datasets, follow the steps below.
Six Multi-Scenario/Multi-Domain datasets are provided, as listed in the following table.
Dataset | Domain Number | Users | Items | Interactions | Download |
---|---|---|---|---|---|
MovieLens | 3 | 6k | 4k | 1M | ML_Download |
KuaiRand | 5 | 1k | 4M | 11M | KR_Download |
Ali-CCP | 3 | 238k | 467k | 85M | AC_Download |
Amazon | 3 | 85k | 54k | 823k | TR_Download |
Douban | 3 | 2k | 210k | 1.7M | DB_Download |
Mind | 4 | 748k | 20k | 56M | MD_Download |
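The Interaction column here is consistent with the per-domain counts in the dataset introduction table: summing each dataset's domains reproduces these totals, up to the rounding used here. The snippet only restates numbers from the two tables. The Users and Items columns are not simple sums, since users and items can be shared across domains.

```python
# Per-domain interaction counts copied from the dataset introduction table.
PER_DOMAIN_INTERACTIONS = {
    "MovieLens": [210_747, 395_556, 393_906],
    "KuaiRand": [2_407_352, 7_760_237, 895_385, 402_366, 183_403],
    "Ali-CCP": [32_236_951, 639_897, 52_439_671],
    "Amazon": [198_502, 278_677, 346_355],
    "Douban": [227_251, 179_847, 1_278_401],
    "Mind": [26_057_579, 11_206_494, 10_237_589, 9_226_382],
}

# Dataset-level interaction totals, e.g. MovieLens -> 1,000,209 (listed as 1M).
totals = {name: sum(counts) for name, counts in PER_DOMAIN_INTERACTIONS.items()}
for name, total in totals.items():
    print(f"{name}: {len(PER_DOMAIN_INTERACTIONS[name])} domains, {total:,} interactions")
```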
Replace the sampled dataset with the downloaded full dataset, then run:
python run_movielens_rank_multi_domain.py --dataset_path [path] --model_name [model_name] --device ["cpu"/"cuda:0"] --epoch [maximum epoch] --learning_rate [1e-3/1e-5] --batch_size [2048/4096] --seed [random seed]
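The bracketed values in the command above suggest a small grid over learning rate and batch size. The sketch below expands such a grid into concrete argv lists. The flag names are copied from the command; the dataset path, model name, device, epoch count, and seed are placeholder values chosen for illustration, and `sweep_commands` is a hypothetical helper, not part of the repo.

```python
import itertools
import sys
from typing import List, Sequence

SCRIPT = "run_movielens_rank_multi_domain.py"


def sweep_commands(dataset_path: str, model_name: str, device: str = "cpu",
                   epoch: int = 50, seeds: Sequence[int] = (2022,),
                   learning_rates: Sequence[float] = (1e-3, 1e-5),
                   batch_sizes: Sequence[int] = (2048, 4096)) -> List[List[str]]:
    """Expand the learning-rate x batch-size x seed grid into argv lists."""
    commands = []
    # itertools.product varies the last factor fastest (seed, then batch size).
    for lr, bs, seed in itertools.product(learning_rates, batch_sizes, seeds):
        commands.append([
            sys.executable, SCRIPT,
            "--dataset_path", dataset_path,
            "--model_name", model_name,
            "--device", device,
            "--epoch", str(epoch),
            "--learning_rate", str(lr),
            "--batch_size", str(bs),
            "--seed", str(seed),
        ])
    return commands


for cmd in sweep_commands("path/to/movielens", "star"):
    print(" ".join(cmd[1:]))
```

Each argv list can be passed to `subprocess.run`. Running the configurations one after another keeps GPU memory use predictable on a single device.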
We offer two template files, run_example.py and base_example.py, that together form a pipeline for processing your own multi-scenario datasets and building your own multi-scenario models.
See run_example.py. Use the function get_example_dataset(input_path) to process your dataset. Note that the feature "domain_indicator" marks which domain each sample belongs to. For other implementation details, refer to the file.
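The key requirement is that every sample carries an integer `domain_indicator`. A minimal, stdlib-only sketch of that step follows. The raw field name `scenario`, the function `add_domain_indicator`, and the row format are hypothetical and are not taken from run_example.py, so check the template for the exact layout it expects.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def add_domain_indicator(rows: List[Row], raw_field: str = "scenario") -> Tuple[List[Row], Dict[object, int]]:
    """Map each distinct raw scenario value to a contiguous id 0..K-1.

    Ids follow the sorted order of the raw values, so the mapping is
    reproducible across runs. Rows are copied, not mutated; each copy keeps
    the raw field and gains an integer "domain_indicator" field.
    """
    mapping = {value: idx for idx, value in enumerate(sorted({row[raw_field] for row in rows}))}
    processed = [{**row, "domain_indicator": mapping[row[raw_field]]} for row in rows]
    return processed, mapping


rows = [
    {"user_id": 1, "item_id": 10, "scenario": "news", "label": 1},
    {"user_id": 2, "item_id": 11, "scenario": "video", "label": 0},
    {"user_id": 1, "item_id": 12, "scenario": "news", "label": 0},
]
processed, mapping = add_domain_indicator(rows)
print(mapping)  # {'news': 0, 'video': 1}
```

Contiguous ids starting at 0 make it straightforward to index per-domain parameters, such as one tower per domain.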
See base_example.py, where you can build your own model. We left two placeholders for users to implement the scenario-shared and scenario-specific parts of the model, along with comments explaining how to format the output dimension. Refer to the file for more details.
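As an illustration of the shared/specific split, here is a small PyTorch sketch. A shared bottom MLP feeds one tower per domain, and each sample's prediction comes from the tower of its own domain. It is not the template in base_example.py: the class name, the dense-input signature, and the layer sizes are assumptions, and models in the benchmark consume embedded sparse features instead. The sketch returns a 1-D tensor with one probability per sample. Check the comments in base_example.py for the output shape the framework actually expects.

```python
import torch
import torch.nn as nn


class SharedSpecificSketch(nn.Module):
    """Hypothetical model: scenario-shared bottom + scenario-specific towers."""

    def __init__(self, input_dim: int, num_domains: int, hidden_dim: int = 64):
        super().__init__()
        # Scenario-shared part: one MLP applied to every sample.
        self.shared = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Scenario-specific part: one small tower per domain.
        self.towers = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_dim, hidden_dim // 2), nn.ReLU(),
                           nn.Linear(hidden_dim // 2, 1))
             for _ in range(num_domains)]
        )

    def forward(self, x: torch.Tensor, domain_indicator: torch.Tensor) -> torch.Tensor:
        h = self.shared(x)  # (batch, hidden_dim)
        # Run every tower, giving logits of shape (batch, num_domains).
        all_logits = torch.cat([tower(h) for tower in self.towers], dim=1)
        # Keep only the logit of each sample's own domain -> (batch,).
        idx = domain_indicator.long().unsqueeze(1)
        logits = all_logits.gather(1, idx).squeeze(1)
        return torch.sigmoid(logits)


model = SharedSpecificSketch(input_dim=16, num_domains=3)
x = torch.randn(8, 16)
domains = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
print(model(x, domains).shape)  # torch.Size([8])
```

Evaluating every tower and then gathering keeps the batch intact and the code simple. With many domains, masking or splitting the batch by domain avoids the wasted tower computations.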
We welcome any contribution that helps improve the benchmark: please fork the repo and create a pull request. You can also open an issue if you have any questions. Don't forget to give the project a star! Thanks again!
Our framework builds on Torch-RecHub. We thank its authors for their contribution.