How Sensitive is Recommendation Systems' Offline Evaluation to Popularity?

This is the implementation of the following paper:

@InProceedings{recsys_eval19,
  author    = {Amir H. Jadidinejad and Craig Macdonald and Iadh Ounis},
  title     = {How Sensitive is Recommendation Systems' Offline Evaluation to Popularity?},
  booktitle = {Workshop on Offline Evaluation for Recommender Systems (REVEAL 2019) at the 13th ACM Conference on Recommender Systems},
  year      = {2019},
}

Requirements

pytorch (1.0.1)
spotlight (0.1.5)
pytrec-eval (0.3)

Results

The following plot summarizes the results of popularity-stratified sampling:

Setting the popularity threshold P to its maximum makes the evaluation equivalent to the standard offline evaluation of recommender systems:

See the paper or our poster for more details.
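
As a rough illustration of what a popularity threshold P can mean in practice, the sketch below restricts a test set to items whose training popularity is at most P. This is not the code from the notebooks: the function names (`item_popularity`, `filter_test_by_popularity`), the toy data, and the "at most P" interpretation of the threshold are all assumptions made for illustration. The paper describes the actual sampling procedure.

```python
from collections import Counter


def item_popularity(train_interactions):
    """Count how many training interactions each item received."""
    return Counter(item for _user, item in train_interactions)


def filter_test_by_popularity(test_interactions, popularity, threshold):
    """Keep test interactions whose item popularity is at most `threshold`.

    Assumption: items unseen in training are treated as popularity 0.
    """
    return [
        (user, item)
        for user, item in test_interactions
        if popularity.get(item, 0) <= threshold
    ]


# Toy (user, item) interactions, purely hypothetical.
train = [(1, "a"), (2, "a"), (3, "a"), (1, "b"), (2, "b"), (3, "c")]
test = [(4, "a"), (4, "b"), (5, "c"), (5, "d")]

popularity = item_popularity(train)
max_p = max(popularity.values())

# A low threshold keeps only interactions with rarely seen items.
subset_low = filter_test_by_popularity(test, popularity, threshold=1)

# The maximum threshold keeps every test interaction, which matches
# evaluating on the full test set.
subset_all = filter_test_by_popularity(test, popularity, threshold=max_p)
```

Under these assumptions, raising the threshold to the maximum observed popularity leaves the test set unchanged, which is the sense in which the maximum P corresponds to the usual offline evaluation.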

How to reproduce?

Use the corresponding Jupyter notebook to reproduce the results of each dataset (MovieLens, Amazon) for a specific popularity threshold P:
