Self-supervised learning for Event Sequences on synthetic task of next item prediction (GPT-approach based module)

This repository contains the code for the final project "Self-supervised learning for Event Sequences on synthetic task of next item prediction (GPT-approach based module)" for the Machine Learning 2023 course.

Project team:

Egor Fadeev
Alexander Ganibaev
Matvey Lukyanov
Aleksandr Yugay

Idea:

Self-supervised learning is a powerful technique for leveraging large amounts of unlabeled data to improve the performance of machine learning models, particularly in domains where labeled data is scarce or expensive to obtain. In this project, we focus on self-supervised learning applied to event sequences, specifically transaction data, and explore two pre-training approaches for obtaining embeddings: classical representations and contrastive representations. We demonstrate that both embedding models are viable for a downstream task, namely predicting the merchant category code (MCC) of the next transaction. In our experiments, the pre-trained contrastive embeddings perform better on less stable data, while the pre-trained representation embeddings are better suited to homogeneous transaction data. These findings can help guide the selection of pre-training approaches for transactional data, and our work opens up opportunities for further exploration of self-supervised learning in other domains.
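
To make the downstream task concrete, the following is a minimal sketch of how next-item prediction pairs can be built from one client's MCC sequence: the model sees the sequence up to step t and is asked for the MCC at step t+1. This is an illustration only, not the repository's implementation (see dataset.py and models.py for that); the function name make_next_item_pairs and the MCC values are hypothetical.

```python
from typing import List, Tuple


def make_next_item_pairs(mcc_sequence: List[int]) -> Tuple[List[int], List[int]]:
    """Split one client's MCC sequence into inputs and next-item targets.

    The target at position t is the MCC observed at position t + 1,
    so both returned lists have length len(mcc_sequence) - 1.
    """
    if len(mcc_sequence) < 2:
        # A single transaction has no "next" item to predict.
        return [], []
    inputs = mcc_sequence[:-1]
    targets = mcc_sequence[1:]
    return inputs, targets


# Hypothetical MCC codes for one client, ordered by transaction time.
example_sequence = [5411, 5812, 5411, 4121]
inputs, targets = make_next_item_pairs(example_sequence)
for seen, nxt in zip(inputs, targets):
    print(f"after MCC {seen}, the target is MCC {nxt}")
```

In a GPT-like setup the same shift-by-one idea is applied to whole batches of sequences, typically together with a causal mask so that each position only attends to earlier transactions.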

Repo description

sber_experiments.ipynb: running experiments on Sber datasets
rosbank_experiments.ipynb: running experiments on Rosbank datasets
sber_notebook.ipynb: visualizations of Sber dataset
rosbank_notebook.ipynb: visualizations of Rosbank dataset
models.py: model pipeline, including its layers and architecture
dataset.py: data preprocessing for model usage

Prerequisites

# Clone the repository and install the dependencies
git clone https://github.com/Matteus1904/GPT-like_approach_for_event_sequences
cd GPT-like_approach_for_event_sequences
pip install -r requirements.txt
# Download and unpack the Sberbank dataset
mkdir -p data
curl -OL https://storage.yandexcloud.net/di-datasets/age-prediction-nti-sbebank-2019.zip
unzip -j -o age-prediction-nti-sbebank-2019.zip 'data/*.csv' -d data/sberbank
mv age-prediction-nti-sbebank-2019.zip data/
# Download and unpack the Rosbank dataset
mkdir -p data/rosbank
curl -OL https://storage.yandexcloud.net/di-datasets/rosbank-ml-contest-boosters.pro.zip
unzip -j -o rosbank-ml-contest-boosters.pro.zip '*.csv' -d data/rosbank
mv rosbank-ml-contest-boosters.pro.zip data/rosbank/

Run experiments

To run the default experiments, use the following command:

python main.py

It will save the results into results/stats_contr_{dataset}_config_{config_name}.csv.
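
Once several runs have finished, it can be handy to see which dataset and config each results file belongs to. The sketch below only relies on the file name pattern stated above; the helper name parse_result_name and the example file names are hypothetical, and nothing is assumed about the columns inside the CSV files.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Mirrors results/stats_contr_{dataset}_config_{config_name}.csv
RESULT_NAME = re.compile(r"stats_contr_(?P<dataset>.+?)_config_(?P<config>.+)\.csv")


def parse_result_name(file_name: str) -> Optional[Tuple[str, str]]:
    """Return (dataset, config_name) for a results file, or None if it does not match."""
    match = RESULT_NAME.fullmatch(file_name)
    if match is None:
        return None
    return match.group("dataset"), match.group("config")


# List whatever results are present; prints nothing if the folder is empty or missing.
for path in sorted(Path("results").glob("stats_contr_*_config_*.csv")):
    print(path.name, "->", parse_result_name(path.name))
```

For example, a file called stats_contr_rosbank_config_baseline.csv (a made-up name) would be parsed into the pair ("rosbank", "baseline").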

You can also run the model with your own hyperparameters. To do that, either change the corresponding values of an existing config in configs.ini, or create your own config in configs.ini using the same format and add it after the existing ones.
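
As a rough illustration of the "one section per config" layout, the sketch below parses an INI text with Python's standard configparser module and lists the configs in file order. The section names (baseline, my_config) and keys (hidden_size, learning_rate) are assumptions made up for this example; check the existing entries in the repository's config file for the real names before adding your own.

```python
import configparser
from typing import Dict

# Hypothetical config file content: section and key names are NOT taken
# from the repository, they only illustrate the INI format.
EXAMPLE_INI = """
[baseline]
hidden_size = 64
learning_rate = 0.001

[my_config]
hidden_size = 128
learning_rate = 0.0005
"""


def load_configs(ini_text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI text into {config_name: {parameter: raw string value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # sections() preserves file order, so a config appended at the end comes last.
    return {name: dict(parser[name]) for name in parser.sections()}


configs = load_configs(EXAMPLE_INI)
for name, params in configs.items():
    # INI values are strings; convert explicitly where numbers are needed.
    print(name, int(params["hidden_size"]), float(params["learning_rate"]))
```

To read a file from disk instead of a string, parser.read(path) can be used in place of parser.read_string.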
