SkPM is an open-source extension of the widely used Scikit-learn library, designed to meet the specific needs of Process Mining applications. It aims to provide a standard, reproducible, and easily accessible set of tools for PM research and practical applications.
- NEW ICPM/ML4PM 2024 Tutorial: A notebook highlighting all the available features in SkPM!
- Predictive Monitoring: Build end-to-end applications of traditional process mining tasks, such as remaining time and next activity prediction!
- Event Log Preprocessing: Several feature extraction and trace encoding techniques implemented!
- Download Public Event Logs: Download well-known event logs (e.g., BPI Challenges) from the 4tu repository!
- Unbiased Event Log Split: Temporal and unbiased split of event logs for train/validation.
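To make the idea of a temporal split concrete, here is a minimal sketch in plain Python. It is not SkPM's implementation: the helper name `temporal_split` and the keys `case_id`, `activity`, and `timestamp` are assumptions chosen for illustration. Cases are ordered by their first event, and the earliest ones go to the training set.

```python
from datetime import datetime


def temporal_split(events, train_ratio=0.75):
    """Split events into train/test by case start time (hypothetical helper)."""
    # Find the first timestamp of every case.
    case_start = {}
    for ev in events:
        case, ts = ev["case_id"], ev["timestamp"]
        if case not in case_start or ts < case_start[case]:
            case_start[case] = ts

    # Order cases chronologically and cut the ordered list at train_ratio.
    ordered = sorted(case_start, key=case_start.get)
    n_train = int(len(ordered) * train_ratio)
    train_cases = set(ordered[:n_train])

    # Whole cases are assigned to one side, so no case is shared between sets.
    train = [ev for ev in events if ev["case_id"] in train_cases]
    test = [ev for ev in events if ev["case_id"] not in train_cases]
    return train, test


log = [
    {"case_id": "c1", "activity": "register", "timestamp": datetime(2024, 1, 1)},
    {"case_id": "c1", "activity": "pay", "timestamp": datetime(2024, 1, 5)},
    {"case_id": "c2", "activity": "register", "timestamp": datetime(2024, 1, 2)},
    {"case_id": "c3", "activity": "register", "timestamp": datetime(2024, 1, 3)},
    {"case_id": "c4", "activity": "register", "timestamp": datetime(2024, 1, 4)},
]
train, test = temporal_split(log, train_ratio=0.75)
```

In this toy log, case `c1` still has an event on January 5, after the test case `c4` has started. This sketch does not handle that kind of overlap, which a debiasing step would need to address.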
Soon available on PyPI.
To install SkPM, clone the repository and install the package and its dependencies with pip:
git clone https://github.com/raseidi/skpm.git
cd skpm
pip install .
Below is an example of how to use SkPM to build a pipeline for remaining time prediction.
# skpm modules
from skpm.encoding import Aggregation
from skpm.event_feature_extraction import (
    TimestampExtractor,
    ResourcePoolExtractor,
)

# sklearn modules
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Example pipeline for remaining time prediction
preprocessor = ColumnTransformer(
    transformers=[
        ('timestamp', TimestampExtractor(), 'timestamp_column'),
        # OneHotEncoder expects 2D input, so the column is selected with a list
        ('activity', OneHotEncoder(), ['activity_column']),
        ('resource', ResourcePoolExtractor(), 'resource_column'),
    ]
)

pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    # Trace encoding with the Aggregation class imported above
    ('aggregator', Aggregation()),
    ('standardization', StandardScaler()),
    ('regressor', RandomForestRegressor())
])

# Fit the pipeline to your event log data
pipeline.fit(X_train, y_train)

# Make predictions on new cases
predictions = pipeline.predict(X_test)
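The pipeline above is fit on a target `y_train` whose construction is not shown. One common definition of remaining time is the interval between an event and the last event of its case. The sketch below assumes that definition. The helper name `remaining_time_seconds` and the keys `case_id` and `timestamp` are illustrative and not part of SkPM.

```python
from datetime import datetime


def remaining_time_seconds(events):
    """Return, for every event, the seconds left until its case ends."""
    # The last timestamp observed per case marks the end of that case.
    case_end = {}
    for ev in events:
        case, ts = ev["case_id"], ev["timestamp"]
        if case not in case_end or ts > case_end[case]:
            case_end[case] = ts

    # Remaining time = case end minus the event's own timestamp.
    return [
        (case_end[ev["case_id"]] - ev["timestamp"]).total_seconds()
        for ev in events
    ]


log = [
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 9, 0)},
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 10, 0)},
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 12, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 2, 8, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 2, 8, 30)},
]
y = remaining_time_seconds(log)
```

The last event of each case gets a target of zero, and the list stays aligned with the event order of the input.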
Detailed documentation and examples can be found here.
- Improving documentation by including examples.
- Implementing new applications and writing tutorials.
- Adding new methods (feature extraction, trace encoding, and models).
- Writing unit tests!
We welcome contributions from the community!
Check the sklearn guidelines to understand the fit, predict, and transform APIs!
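As a rough illustration of those conventions, the sketch below defines a small custom transformer on top of scikit-learn's `BaseEstimator` and `TransformerMixin`. The class `ActivityCountEncoder` is hypothetical and not part of SkPM. It only shows the usual pattern: `fit` learns state from the training data and returns `self`, and `transform` applies that state to any data.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ActivityCountEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical encoder: counts how often each activity occurs in a trace."""

    def fit(self, X, y=None):
        # Learn the activity vocabulary from the training traces only.
        # Learned attributes end with an underscore by sklearn convention.
        self.activities_ = sorted({act for trace in X for act in trace})
        return self  # fit returns self so calls can be chained

    def transform(self, X):
        # Map each trace to a fixed-length vector of activity counts.
        index = {act: i for i, act in enumerate(self.activities_)}
        out = np.zeros((len(X), len(index)), dtype=float)
        for row, trace in enumerate(X):
            for act in trace:
                if act in index:  # activities unseen during fit are ignored
                    out[row, index[act]] += 1
        return out


train_traces = [["register", "check", "pay"], ["register", "check", "check"]]
test_traces = [["register", "refund"]]

encoder = ActivityCountEncoder()
train_matrix = encoder.fit_transform(train_traces)  # provided by TransformerMixin
test_matrix = encoder.transform(test_traces)
```

Because the vocabulary is learned in `fit`, the test traces are encoded with the same columns as the training traces, which is what lets a transformer like this sit inside a `Pipeline`.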
Check our guidelines as well to see how to open an issue or a PR. In summary:
- Fork the repository.
- Create a feature branch (git checkout -b feature-branch).
- Commit your changes (git commit -m 'feat: add new feature').
- Push to the branch (git push origin feature-branch).
- Open a pull request.
This project was created by Rafael Oyamada and is licensed under the CC BY 4.0 License. Feel free to use, modify, and distribute the code with attribution.
skpm was created with cookiecutter and the py-pkgs-cookiecutter template.
@inproceedings{OyamadaTJC23,
  author    = {Rafael Seidi Oyamada and
               Gabriel Marques Tavares and
               Sylvio Barbon Junior and
               Paolo Ceravolo},
  editor    = {Felix Mannhardt and
               Nour Assy},
  title     = {A Scikit-learn Extension Dedicated to Process Mining Purposes},
  booktitle = {Proceedings of the Demonstration Track co-located with the International
               Conference on Cooperative Information Systems 2023, CoopIS 2023, Groningen,
               The Netherlands, October 30 - November 3, 2023},
  series    = {{CEUR} Workshop Proceedings},
  publisher = {CEUR-WS.org},
}