SkPM: a Scikit-learn Extension for Process Mining

Overview

SkPM is an open-source extension of the widely used Scikit-learn library, designed to meet the specific needs of Process Mining (PM) applications. It aims to provide a standard, reproducible, and easily accessible set of tools for PM research and practical applications.

Available examples

NEW ICPM/ML4PM 2024 Tutorial: A notebook highlighting all the available features in SkPM!
Predictive Monitoring: Build end-to-end applications of traditional process mining tasks, such as remaining time and next activity prediction!
Event Log Preprocessing: Several feature extraction and trace encoding techniques implemented!
Download Public Event Logs: Download well-known event logs (e.g., BPI Challenges) from the 4TU repository!
Unbiased Event Log Split: Temporal and unbiased split of event logs for train/validation.
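
To give an idea of what a temporal split does, here is a minimal sketch in plain pandas. It is not SkPM's implementation: the function name temporal_split and the column names case_id and timestamp are assumptions made for illustration, and the sketch only orders cases by start time without the extra debiasing steps an unbiased split would apply.

```python
import pandas as pd


def temporal_split(log, case_col="case_id", time_col="timestamp", train_frac=0.5):
    """Hypothetical helper: put the earliest-starting cases in the training set."""
    # The start time of a case is the timestamp of its first event.
    case_starts = log.groupby(case_col)[time_col].min().sort_values()
    n_train = int(len(case_starts) * train_frac)
    train_cases = case_starts.index[:n_train]
    # Whole cases go to one side, so no case is shared between the two sets.
    is_train = log[case_col].isin(train_cases)
    return log[is_train], log[~is_train]


# Toy event log with five cases (illustrative data only).
log = pd.DataFrame({
    "case_id": ["a", "a", "b", "b", "c", "c", "d", "e"],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-05", "2024-01-02", "2024-01-03",
        "2024-01-04", "2024-01-09", "2024-01-06", "2024-01-07",
    ]),
})

train, test = temporal_split(log, train_frac=0.5)
```

Splitting by case rather than by event keeps every trace intact, which is usually what a predictive monitoring model needs.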

Installation

Soon available on PyPI.

To install SkPM, you can clone the repository and install the required dependencies using pip:

git clone https://github.com/raseidi/skpm.git
cd skpm
pip install .

Usage

Below is an example of how to use SkPM to build a pipeline for remaining time prediction.

# skpm modules
from skpm.encoding import Aggregation
from skpm.event_feature_extraction import (
    TimestampExtractor,
    ResourcePoolExtractor,
)

# sklearn modules
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Example pipeline for remaining time prediction.
# The column names below are placeholders; replace them with the columns of your event log.
preprocessor = ColumnTransformer(
    transformers=[
        ('timestamp', TimestampExtractor(), 'timestamp_column'),
        ('activity', OneHotEncoder(), ['activity_column']),  # list selector: OneHotEncoder expects 2D input
        ('resource', ResourcePoolExtractor(), 'resource_column'),
    ]
)

pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('aggregator', Aggregation()),
    ('standardization', StandardScaler()),
    ('regressor', RandomForestRegressor())
])

# Fit the pipeline to your event log data
pipeline.fit(X_train, y_train)

# Make predictions on new cases
predictions = pipeline.predict(X_test)
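
The example above assumes that a target y_train already exists. As a rough sketch of how a remaining time target could be derived from a raw event log, the snippet below uses plain pandas rather than any SkPM function. The column names case_id, activity, timestamp, and remaining_time_s are assumptions made for illustration.

```python
import pandas as pd

# Toy event log with two cases (illustrative data only).
log = pd.DataFrame({
    "case_id": ["a", "a", "a", "b", "b"],
    "activity": ["register", "review", "close", "register", "close"],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:30", "2024-01-01 12:00",
        "2024-01-02 09:00", "2024-01-02 09:15",
    ]),
})

# Order events chronologically within each case.
log = log.sort_values(["case_id", "timestamp"])

# Remaining time of an event = timestamp of the last event in its case
# minus the event's own timestamp, expressed in seconds.
case_end = log.groupby("case_id")["timestamp"].transform("max")
log["remaining_time_s"] = (case_end - log["timestamp"]).dt.total_seconds()
```

The last event of each case gets a remaining time of zero, and earlier events get the time left until the case completes.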

Documentation

Detailed documentation and examples can be found here.

Roadmap, next steps, and help needed!

Improving documentation by including examples.
Implementing new applications and writing tutorials.
Adding new methods (feature extraction, trace encoding, and models).
Writing unit tests!

Contributing

We welcome contributions from the community!

Check the sklearn guidelines to understand the fit, predict, and transform APIs!

Check our guidelines as well to see how to open an issue or a PR. In summary:

Fork the repository.
Create a feature branch (git checkout -b feature-branch).
Commit your changes (git commit -m 'feat: add new feature').
Push to the branch (git push origin feature-branch).
Open a pull request.
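
For contributors new to the scikit-learn conventions mentioned above, here is a minimal sketch of a transformer that follows the fit and transform API. The class ElapsedTimeExtractor and its column names are hypothetical and not part of SkPM; only BaseEstimator and TransformerMixin come from scikit-learn.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ElapsedTimeExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: seconds elapsed since the first event of each case."""

    def __init__(self, case_col="case_id", time_col="timestamp"):
        # scikit-learn convention: __init__ only stores parameters, without validation.
        self.case_col = case_col
        self.time_col = time_col

    def fit(self, X, y=None):
        # Nothing is learned here, but fit must still return self.
        return self

    def transform(self, X):
        # Broadcast each case's first timestamp to all of its events.
        case_start = X.groupby(self.case_col)[self.time_col].transform("min")
        elapsed = (X[self.time_col] - case_start).dt.total_seconds()
        return elapsed.to_frame(name="elapsed_time_s")


# Toy event log (illustrative data only).
events = pd.DataFrame({
    "case_id": ["a", "a", "b"],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:30", "2024-01-02 09:00",
    ]),
})

# fit_transform is provided by TransformerMixin.
features = ElapsedTimeExtractor().fit_transform(events)
```

Inheriting from BaseEstimator provides get_params and set_params, which is what lets such a class be cloned and tuned inside a Pipeline or a grid search.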

License

This project was created by Rafael Oyamada and is licensed under the CC BY 4.0 License. Feel free to use, modify, and distribute the code with attribution.

Credits

skpm was created with cookiecutter and the py-pkgs-cookiecutter template.

Citation

@inproceedings{OyamadaTJC23,
  author       = {Rafael Seidi Oyamada and
                  Gabriel Marques Tavares and
                  Sylvio Barbon Junior and
                  Paolo Ceravolo},
  editor       = {Felix Mannhardt and
                  Nour Assy},
  title        = {A Scikit-learn Extension Dedicated to Process Mining Purposes},
  booktitle    = {Proceedings of the Demonstration Track co-located with the International
                  Conference on Cooperative Information Systems 2023, CoopIS 2023, Groningen,
                  The Netherlands, October 30 - November 3, 2023},
  series       = {{CEUR} Workshop Proceedings},
  publisher    = {CEUR-WS.org},
}
