
ENH: Basic anomaly detector #393

Open: wants to merge 5 commits into master


@tibkiss (Contributor) commented Nov 18, 2020

Basic anomaly detector with Z-score-based and fixed-percentage-based detection
for the data columns in Marketstore. It can be used to spot price or volume
outliers in the ingested data.
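As a rough illustration of Z-score-based detection, the sketch below flags values whose distance from the column mean exceeds a threshold number of standard deviations. It is a minimal sketch assuming the population standard deviation over the whole column; `zScoreAnomalies` and its `threshold` parameter are hypothetical names, not the PR's actual API.

```go
package main

import (
	"fmt"
	"math"
)

// zScoreAnomalies is a hypothetical helper (not the PR's API). It returns the
// indices i for which |x[i]-mean| / stddev exceeds threshold, using the
// population standard deviation of the whole column.
func zScoreAnomalies(x []float64, threshold float64) []int {
	if len(x) == 0 {
		return nil
	}
	n := float64(len(x))

	var sum float64
	for _, v := range x {
		sum += v
	}
	mean := sum / n

	var sqDev float64
	for _, v := range x {
		d := v - mean
		sqDev += d * d
	}
	std := math.Sqrt(sqDev / n)
	if std == 0 {
		return nil // constant column: nothing deviates from the mean
	}

	var idxs []int
	for i, v := range x {
		if math.Abs(v-mean)/std > threshold {
			idxs = append(idxs, i)
		}
	}
	return idxs
}

func main() {
	prices := []float64{10, 10.1, 9.9, 10, 10.2, 9.8, 10, 25}
	fmt.Println(zScoreAnomalies(prices, 2.0)) // prints [7]
}
```

With the population standard deviation, no value in an n-point column can reach a Z-score above sqrt(n-1), so short columns need a correspondingly lower threshold.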

@tibkiss requested review from dakimura and a team on November 18, 2020 08:14

```go
pctChange := make([]float64, size-1)

// pctChange[i] = (columnData[i+1] - columnData[i]) / columnData[i]
// floats.SubTo(pctChange, columnData[1:], columnData[:size-1])
```
Contributor Author

todo: remove

uda/anomaly/anomaly.go (outdated, resolved)
```go
if _, ok := a.AnomalyIdxsByColumn[epoch]; ok {
	previousValue = a.AnomalyIdxsByColumn[epoch]
}
a.AnomalyIdxsByColumn[epoch] = previousValue | 1<<columnNr
```
Contributor

If columnNr is large enough (which seems very unlikely, btw), 1<<columnNr can overflow the mask type and the column's bit would be lost.
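One way to rule out that overflow is to make the mask width explicit and reject column numbers that do not fit. This is a sketch assuming a `uint64` mask keyed by an `int64` epoch; `anomalyMask`, `mark` and `errTooManyColumns` are hypothetical names standing in for the PR's `AnomalyIdxsByColumn` logic.

```go
package main

import (
	"errors"
	"fmt"
)

// anomalyMask is a hypothetical stand-in for AnomalyIdxsByColumn: one bit per
// column, keyed by epoch. A uint64 mask can hold at most 64 columns.
type anomalyMask map[int64]uint64

var errTooManyColumns = errors.New("column index does not fit in a uint64 mask")

// mark sets the bit for columnNr at epoch. Reading a missing key from a Go
// map yields the zero value, so no explicit existence check is needed.
func (m anomalyMask) mark(epoch int64, columnNr uint) error {
	if columnNr >= 64 {
		// Shifting a uint64 left by 64 or more yields 0 in Go, so without
		// this guard the bit would be silently dropped.
		return errTooManyColumns
	}
	m[epoch] |= 1 << columnNr
	return nil
}

func main() {
	m := anomalyMask{}
	_ = m.mark(1000, 0)
	_ = m.mark(1000, 3)
	fmt.Printf("%b\n", m[1000])   // prints 1001
	fmt.Println(m.mark(1000, 64)) // prints the errTooManyColumns message
}
```

Because a missing key reads as zero, the `if _, ok := ...` lookup in the snippet above is not strictly needed; `|=` on the map entry does the same job. A `[]bool` or `math/big.Int` per epoch would lift the 64-column limit at the cost of more memory.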

uda/anomaly/anomaly.go (resolved)
uda/anomaly/anomaly.go (outdated, resolved)
tests/integ/tests/test_anomaly_detector.py (outdated, resolved)
```python
import pymarketstore as pymkts

# Constants
DATA_TYPE_TICK = [('Epoch', 'i8'), ('Bid', 'f4'), ('Ask', 'f4'), ('Nanoseconds', 'i4')]
```
Contributor

i4 is the right type for Nanoseconds, but later in the tests all nanosecond data are given as floats, so numpy truncates all of them to 0. I don't think it's causing much trouble, but better to be on the safe side.

Contributor Author

wow, we need to address that
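To stay on the safe side, as the reviewer suggests, the nanosecond field can be filled with integers instead of floats. The test itself is Python, but the conversion is language-independent; this Go sketch shows one way to split a fractional-seconds timestamp into the integer `Epoch` (`i8`) and `Nanoseconds` (`i4`) fields of `DATA_TYPE_TICK`. It assumes non-negative timestamps given as float seconds, which is a hypothetical input format; `splitTimestamp` is an illustrative name.

```go
package main

import (
	"fmt"
	"math"
)

// splitTimestamp (hypothetical helper) converts a non-negative float timestamp
// in seconds into integer Epoch seconds and integer Nanoseconds, so nothing
// relies on an implicit float-to-int truncation.
func splitTimestamp(ts float64) (epoch int64, nanos int32) {
	sec, frac := math.Modf(ts)
	ns := math.Round(frac * 1e9)
	epoch = int64(sec)
	// Rounding can carry into the next second (e.g. frac = 0.9999999999).
	if ns >= 1e9 {
		epoch++
		ns -= 1e9
	}
	return epoch, int32(ns)
}

func main() {
	fmt.Println(splitTimestamp(1600000000.25)) // prints 1600000000 250000000
}
```

Note that a float64 near 1.6e9 seconds only resolves about 240 ns, so exact nanoseconds are best taken from integer source data rather than reconstructed from a float.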


3 participants