-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question around VECTORIZE BY #79
Comments
Hey @Benoss , Thanks for your patience here - has been a very busy last few weeks with COVID. I think a vectorFind or Filter method sounds great. Perhaps the ClickHouse combiner concept would be the right way to go? vectorWindowAggIf(
avg, metric,
toStartOfDay(timestamp) = toStartOfDay(now() - interval 1 day)) There are vectorLag and vectorLead functions that should perform like the proposed vectorData but perhaps I should put a new function in that can move forward or back that aligns with the ClickHouse neighbor terminology - vectorNeighbor? Let me know what you think. I had a play with the problem you noted - I don't think this will really do what you want but it sort of achieves the previous week comparison, with the average and the last 3 days etc.. # Generate some data
key = (ArrayVector
.range(1,1).cast('String').map('concat', String('device-')).take(24*60*60)
.cache().alias('key'))
timestamp = (ArrayVector
.rand(DateTime('2019-01-07 00:00:00'), DateTime('2019-03-01 23:59:59'), 24*60*60)
.cache().alias('timestamp'))
metric = ArrayVector.rand(1, 8192, 24*60*60).cache().alias('metric')
data = v.table.fromVector('default.vector_example',
(key, timestamp, metric), engines.Memory(), replace=True)
# Aggregate values to a date and return a single key/series for examples
dt = v.q("""
SELECT
key,
toDate(toStartOfDay(timestamp)) AS t_d,
sum(metric) AS metric
FROM default.vector_example
WHERE t_d >= '2019-01-07'
GROUP BY key, t_d
ORDER BY key, t_d
LIMIT 1 BY key, t_d
LIMIT 100
""").cache()
# For the relative date - grab the value 1 week ago, calculate the moving average
# for the whole of the previous week, calculate the moving average for the last 3
# days.
# Next compare the relative difference to the current value and the calculated
# moving stats. Ofcourse can also compare realtive to future stats as well
# depending on the WindowAgg parameters.
(dt
.vectorizeBy('key', 't_d')
.select(
'key', 't_d', 'metric',
funcs.vector.vectorLag(c('metric'), -7).alias('lag_1week'),
funcs.vector.vectorWindowAgg(c('avg'), c('metric'), -7, -1).alias('mv_avg_prev_week'),
funcs.vector.vectorWindowAgg(c('avg'), c('metric'), -3, -1).alias('mv_avg_last_3days'))
.select(
'key', 't_d', c('metric').alias('current_value'),
c('metric/lag_1week').alias('change_vs_relative_day_1week'),
c('metric/mv_avg_prev_week').alias('change_vs_avg_1week'),
c('metric/mv_avg_last_3days').alias('change_vs_avg_last3days')
)
).s And thanks for the feedback on the project. This is still in a POC phase (need tests and fixes to brittle code). The actual frontend SQL introspection and rewriting itself will be moving to a dedicated high performance backend server in C with Python interlock that will sit with ClickHouse so the same SQL queries/extensions can be orchestrated by the backend for ODBC and other clients too. Have a full roadmap of ideas here... |
First, I think your project is brilliant, I am using ClickHouse for a few project and your idea fits really well with what I wanted to do!
My question is vectorDelta, is there a way to get the n-x value of column? maybe a function vectorData(metric, -5) would return the n-x value of metric. Then vectorDelta could just be calculated
Also maybe vectorFind(metric, timestamp=yesterday) would return the data of metric for a given timestamp in the vector?
Real world problems I am trying to solve.
Compare a value with last week, with the average of last week, with the average of the last 3 days etc etc
The text was updated successfully, but these errors were encountered: