
Question around VECTORIZE BY #79

Open
Benoss opened this issue Apr 22, 2020 · 1 comment

Comments

Benoss commented Apr 22, 2020

First, I think your project is brilliant. I am using ClickHouse for a few projects, and your idea fits really well with what I wanted to do!

My question is about vectorDelta: is there a way to get the value of a column from n rows back? Maybe a function vectorData(metric, -5) could return the value of metric 5 rows back, and then vectorDelta could just be calculated as:

select
    key,
    timestamp,
    metric,
    vectorData(metric, -5) as metric_delta
from default.vector_example
vectorize by (key, timestamp)

Also, maybe vectorFind(metric, timestamp=yesterday) could return the value of metric for a given timestamp in the vector?

Real-world problems I am trying to solve: compare a value with last week's value, with the average of last week, with the average of the last 3 days, etc.

@jason-godden
Contributor

Hey @Benoss,

Thanks for your patience here - it has been a very busy few weeks with COVID.

I think a vectorFind or Filter method sounds great. Perhaps the ClickHouse combiner concept would be the right way to go?

vectorWindowAggIf(
   avg, metric,
   toStartOfDay(timestamp) = toStartOfDay(now() - interval 1 day))
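To make the combiner idea concrete, here is a minimal pure-Python sketch of what an "aggregate-if" over a vector would compute. This is illustrative only - vectorWindowAggIf is a proposal in this thread, not a shipped function, and the names below are mine:

```python
# Sketch of an "aggregate-if" over a vector: apply the aggregate function only
# to the values whose timestamp satisfies a predicate. Illustrative only.
from datetime import date

def vector_agg_if(agg, rows, predicate):
    """Aggregate the values of (timestamp, value) rows whose timestamp matches."""
    selected = [v for t, v in rows if predicate(t)]
    return agg(selected) if selected else None

def avg(xs):
    return sum(xs) / len(xs)

# Average of metric for one specific day, analogous to filtering on
# toStartOfDay(timestamp) = toStartOfDay(now() - interval 1 day)
rows = [(date(2019, 1, 1), 10), (date(2019, 1, 1), 30), (date(2019, 1, 2), 5)]
print(vector_agg_if(avg, rows, lambda t: t == date(2019, 1, 1)))  # 20.0
```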

There are vectorLag and vectorLead functions that should behave like the proposed vectorData, but perhaps I should add a new function that can move either forward or back and aligns with ClickHouse's neighbor terminology - vectorNeighbor? Let me know what you think.
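For reference, the lag/lead semantics can be sketched in plain Python over one key's sorted series (this only illustrates what the functions compute; Vulkn evaluates them inside ClickHouse, and the helper name here is illustrative):

```python
# Pure-Python sketch of lag/lead over one key's series sorted by timestamp.
def vector_lag(values, offset, default=None):
    """Position i holds values[i - offset]; offset > 0 looks back (lag),
    offset < 0 looks forward (lead). Out-of-range positions get `default`."""
    return [values[i - offset] if 0 <= i - offset < len(values) else default
            for i in range(len(values))]

metrics = [10, 12, 9, 15, 20]
print(vector_lag(metrics, 1))   # [None, 10, 12, 9, 15] - previous row's value
print(vector_lag(metrics, -1))  # [12, 9, 15, 20, None] - next row's value
```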

I had a play with the problem you noted - I don't think this will do exactly what you want, but it roughly achieves the comparison with last week's value, with the previous week's average, and with the last 3 days' average.

# Generate some data
key = (ArrayVector
    .range(1,1).cast('String').map('concat', String('device-')).take(24*60*60)
    .cache().alias('key'))
timestamp = (ArrayVector
    .rand(DateTime('2019-01-07 00:00:00'), DateTime('2019-03-01 23:59:59'), 24*60*60)
    .cache().alias('timestamp'))
metric = ArrayVector.rand(1, 8192, 24*60*60).cache().alias('metric')

data = v.table.fromVector('default.vector_example', 
    (key, timestamp, metric), engines.Memory(), replace=True)

# Aggregate values to a date and return a single key/series for examples

dt = v.q("""
    SELECT 
        key, 
        toDate(toStartOfDay(timestamp)) AS t_d, 
        sum(metric) AS metric
    FROM default.vector_example
    WHERE t_d >= '2019-01-07'
    GROUP BY key, t_d
    ORDER BY key, t_d
    LIMIT 1 BY key, t_d
    LIMIT 100
""").cache()

# For the relative date - grab the value 1 week ago, calculate the moving average
# for the whole of the previous week, and calculate the moving average for the
# last 3 days.

# Next, compare the relative difference of the current value to the calculated
# moving stats. Of course, we can also compare relative to future stats as well,
# depending on the WindowAgg parameters.

(dt
    .vectorizeBy('key', 't_d')
    .select(
        'key', 't_d', 'metric',
        funcs.vector.vectorLag(c('metric'), -7).alias('lag_1week'),
        funcs.vector.vectorWindowAgg(c('avg'), c('metric'), -7, -1).alias('mv_avg_prev_week'),
        funcs.vector.vectorWindowAgg(c('avg'), c('metric'), -3, -1).alias('mv_avg_last_3days'))
    .select(
        'key', 't_d', c('metric').alias('current_value'),
        c('metric/lag_1week').alias('change_vs_relative_day_1week'),
        c('metric/mv_avg_prev_week').alias('change_vs_avg_1week'),
        c('metric/mv_avg_last_3days').alias('change_vs_avg_last3days')
    )
).s
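The window-aggregate step above can be sketched in plain Python to show what is being compared (the function name is illustrative, not the Vulkn API): a moving average over a relative window [lo, hi] around each row, then the ratio of the current value to that average.

```python
# Sketch of a moving window aggregate over a sorted daily series.
def window_avg(values, lo, hi):
    """Average of values[i+lo .. i+hi] for each position i, clipped to the
    series bounds; None where the window is empty. (lo, hi) = (-7, -1) is the
    previous week's average, (-3, -1) is the last 3 days' average."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i + lo): max(0, i + hi + 1)]
        out.append(sum(window) / len(window) if window else None)
    return out

daily = [100, 110, 90, 120, 130, 80, 150, 140]
prev3 = window_avg(daily, -3, -1)
change_vs_avg_last3days = [v / a if a else None for v, a in zip(daily, prev3)]
```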

And thanks for the feedback on the project. This is still in a POC phase (it needs tests and fixes to brittle code). The actual frontend SQL introspection and rewriting will move to a dedicated high-performance backend server in C with a Python interlock that will sit alongside ClickHouse, so the same SQL queries/extensions can be orchestrated by the backend for ODBC and other clients too. Have a full roadmap of ideas here...
