Define and collect main time series types #5

florian-huber · 2021-03-02T09:36:15Z

Time series are everywhere, and they can include a lot of different things.
To better address this field and communicate our work, it is important to structure this a bit.

This issue is also meant for looking at actual time series data and checking what could be relevant. Possible resources are:

florian-huber · 2021-03-02T09:44:29Z

Possible categories to consider:

univariate (only one time-dependent variable) vs. multivariate (> 1 time-dependent variable)
multivariate could be further divided into: same-type channels (e.g. EEG, where all channels carry a similar type of signal) vs. different-type channels
absolute time (the precise position in time matters) vs. relative time (translational invariance, but potentially correlated across channels, e.g. "same time" events or events with a particular distance) vs. time-independent
absolute channel (it matters in which channel something happens) vs. relative channel
local pattern (e.g. a specific peak) vs. global pattern (frequency, variance, trend, etc.)
numerical vs. categorical data

So far, the list above contains some redundancies:

different-type channels also implies absolute channel (but same-type channels could lead to both)
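
A minimal sketch of how such categories, including the redundancy just mentioned, could be recorded per data type. The class and field names (`TimeSeriesProfile`, `Channels`, `ChannelReference`) are hypothetical and only meant as illustration, not as an agreed design:

```python
from dataclasses import dataclass
from enum import Enum


class Channels(Enum):
    SAME_TYPE = "same-type"
    DIFFERENT_TYPE = "different-type"


class ChannelReference(Enum):
    ABSOLUTE = "absolute channel"
    RELATIVE = "relative channel"


@dataclass(frozen=True)
class TimeSeriesProfile:
    """Hypothetical descriptor for one data type; names are illustrative."""
    multivariate: bool
    channels: Channels
    channel_reference: ChannelReference

    def __post_init__(self):
        # Redundancy from the list: different-type channels imply
        # absolute channel, so the opposite combination is rejected.
        if (self.channels is Channels.DIFFERENT_TYPE
                and self.channel_reference is ChannelReference.RELATIVE):
            raise ValueError("different-type channels imply absolute channel")


# Same-type channels may be combined with either channel reference.
eeg = TimeSeriesProfile(True, Channels.SAME_TYPE, ChannelReference.ABSOLUTE)
print(eeg)
```

Encoding the rule in one place would keep tables of data types from listing combinations that the categories themselves rule out.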

Maybe it would also be good to decide that we focus on time series classification. And if we use such categories to assess how well a model classifies time series, we could also think of other properties, e.g.:

number of classes?
number and/or dimension of samples?

florian-huber · 2021-03-02T09:57:33Z

Here is a first attempt to start a table for common data types:

| Data type | Description | Link to example data set | multivariate / univariate | absolute/relative time | same-type/different-type | absolute/relative channel | local/global pattern |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EEG | data from electrodes placed on scalp | ... | multivariate | can be both | same-type | absolute channel | can be both |
| Wearable motion-sensor data | accelerometer and gyroscope data | ... | multivariate | can be both | different-type | absolute channel? | can be both |

florian-huber · 2021-03-02T10:05:53Z

Here is a first attempt to start a table for specific example datasets:

| Dataset | Description | Link to dataset | Citation | multivariate / univariate | time structure | same-type/different-type | absolute/relative channel | local/global pattern |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3W Dataset | Various sensor data to detect rare undesirable real events in oil wells | https://github.com/ricardovvargas/3w_dataset | https://doi.org/10.1016/j.petrol.2019.106223 | multivariate | relative time | different-type | absolute channel | local? |
| Gas sensors for home activity monitoring | MOX gas sensors, and a temperature and humidity sensor | https://archive.ics.uci.edu/ml/datasets/Gas+sensors+for+home+activity+monitoring | see link | multivariate | ? | different-type | absolute channel | ? |
| EEG Steady-State Visual Evoked Potential | EEG data | https://archive.ics.uci.edu/ml/datasets/EEG+Steady-State+Visual+Evoked+Potential+Signals# | see link | multivariate | ? | same-type | absolute channel | ? |
| Human Activity Recognition from Continuous Ambient Sensor Data | Various "smart home" sensors | https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+from+Continuous+Ambient+Sensor+Data | see link | multivariate | ? | different-type | absolute channel | ? |
| Air Quality Data Set | Various sensor data | https://archive.ics.uci.edu/ml/datasets/Air+Quality | https://www.sciencedirect.com/science/article/abs/pii/S0925400507007691 | multivariate | ? | different-type | absolute channel | ? |

jspaaks · 2021-03-12T14:28:22Z

Related to #5 (comment)

absolute time and relative time could probably be treated the same, by defining both as relative to some reference event (e.g. the origin of the time axis, an event in another time series, an event internal to the time series, etc.)
local pattern and global pattern are arbitrary; the distinction has more to do with how a process is sampled. A more workable paradigm is probably to have users define the size of certain events with respect to time as well as with respect to what is on the vertical axis. This would also take care of events of a certain duration.
numerical, categorical, etc.: I believe these are referred to as 'scales'. Other scales are ordinal, nominal, interval, ratio, etc.
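
To illustrate the first point, here is a minimal sketch in which "absolute" and "relative" time differ only in the choice of reference event. The function name `relative_to_event` is hypothetical, and plain lists are used to keep the sketch dependency-free:

```python
def relative_to_event(timestamps, event_time):
    """Re-express timestamps relative to a chosen reference event."""
    return [float(t) - float(event_time) for t in timestamps]


t = [10.0, 11.0, 12.0, 13.0]

# "Absolute" time: the reference is the origin of the time axis.
print(relative_to_event(t, 0.0))

# "Relative" time: the reference is an event inside (or outside) the series,
# here assumed to happen at t = 12.
print(relative_to_event(t, 12.0))
```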

My feeling is that the properties channels and n_ch should be kept separate from the signal|noise definitions. I'd prefer to

define a signal/model/deterministic component, for example as "linear model with intercept 4 and slope -0.3", and label this, for example, signal1
define a second signal/model/deterministic component, for example as "linear model with intercept -30 and slope +3.4", and label this, for example, signal2
define a stochastic signal, for example as "time-independent gaussian noise with mean 4.56, std dev 2.3, kurtosis 0, skewness 0", and label this, for example, noise1

With this interpretation of definitions, you could use names like random_walk, gaussian, etc., like we're doing now with signal_type. Each of these would need to be shorthand for an implementation somewhere (Python or elsewhere), and would take its function parameters from the corresponding YAML definition. This would mean that each of the clauses here

time_series_generator/ts_generator/TS_generator.py

Lines 268 to 332 in 6392320

    
    def add_signal(X,
                   signal_dict):
        """
        """
        n = X.shape[0]
        length = signal_dict['length']
        position = signal_dict['position'] + signal_dict['extra_shift']
        amp = signal_dict['amp'] * signal_dict['sign']
        signal_type = signal_dict['signal_type']
        if signal_type == 'gaussian':
            # Gaussian peak.
            # Center of gaussian will be placed at given position.
            x0 = position - int(length/2)
            x1 = x0 + length
            dx0 = -x0 * (x0 < 0)
            dx1 = (n - x1)*(x1 >= n)
            X[x0 + dx0: x1 + dx1] += amp * scipysig.gaussian(length, std=length/7)[dx0:(x1 - x0 + dx1)]
        elif signal_type == 'wave':
            # Two gaussian peaks in different directions ("wave").
            # Center will be placed at given position.
            x0 = position - int(length/2)
            x1 = x0 + length
            dx0 = -x0 * (x0 < 0)
            dx1 = (n - x1)*(x1 >= n)
            signal = np.zeros((length))
            signal[:int(0.7*length)] += amp * scipysig.gaussian(int(0.7*length), std=length/10)
            signal[-int(0.7*length):] -= amp * scipysig.gaussian(int(0.7*length), std=length/10)
            X[x0+dx0: x1+dx1] += signal[dx0:(x1-x0+dx1)]
        elif signal_type == 'exponential':
            # Sudden peak + exponential decay.
            # Peak will be placed at given position.
            x0 = position
            x1 = x0 + length
            dx1 = (n - x1)*(x1 >= n)
            X[x0: x1 + dx1] += amp * scipysig.exponential(length, 0, length/5, False)[:(x1 - x0 + dx1)]
        elif signal_type == 'peak_exponential':
            # Peak with two exponential flanks.
            # Center of peak will be placed at given position.
            x0 = position - int(length/2)
            x1 = x0 + length
            dx0 = -x0 * (x0 < 0)
            dx1 = (n - x1)*(x1 >= n)
            X[x0 + dx0: x1 + dx1] += amp * scipysig.exponential(length, tau=length/10)[dx0:(x1 - x0 + dx1)]
        elif signal_type == 'triangle':
            # Triangular peak.
            # Center of peak will be placed at given position.
            x0 = position - int(length/2)
            x1 = x0 + length
            dx0 = -x0 * (x0 < 0)
            dx1 = (n - x1)*(x1 >= n)
            X[x0 + dx0: x1 + dx1] += amp * scipysig.triang(length)[dx0:(x1 - x0 + dx1)]
        elif signal_type == 'box':
            # Box peak.
            # Center of peak will be placed at given position.
            x0 = position - int(length/2)
            x1 = x0 + length
            dx0 = -x0*(x0 < 0)
            dx1 = (n - x1)*(x1 >= n)
            X[x0 + dx0: x1 + dx1] += amp * np.ones((length))[dx0:(x1 - x0 + dx1)]
        else:
            print("Signal type not found.")
            X = None
        return X

and here

time_series_generator/ts_generator/TS_generator.py

Lines 335 to 362 in 6392320

    
    def add_noise(X: np.ndarray, noise_dict: dict):
        """Add noise to signal.

        Args:
        -------
        X:
            Array of timeseries to apply noise to (n_channels, n_timepoints).
        noise_dict:
            Dictionary containing noise information "noise_amp" and "noise_type".
        """
        n = X.shape[1]
        n_ch = X.shape[0]
        noise_amp = noise_dict['noise_amp']
        noise_type = noise_dict['noise_type']
        assert noise_type in ["gaussian", "random_walk"], "Unknown noise type."
        if noise_type == 'gaussian':
            noise = noise_amp * np.random.normal(0, 1, (n_ch, n))
            return X + noise
        if noise_type == 'random_walk':
            noise = noise_amp * np.random.normal(0, 1, (n_ch, n))
            noise = np.array([np.sum(noise[:, :i], axis=1) for i in range(0, noise.shape[1])]).T
            return X + noise
        return None

would become an individual function whose parameters are passed as kwargs taken from the YAML.
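
A minimal sketch of that idea, assuming a hypothetical registry (`GENERATORS`, `register`, `generate`) and a hypothetical `type` key per definition. The dictionaries stand in for what a YAML loader would return, and plain lists replace the NumPy arrays of the current code to keep the sketch dependency-free. None of this exists in TS_generator.py:

```python
import random

# Hypothetical registry: maps a definition's type name to its implementation.
GENERATORS = {}


def register(name):
    """Decorator that files a generator function under a type name."""
    def decorator(func):
        GENERATORS[name] = func
        return func
    return decorator


@register("linear")
def linear(n, intercept=0.0, slope=1.0):
    # Deterministic component: intercept + slope * t for t = 0 .. n-1.
    return [intercept + slope * t for t in range(n)]


@register("gaussian")
def gaussian(n, mean=0.0, std=1.0, seed=None):
    # Stochastic component: time-independent Gaussian noise.
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]


@register("random_walk")
def random_walk(n, std=1.0, seed=None):
    # Stochastic component: running sum of Gaussian steps.
    rng = random.Random(seed)
    total, walk = 0.0, []
    for _ in range(n):
        total += rng.gauss(0.0, std)
        walk.append(total)
    return walk


def generate(definition, n):
    """Look up the implementation by name; remaining keys become kwargs."""
    params = dict(definition)
    func = GENERATORS[params.pop("type")]  # KeyError for unknown types
    return func(n, **params)


# Shaped like one loaded YAML definition.
signal1 = {"type": "linear", "intercept": 4, "slope": -0.3}
print(generate(signal1, 3))
```

With this layout, adding a new signal or noise type means registering one more function rather than extending an if/elif chain, and an unknown type fails loudly instead of returning None.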

Then you could use another section of the YAML to state that there are going to be, say, 5 channels, and define which channel has which combination of signal and noise using the labels. Eventually we would need a way to expand definitions, something like what we now have with channels: [1, 2, 3]. Perhaps this section of the YAML could be named composition, or just channels.
This will likely mean that signal_def and noise_def can be merged into definitions. We could optionally introduce a key stochastic: bool for each definition if we need to differentiate between these two types of model; not sure yet.
I need to think more about how well this all fits multivariate problems whose constituent time series are not independent of each other.
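
The points above could look roughly like the following sketch. The key names `use` and `type` and the function `expand_composition` are hypothetical, and the nested dictionary stands in for a loaded YAML file:

```python
# Hypothetical layout, shaped like what a YAML loader would return.
config = {
    "definitions": {
        "signal1": {"type": "linear", "intercept": 4, "slope": -0.3},
        "signal2": {"type": "linear", "intercept": -30, "slope": 3.4},
        "noise1": {"type": "gaussian", "mean": 4.56, "std": 2.3,
                   "stochastic": True},
    },
    "composition": [
        {"channels": [1, 2, 3], "use": ["signal1", "noise1"]},
        {"channels": [4, 5], "use": ["signal2"]},
    ],
}


def expand_composition(config):
    """Expand grouped channel entries into one list of labels per channel."""
    known = config["definitions"]
    per_channel = {}
    for entry in config["composition"]:
        # Every label must refer to an existing definition.
        for label in entry["use"]:
            if label not in known:
                raise KeyError(f"undefined label: {label}")
        # `channels: [1, 2, 3]` applies the same labels to each listed channel.
        for channel in entry["channels"]:
            per_channel.setdefault(channel, []).extend(entry["use"])
    return per_channel


channels = expand_composition(config)
print(channels)
```

Here each channel is still built independently of the others, so this sketch does not address dependent channels.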

Do we have any vocabulary related to sampling, e.g. equally spaced, burst, exponential backoff, state-dependent etc.?
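
To make the question concrete, here is one possible reading of three of these terms as functions that return sample times. The function names, parameters, and exact definitions are assumptions for illustration, not established vocabulary. State-dependent sampling is left out because it would depend on the signal itself:

```python
def equally_spaced(n, dt=1.0, start=0.0):
    """n sample times with a constant interval dt."""
    return [start + i * dt for i in range(n)]


def burst(n_bursts, burst_size, dt=1.0, gap=10.0, start=0.0):
    """Groups of closely spaced samples separated by a longer pause."""
    times = []
    t = start
    for _ in range(n_bursts):
        for i in range(burst_size):
            times.append(t + i * dt)
        # The next burst starts `gap` after the last sample of this one.
        t += (burst_size - 1) * dt + gap
    return times


def exponential_backoff(n, first_interval=1.0, factor=2.0, start=0.0):
    """Intervals between consecutive samples grow by a constant factor."""
    times = [start] if n > 0 else []
    interval = first_interval
    for _ in range(n - 1):
        times.append(times[-1] + interval)
        interval *= factor
    return times


print(equally_spaced(4))
print(burst(2, 3))
print(exponential_backoff(5))
```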
