Define and collect main time series types #5

Open
florian-huber opened this issue Mar 2, 2021 · 4 comments

florian-huber commented Mar 2, 2021

Time series are everywhere and they can cover a lot of different things.
To better address this field and communicate our work, it is important to structure it a bit.

This issue is also meant for looking at actual time series data and checking what could be relevant. Possible resources are:


florian-huber commented Mar 2, 2021

Possible categories to consider:

  • univariate (only one time-dependent variable) vs. multivariate (> 1 time-dependent variable)
  • multivariate could be further divided into: same-type channels (e.g. EEG -> all channels carry a similar type of signal) vs. different-type channels
  • absolute time (precise position in time matters) vs relative time (translational invariance, but potentially correlated across channels --> "same time" events or events with particular distance) vs time independent
  • absolute channel (important in which channel something happens) vs relative channel
  • local pattern (e.g. specific peak) vs global pattern (frequency, variance, trend etc.)
  • numerical vs categorical data

So far, the list above contains some redundancies:

  • different-type channels also implies absolute channel (but same-type channels could lead to both)

Maybe it is also good to decide that we focus on time series classification. And if we use such categories to assess how well a model classifies time series, we could also think of other stuff, e.g.:

  • number of classes ?
  • number and/or dimension of samples ?
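
The categories in this comment could be written down as a small data structure, so that datasets can be tagged consistently. The following is only a minimal sketch: the class names, field names, and the `redundancy_issues` helper are hypothetical and not part of any existing code in this project. Unknown entries (the "?" cases) are represented as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TimeReference(Enum):
    ABSOLUTE = "absolute"
    RELATIVE = "relative"
    INDEPENDENT = "time independent"


class ChannelReference(Enum):
    ABSOLUTE = "absolute"
    RELATIVE = "relative"


class PatternScale(Enum):
    LOCAL = "local"
    GLOBAL = "global"


@dataclass(frozen=True)
class TimeSeriesProfile:
    """Hypothetical record describing one data type or dataset."""
    name: str
    n_channels: int
    same_type_channels: Optional[bool] = None  # None = unknown / not applicable
    time_reference: Optional[TimeReference] = None
    channel_reference: Optional[ChannelReference] = None
    pattern_scale: Optional[PatternScale] = None
    categorical: bool = False  # False = numerical data

    @property
    def multivariate(self) -> bool:
        # More than one time-dependent variable.
        return self.n_channels > 1

    def redundancy_issues(self) -> list:
        """Flag combinations that contradict the redundancy noted above."""
        issues = []
        if (self.multivariate
                and self.same_type_channels is False
                and self.channel_reference is ChannelReference.RELATIVE):
            issues.append("different-type channels imply absolute channel")
        return issues


# Example entry; the channel count of 32 is an arbitrary assumption.
eeg = TimeSeriesProfile(
    name="EEG",
    n_channels=32,
    same_type_channels=True,
    channel_reference=ChannelReference.ABSOLUTE,
)
```

Leaving fields optional keeps half-filled entries valid, which fits a list that is still being worked out.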


florian-huber commented Mar 2, 2021

Here is a first attempt to start a table for common data types:

| Data type | Description | Link to example data set | multivariate / univariate | absolute/relative time | same-type/different-type | absolute/relative channel | local/global pattern |
|---|---|---|---|---|---|---|---|
| EEG | data from electrodes placed on scalp | ... | multivariate | can be both | same-type | absolute channel | can be both |
| Wearable motion-sensor data | accelerometer and gyroscope data | ... | multivariate | can be both | different-type | absolute channel? | can be both |


florian-huber commented Mar 2, 2021

Here is a first attempt to start a table for specific example datasets:

| Dataset | Description | Link to dataset | Citation | multivariate / univariate | time structure | same-type/different-type | absolute/relative channel | local/global pattern |
|---|---|---|---|---|---|---|---|---|
| 3W Dataset | Various sensor data to detect rare undesirable real events in oil wells | https://github.com/ricardovvargas/3w_dataset | https://doi.org/10.1016/j.petrol.2019.106223 | multivariate | relative time | different-type | absolute channel | local ? |
| Gas sensors for home activity monitoring | MOX gas sensors, and a temperature and humidity sensor | https://archive.ics.uci.edu/ml/datasets/Gas+sensors+for+home+activity+monitoring | see link | multivariate | ? | different-type | absolute channel | ? |
| EEG Steady-State Visual Evoked Potential | EEG data | https://archive.ics.uci.edu/ml/datasets/EEG+Steady-State+Visual+Evoked+Potential+Signals# | see link | multivariate | ? | same-type | absolute channel | ? |
| Human Activity Recognition from Continuous Ambient Sensor Data | Various "smart home" sensors | https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+from+Continuous+Ambient+Sensor+Data | see link | multivariate | ? | different-type | absolute channel | ? |
| Air Quality Data Set | Various sensor data | https://archive.ics.uci.edu/ml/datasets/Air+Quality | https://www.sciencedirect.com/science/article/abs/pii/S0925400507007691 | multivariate | ? | different-type | absolute channel | ? |


jspaaks commented Mar 12, 2021

Related to #5 (comment)

  • absolute time and relative time could probably be treated the same, by defining both as relative to a reference event (e.g. the origin of the time axis, an event in another time series, an event internal to the time series, etc.)
  • the distinction between local pattern and global pattern is arbitrary and has more to do with how a process is sampled. Probably a more workable paradigm is to have users define the size of certain events with respect to time as well as with respect to what is on the vertical axis. This would also take care of events of a certain duration.
  • numerical, categorical, etc.: I believe these are referred to as 'scales'. Some other scales are ordinal, nominal, interval, ratio, etc.
  • my feeling is that the properties channels and n_ch should be kept separate from the signal|noise definitions. I'd prefer to
    • define a signal/model/deterministic component for example as "linear model with intercept 4 and slope -0.3", label this for example signal1
    • define a second signal/model/deterministic component for example as "linear model with intercept -30 and slope +3.4", label this for example signal2
    • define a stochastic signal for example as "time-independent gaussian noise with mean 4.56, std dev 2.3, kurtosis 0, skewness 0", label this for example noise1
    • with this interpretation of definitions, you could use names like random_walk, gaussian, etc, like we're doing now with signal_type. Each of these would need to be shorthand for an implementation somewhere (Python or elsewhere), and would take its function parameters from the corresponding yaml definition. This would mean that each of the clauses here
      def add_signal(X,
                     signal_dict):
          """
          """
          n = X.shape[0]
          length = signal_dict['length']
          position = signal_dict['position'] + signal_dict['extra_shift']
          amp = signal_dict['amp'] * signal_dict['sign']
          signal_type = signal_dict['signal_type']
          if signal_type == 'gaussian':
              # Gaussian peak.
              # Center of gaussian will be placed at given position.
              x0 = position - int(length/2)
              x1 = x0 + length
              dx0 = -x0 * (x0 < 0)
              dx1 = (n - x1)*(x1 >= n)
              X[x0 + dx0: x1 + dx1] += amp * scipysig.gaussian(length, std=length/7)[dx0:(x1 - x0 + dx1)]
          elif signal_type == 'wave':
              # Two gaussian peaks in different directions ("wave").
              # Center will be placed at given position.
              x0 = position - int(length/2)
              x1 = x0 + length
              dx0 = -x0 * (x0 < 0)
              dx1 = (n - x1)*(x1 >= n)
              signal = np.zeros((length))
              signal[:int(0.7*length)] += amp * scipysig.gaussian(int(0.7*length), std=length/10)
              signal[-int(0.7*length):] -= amp * scipysig.gaussian(int(0.7*length), std=length/10)
              X[x0+dx0: x1+dx1] += signal[dx0:(x1-x0+dx1)]
          elif signal_type == 'exponential':
              # Sudden peak + exponential decay.
              # Peak will be placed at given position.
              x0 = position
              x1 = x0 + length
              dx1 = (n - x1)*(x1 >= n)
              X[x0: x1 + dx1] += amp * scipysig.exponential(length, 0, length/5, False)[:(x1 - x0 + dx1)]
          elif signal_type == 'peak_exponential':
              # Peak with two exponential flanks.
              # Center of peak will be placed at given position.
              x0 = position - int(length/2)
              x1 = x0 + length
              dx0 = -x0 * (x0 < 0)
              dx1 = (n - x1)*(x1 >= n)
              X[x0 + dx0: x1 + dx1] += amp * scipysig.exponential(length, tau=length/10)[dx0:(x1 - x0 + dx1)]
          elif signal_type == 'triangle':
              # Triangular peak.
              # Center of peak will be placed at given position.
              x0 = position - int(length/2)
              x1 = x0 + length
              dx0 = -x0 * (x0 < 0)
              dx1 = (n - x1)*(x1 >= n)
              X[x0 + dx0: x1 + dx1] += amp * scipysig.triang(length)[dx0:(x1 - x0 + dx1)]
          elif signal_type == 'box':
              # Box peak.
              # Center of peak will be placed at given position.
              x0 = position - int(length/2)
              x1 = x0 + length
              dx0 = -x0*(x0 < 0)
              dx1 = (n - x1)*(x1 >= n)
              X[x0 + dx0: x1 + dx1] += amp * np.ones((length))[dx0:(x1 - x0 + dx1)]
          else:
              print("Signal type not found.")
              X = None
          return X
      and here
      def add_noise(X: np.ndarray, noise_dict: dict):
          """Add noise to signal.
          Args:
          -------
          X:
              Array of timeseries to apply noise to (n_channels, n_timepoints).
          noise_dict:
              Dictionary containing noise information "noise_amp" and "noise_type".
          """
          n = X.shape[1]
          n_ch = X.shape[0]
          noise_amp = noise_dict['noise_amp']
          noise_type = noise_dict['noise_type']
          assert noise_type in ["gaussian", "random_walk"], "Unknown noise type."
          if noise_type == 'gaussian':
              noise = noise_amp * np.random.normal(0, 1, (n_ch, n))
              return X + noise
          if noise_type == 'random_walk':
              noise = noise_amp * np.random.normal(0, 1, (n_ch, n))
              noise = np.array([np.sum(noise[:, :i], axis=1) for i in range(0, noise.shape[1])]).T
              return X + noise
          return None
      would become an individual function whose parameters are passed by kwargs taken from the yaml.
    • then you could use another section of the yaml to state that there are going to be, say, 5 channels, and define which channel has which combination of signal and noise using the labels. We would need to find a way to do definition expansion eventually, something like what we now have with channels: [1, 2, 3]. Perhaps this section of the yaml could be named composition. Or just channels.
    • this will likely mean that the signal_def and noise_def can be merged into definitions. We could optionally introduce a key stochastic: bool for each definition if we need to differentiate between these 2 types of model, not sure yet.
    • I need to think more about how well this all fits multivariate problems whose constituent time series are not independent of each other.
  • Do we have any vocabulary related to sampling, e.g. equally spaced, burst, exponential backoff, state-dependent etc.?
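
To make the proposal above more concrete, here is a rough sketch of what "one function per definition, parameters passed as kwargs, channels composed by label" might look like. Everything in it is an assumption for illustration: the `DEFINITIONS` registry, the `register` decorator, the `build` function, and the key names `definitions`, `channels` and `type` are hypothetical and do not exist in the current code. A plain dict stands in for the parsed yaml, and the kurtosis/skewness parameters are left out for brevity.

```python
import numpy as np

# Hypothetical registry mapping a type name to its generator function.
DEFINITIONS = {}


def register(name):
    """Decorator that adds a generator function to the registry."""
    def decorator(func):
        DEFINITIONS[name] = func
        return func
    return decorator


@register("linear")
def linear(n, rng, intercept=0.0, slope=1.0):
    # Deterministic component; rng is accepted only for a uniform signature.
    return intercept + slope * np.arange(n)


@register("gaussian_noise")
def gaussian_noise(n, rng, mean=0.0, std=1.0):
    # Time-independent Gaussian noise.
    return rng.normal(mean, std, n)


@register("random_walk")
def random_walk(n, rng, std=1.0):
    # Cumulative sum of Gaussian steps.
    return np.cumsum(rng.normal(0.0, std, n))


def build(config, n_timepoints, seed=None):
    """Compose each channel by summing the definitions listed for it."""
    rng = np.random.default_rng(seed)
    X = np.zeros((len(config["channels"]), n_timepoints))
    for row, labels in enumerate(config["channels"]):
        for label in labels:
            params = dict(config["definitions"][label])  # copy before pop
            func = DEFINITIONS[params.pop("type")]
            X[row] += func(n_timepoints, rng, **params)
    return X


# Plain dict standing in for the parsed yaml file.
config = {
    "definitions": {
        "signal1": {"type": "linear", "intercept": 4, "slope": -0.3},
        "signal2": {"type": "linear", "intercept": -30, "slope": 3.4},
        "noise1": {"type": "gaussian_noise", "mean": 4.56, "std": 2.3},
    },
    "channels": [["signal1", "noise1"], ["signal2"], ["signal2", "noise1"]],
}

X = build(config, n_timepoints=100, seed=0)  # shape (3, 100)
```

In this sketch signals and noise share one `definitions` section, and nothing in `build` needs to know which entries are stochastic. An unknown `type` simply raises a `KeyError`; whether that is the right failure mode is open.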
