Penchants and leanings #153

Draft · wants to merge 30 commits into base `main`
c10ed32
Draft of Leanings module
kahaaga Nov 30, 2021
1711df7
Leanings, with tests and docs
kahaaga Dec 1, 2021
677a676
Correct function name
kahaaga Dec 1, 2021
208e4a5
Use testset
kahaaga Dec 1, 2021
a1c620b
Remove example arrays
kahaaga Dec 1, 2021
1b0cd8d
Add return statement
kahaaga Dec 1, 2021
0310e53
Add data requirement to docstring
kahaaga Dec 1, 2021
92b561e
`weighted` is a keyword
kahaaga Dec 1, 2021
944816a
Update test
kahaaga Dec 1, 2021
ab7447c
Simplify and shorten function names
kahaaga Dec 1, 2021
1b8d401
Docstring fix
kahaaga Dec 1, 2021
28920fc
Change function names in docs
kahaaga Dec 1, 2021
7c5d6cc
Update version
kahaaga Dec 1, 2021
c790973
Exclude failing tests for now (they are not relevant for this PR)
kahaaga Dec 1, 2021
78ea28f
Rename variables for clarity
kahaaga Dec 1, 2021
16c06d6
Documentation improvements
kahaaga Dec 1, 2021
7b74daf
Specify possible ranges for penchants and leanings
kahaaga Dec 1, 2021
d916cd7
Add test for lean
kahaaga Dec 1, 2021
bfec698
Definition of causal penchant
kahaaga Dec 1, 2021
2d19990
Rephrase docs
kahaaga Dec 1, 2021
4c9fbeb
All tests involving Dataset construction fails, reintroduce the file
kahaaga Dec 1, 2021
096faba
Trigger new CI run
kahaaga Dec 1, 2021
f7c3591
Require DelayEmbeddings 2.0.2 (breaks on Julia 1.7 otherwise)
kahaaga Dec 1, 2021
e21791f
Change heading in menu
kahaaga Dec 1, 2021
2d29f3c
Latex typo
kahaaga Dec 1, 2021
9bdb54b
Add penchants and leanings to overview
kahaaga Dec 1, 2021
235957a
`weighted` should be true by default for both `lean` and `penchant`
kahaaga Dec 1, 2021
9042531
example
kahaaga Dec 1, 2021
ce542de
Add example system
kahaaga Dec 6, 2021
c8607c8
Merge branch 'master' into leanings
kahaaga May 21, 2022
3 changes: 1 addition & 2 deletions Project.toml
@@ -29,7 +29,7 @@ Wavelets = "29a6e085-ba6d-5f35-a997-948ac2efa89a"

[compat]
ChaosTools = "^2.0"
DelayEmbeddings = "^2.0"
DelayEmbeddings = "^2.0.2"
Distances = "^0.10"
Distributions = "^0.24, 0.25"
DynamicalSystemsBase = "^2.0"
@@ -41,7 +41,6 @@ Neighborhood = "0.2.3"
Reexport = "0.2, 1"
Requires = "^1"
SimpleDiffEq = "^1"
StaticArrays = "^1"
StatsBase = "^0.33"
TimeseriesSurrogates = "^1.2, 2"
TransferEntropy = "^1.7"
3 changes: 3 additions & 0 deletions docs/make.jl
@@ -50,6 +50,9 @@ PAGES = [
"generalized_entropy.md",
"info_estimators.md",
],
"Exploratory" => [
"leanings.md"
],

"example_systems.md",
"Utilities" => [
4 changes: 4 additions & 0 deletions docs/src/index.md
@@ -36,6 +36,10 @@ We also provide estimators for generalized entropy and mutual information, thoug
- Generalized entropy ([`genentropy`](@ref)) and fast histograms ([`binhist`](@ref))
- Mutual information ([`mutualinfo`](@ref))

### Exploratory methods

- [Penchants and leanings](@ref leanings).

## Contributions/issues

Do you want additional methods or example systems to be implemented? Make a PR to the
10 changes: 10 additions & 0 deletions docs/src/leanings.md
@@ -0,0 +1,10 @@
# [Penchants and leanings](@id leanings)

Penchants and leanings are probability-based exploratory causal inference measures that estimate probabilities directly from the data. Here, we provide implementations of the *penchants and leanings*
algorithms from McCracken and Weigel (2016).

```@docs
penchant
lean
```
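
A minimal usage sketch (the inputs here are already binary, so no extra discretization is needed; the expected values follow the package's own tests):

```julia
using CausalityTools

# Two pre-discretized (binary) time series, where y repeats x with a one-step lag.
x = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]

penchant(x, y, 1) # ≈ 1.0: the 1-assignment {x_{t-1}, y_t} is maximally consistent
lean(x, y, 1)     # > 0: evidence that x drives y rather than the other way around
```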

1 change: 1 addition & 0 deletions src/CausalityTools.jl
@@ -15,6 +15,7 @@ module CausalityTools
include("CrossMappings/CrossMappings.jl")
include("SMeasure/smeasure.jl")
include("PredictiveAsymmetry/PredictiveAsymmetry.jl")
include("Leanings/Leanings.jl")
include("example_systems/ExampleSystems.jl")

using Requires
5 changes: 5 additions & 0 deletions src/Leanings/Leanings.jl
@@ -0,0 +1,5 @@
using Reexport

@reexport module Leanings
include("penchants.jl")
end
57 changes: 57 additions & 0 deletions src/Leanings/example.jl
@@ -0,0 +1,57 @@
using CausalityTools, DynamicalSystems, Plots, Statistics, Distributions; gr()

npts, Ttr = 10000, 5000
x, y = columns(trajectory(henon2(c_xy = 0.0), npts - 1, Ttr = Ttr))
xr, yr = rand(Uniform(-1, 1), npts), rand(Uniform(-1, 1), npts)
plot()
plot!(x, label = "x")
plot!(y, label = "y")

n = 5
weighted = false
nreps = 50
ls = 0:10
leans_wn = zeros(nreps, length(ls))
leans_henon_c0 = zeros(nreps, length(ls))
leans_henon_c2 = zeros(nreps, length(ls))

function partition(x, n::Int)
    xmin = minimum(x) # the reference point
    Δx = (maximum(x) - xmin) / n
    # Clamp so the maximum lands in bin n - 1 instead of an extra bin n.
    [min(floor(Int, (xᵢ - xmin) / Δx), n - 1) for xᵢ in x]
end

for i = 1:nreps
xr, yr = rand(Uniform(-1, 1), npts), rand(Uniform(-1, 1), npts)
X = partition(xr, n)
Y = partition(yr, n)
for (j, l) in enumerate(ls)
leans_wn[i, j] = lean(X, Y, l, weighted = weighted)
end

x, y = columns(trajectory(henon2(c_xy = 0.0), npts - 1, Ttr = Ttr))
X = partition(x, n)
Y = partition(y, n)
for (j, l) in enumerate(ls)
leans_henon_c0[i, j] = lean(X, Y, l, weighted = weighted)
end

x, y = columns(trajectory(henon2(c_xy = 2.0), npts - 1, Ttr = Ttr))
X = partition(x, n)
Y = partition(y, n)
for (j, l) in enumerate(ls)
leans_henon_c2[i, j] = lean(X, Y, l, weighted = weighted)
end
end

mean_leans_wn = dropdims(mean(leans_wn, dims = 1), dims = 1)
mean_leans_henon_c0 = dropdims(mean(leans_henon_c0, dims = 1), dims = 1)
mean_leans_henon_c2 = dropdims(mean(leans_henon_c2, dims = 1), dims = 1)

plot(xlabel = "l", ylabel = "λx→y", ylims = (-1.1, 1.1))
plot!(ls, mean_leans_wn, label = "white noise", marker = :star)
plot!(ls, mean_leans_henon_c0, label = "henon (c_xy = 0.0)", marker = :circle)
plot!(ls, mean_leans_henon_c2, label = "henon (c_xy = 2.0)", marker = :star5)
184 changes: 184 additions & 0 deletions src/Leanings/penchants.jl
@@ -0,0 +1,184 @@

export penchant, lean

using Statistics

# We just assume that the time series have been pre-binned or symbolized.
# However, `get_states` might turn out much more complicated if we decide
# to implement more complicated cause-effect assignments.
get_states(x, y) = unique([x; y])

# Getting the penchants might also become more complex for other
# complicated cause-effect assignments, so we separate it out into a function.
function get_penchants(x, y)
states = get_states(x, y)

# All possible 2-permutations (with repetitions) of the unique
# states in `x ∪ y`.
return Iterators.product(states, states)
end

function penchant_counts(x, y, l = 1)
@assert length(x) == length(y)
n = length(x)

# Leave reasonably many points for the probability computations; `l` can't be too large.
@assert l < n ÷ 2

# Counts of how often each (cause, effect) state pair co-occurs (ns_ec),
# and how often each cause (ns_c) and each effect (ns_e) appears alone.
ns_ec = Vector{Int}(undef, 0)
ns_c = Vector{Int}(undef, 0)
ns_e = Vector{Int}(undef, 0)

# For each possible penchant, count occurrences.
penchants = get_penchants(x, y)

@inbounds for penchant in penchants
state_cause, state_effect = penchant[1], penchant[2]

# Number of times the assumed cause has appeared
n_c = 0

# Number of times the assumed effect has appeared
n_e = 0

# Number of times the cause AND effect have appeared simultaneously
n_ec = 0

for t = l+1:n
# "we required that the cause must precede the effect". This is the
# standard l-assignment
effect = y[t]
cause = x[t-l]

if effect == state_effect && cause == state_cause
n_ec += 1
end

if effect == state_effect
n_e += 1
end

if cause == state_cause
n_c += 1
end
end
push!(ns_ec, n_ec)
push!(ns_c, n_c)
push!(ns_e, n_e)
end

return ns_c, ns_e, ns_ec
end


function mean_observed_ρ(ρs, κs, ns_ec, L, weighted = false)
ρ = 0.0
ct = 0
@inbounds for i = 1:length(κs)
if κs[i] != 0.0
if weighted
ρ += ρs[i] * ns_ec[i]
else
ρ += ρs[i]
end
ct += 1
end
end

if weighted
return ρ / L
else
return ρ / ct
end
end

"""
    penchant(x, y, l = 1; weighted = true) → ρ̄

Compute the *mean observed penchant* `ρ̄` (McCracken & Weigel, 2016)[^McCrackenWeigel2016]
of the `l`-assignment ``\\{\\text{C}, \\text{E}\\} = \\{\\text{cause}, \\text{effect}\\} = \\{x_{t-l}, y_t\\}``.

If `weighted == true` (the default), the *weighted mean observed penchant* is computed instead.

## Definition

The *causal penchant*, or causal tendency, is a causal indicator defined as
``\\rho_{CE} = P(E|C) - P(E|\\bar{C})``, where ``\\rho_{CE} \\in [-1, 1]`` (McCracken & Weigel, 2016).
If ``\\rho_{CE} > 0``, then ``C`` causes or drives ``E``, and if
``\\rho_{CE} \\leq 0``, then the direction of influence is undetermined.

A direct formula for ``\\rho_{CE}`` can be obtained using Bayes' theorem.

```math
P(E|C) = P(C|E) \\dfrac{P(E)}{P(C)}
```
Using the definitions of probability complements, one arrives at the following
expression (see the original paper for a detailed derivation):

```math
\\rho_{CE} = P(E|C) \\left[1 + \\dfrac{P(C)}{1-P(C)} \\right] - \\dfrac{P(E)}{1 - P(C)}.
```

Applying appropriate discretization schemes, these probabilities can be estimated
directly from time series using simple counting, which makes the method fast and
well-suited for exploratory causal inference analysis.
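
As a standalone sanity check of this expression (illustrative only, not part of the package API), both forms of ``\\rho_{CE}`` can be estimated by direct counting on two short binary series:

```julia
# Standalone check (not package API): the defining form P(E|C) - P(E|C̄)
# equals the Bayes-derived expression above, for empirical probabilities.
x = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1] # assumed cause
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0] # assumed effect
l = 1
E = y[l+1:end] .== 1 # effect occurred at time t
C = x[1:end-l] .== 1 # cause occurred at time t - l

P(A) = count(A) / length(A)
Pcond(A, B) = count(A .& B) / count(B) # P(A | B)

ρ_def = Pcond(E, C) - Pcond(E, .!C)
ρ_bayes = Pcond(E, C) * (1 + P(C) / (1 - P(C))) - P(E) / (1 - P(C))
ρ_def ≈ ρ_bayes # true
```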

See also [`lean`](@ref) and data requirements discussed therein.

[^McCrackenWeigel2016]: McCracken, J. M., & Weigel, R. S. (2016). Nonparametric causal inference for bivariate time series. Physical Review E, 93(2), 022207.
"""
function penchant(x, y, l = 1; weighted = true)
n = length(x)
ns_c, ns_e, ns_ec = penchant_counts(x, y, l)

L = n - l # "library length", resulting from shifting the time series l steps
Ps_e = ns_e ./ L
Ps_c = ns_c ./ L
κs = ns_ec ./ ns_c
ρs = κs.* (1 .+ Ps_c ./ (1 .- Ps_c)) .- (Ps_e) ./ (1 .- Ps_c)

# Mean observed penchant. We can determine which penchants are observed
# by looking for zeros in `κs`. If `κs[i] == 0`, then the `i`-th penchant
# is unobserved. Why? When the penchant is unobserved, there are no
# instances of the cause and effect occurring together, so
# `P(effect | cause) = P(effect, cause) / P(cause) = 0 / P(cause) = 0`
# As explained in section D in the paper, including unobserved penchants
# would be meaningless for causal inference.
mean_observed_ρ(ρs, κs, ns_ec, L, weighted)
end

"""

lean(x, y, l = 1; weighted = true) → ρ̄

Compute the *mean observed leaning* `λ ∈ [-2, 2]` (McCracken & Weigel, 2016)[^McCrackenWeigel2016]
between scalar time series `x` and `y`.

If `ρ̄ > 0`, then the probability that `x` drives `y` is higher than the probability
that `y` drives `x`. Vice versa, if `ρ̄ > 0`, then the probability that `y` drives `x`
is higher than the probability that `x` drives `y`.

## Weighting

If `weighted == true`, then each penchant is weighted by the number of times it appears in
the data, which yields the *weighted mean observed leaning*. This is the default behavior, since
"...[the weighted mean observed leaning] accounts for the frequency of observed cause-effect pairs
within the data, which is assumed to be a predictor of causal influence" (McCracken & Weigel, 2016).

## Data requirements

Both `x` and `y` *must* be discrete, and the number of discrete states must not be too large
relative to the number of points in `x` and `y`; otherwise, probabilities cannot be reliably estimated.
If the function returns `NaN`, you probably either haven't discretized your time series,
or the partition is too fine-grained for the number of points available.

Hence, this implementation *does not* take care of the tolerance domains discussed in the paper.
Pre-process your time series using appropriate binning or symbolization schemes.
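
For instance, a simple equidistant binning (one possible discretization scheme; `bin_equidistant` is a hypothetical helper, not part of the package API) could look like:

```julia
# Discretize a real-valued series into n equidistant bins (integer states 0:n-1)
# before passing it to `lean` or `penchant`.
function bin_equidistant(x, n::Int)
    xmin, xmax = extrema(x)
    Δ = (xmax - xmin) / n
    # `min` clamps the maximum into the last bin instead of an extra bin n.
    [min(floor(Int, (xᵢ - xmin) / Δ), n - 1) for xᵢ in x]
end

x = randn(1000)
X = bin_equidistant(x, 5) # X now takes integer values in 0:4
```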

[^McCrackenWeigel2016]: McCracken, J. M., & Weigel, R. S. (2016). Nonparametric causal inference for bivariate time series. Physical Review E, 93(2), 022207.
"""
function lean(x, y, l = 1; weighted = true)
return penchant(x, y, l, weighted = weighted) - penchant(y, x, l, weighted = weighted)
end

34 changes: 34 additions & 0 deletions src/example_systems/discretemaps/nonlinear_sindriver_2d.jl
@@ -0,0 +1,34 @@
function eom_nonlinear_sindriver_2d(dx, x, p, n)
    a, b, c, t, Δt = (p...,)
    x₁ = x[1] # avoid shadowing the state vector `x`
    𝒩 = Normal(0, 1)

    dx[1] = sin(t)
    dx[2] = a*x₁ * (1 - b*x₁) + c*rand(𝒩)
    p[end-1] += Δt # update the time parameter t

    return
end

"""
    nonlinear_sindriver_2d(;u₀ = 0, a = 1.0, b = 1.0, c = 2.0, Δt = π/30)

A discrete, nonlinear 2D system from McCracken & Weigel (2016)[^McCrackenWeigel2016],
governed by the following equations of motion:

```math
\\begin{aligned}
x_{t+1} &= \\sin(t) \\\\
y_{t+1} &= a x_t (1 - b x_t) + c \\eta_t
\\end{aligned}
```

where ``\\eta_t \\sim \\mathcal{N}(\\mu = 0, \\sigma = 1)`` and ``a, b, c \\geq 0``.

[^McCrackenWeigel2016]: McCracken, J. M., & Weigel, R. S. (2016). Nonparametric causal inference for bivariate time series. Physical Review E, 93(2), 022207.
"""
function nonlinear_sindriver_2d(;u₀ = 0, a = 1.0, b = 1.0, c = 2.0, Δt = (1/30*π))
    @assert a >= 0; @assert b >= 0; @assert c >= 0

    DiscreteDynamicalSystem(eom_nonlinear_sindriver_2d, [0, u₀], [a, b, c, 0, Δt])
end
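
For reference, a dependency-free sketch that iterates these equations directly (`sindriver_trajectory` is a hypothetical helper name; the packaged version goes through `DiscreteDynamicalSystem` instead):

```julia
# Iterate x_{t+1} = sin(t), y_{t+1} = a x_t (1 - b x_t) + c η_t directly,
# with η_t standard normal and time advancing by Δt per step.
function sindriver_trajectory(npts; a = 1.0, b = 1.0, c = 2.0, Δt = π/30)
    x, y = zeros(npts), zeros(npts)
    t = 0.0
    for i in 2:npts
        x[i] = sin(t)
        y[i] = a * x[i-1] * (1 - b * x[i-1]) + c * randn()
        t += Δt
    end
    return x, y
end

x, y = sindriver_trajectory(500)
```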
12 changes: 12 additions & 0 deletions test/methods/test_leanings.jl
@@ -0,0 +1,12 @@

@testset "Leanings" begin
x = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
@test penchant(x, y, 1, weighted = false) ≈ 1.0
@test penchant(y, x, 1, weighted = false) ≈ 1/7

@test penchant(x, y, 1, weighted = true) ≈ 1.0
@test penchant(y, x, 1, weighted = true) ≈ 3/63

@test lean(x, y, 1, weighted = true) ≈ 60/63
end
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -8,6 +8,7 @@ include("methods/test_smeasure.jl")
include("methods/test_joint_distance_distribution.jl")
include("methods/test_predictive_asymmetry.jl")
include("methods/test_crossmapping.jl")
include("methods/test_leanings.jl")

include("systems/discrete/test_discrete_systems.jl")
include("systems/continuous/test_continuous_systems.jl")