While the original implementation of the Echo State Network used the equations of Recurrent Neural Networks to obtain non-linearity in the reservoir, other variations have been proposed in recent years. More specifically, the different drivers implemented in ReservoirComputing.jl are the multiple activation function RNN `MRNN()` and the Gated Recurrent Unit `GRU()`. To change driver, it suffices to pass the chosen method to the `ESN` keyword argument `reservoir_driver`. In this section, some examples of their usage will be given, as well as a brief introduction to their equations.
Based on the double activation function ESN (DAFESN) proposed in the literature, the Multiple Activation Function ESN expands the idea and allows a custom number of activation functions to be used in the reservoir dynamics. This can be thought of as a linear combination of multiple activation functions with corresponding parameters.
\[\mathbf{x}(t+1) = (1-\alpha)\mathbf{x}(t) + \lambda_1 f_1(\mathbf{W}\mathbf{x}(t)+\mathbf{W}_{in}\mathbf{u}(t)) + \dots + \lambda_D f_D(\mathbf{W}\mathbf{x}(t)+\mathbf{W}_{in}\mathbf{u}(t))\]
where $D$ is the number of activation functions and respective parameters chosen.
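To make the update rule concrete, here is a small plain-Julia sketch of a single MRNN state step. It is only an illustration of the equation above, not the driver's internal implementation, and the matrices and dimensions used in the toy call are arbitrary.

```julia
# Illustrative sketch of the MRNN state update x(t+1) = (1-α)x(t) + Σ λ_i f_i(W x(t) + W_in u(t)).
# Not the package's internal code; fs and lambdas play the roles of f_1,...,f_D and λ_1,...,λ_D.
function mrnn_step(x, u, W, W_in, fs, lambdas, alpha)
    pre = W*x .+ W_in*u                           # shared pre-activation
    (1 - alpha) .* x .+ sum(l .* f.(pre) for (f, l) in zip(fs, lambdas))
end

# toy usage with arbitrary small random matrices
using Random
Random.seed!(0)
W, W_in = 0.1 * rand(5, 5), 0.1 * rand(5, 1)
x, u = zeros(5), [0.3]
x_next = mrnn_step(x, u, W, W_in, (tanh, sin), (0.5, 0.3), 0.85)
```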
The method to call to use the multiple activation function ESN is `MRNN(activation_function, leaky_coefficient, scaling_factor)`. The arguments can be passed either as args or as kwargs. `activation_function` and `scaling_factor` have to be vectors (or tuples) containing the chosen activation functions and the respective scaling factors ($f_1,...,f_D$ and $\lambda_1,...,\lambda_D$ following the nomenclature introduced above). The `leaky_coefficient` represents $\alpha$ and is a single value.
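For instance, the same driver can be built either positionally or with keyword arguments. The snippet below is just a sketch with an arbitrary second activation function; the full example later in this section does the same with the activation functions from the DAFESN paper.

```julia
using ReservoirComputing

# an arbitrary extra activation function, only for illustration
f_extra(x) = x / sqrt(1 + x*x)

# same MRNN driver, positional arguments vs keyword arguments
driver_args = MRNN([tanh, f_extra], 0.85, [0.5, 0.3])
driver_kwargs = MRNN(activation_function = [tanh, f_extra],
    leaky_coefficient = 0.85,
    scaling_factor = [0.5, 0.3])
```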
Starting with the example, the data used is based on the following function, taken from the DAFESN paper. A full script of the example is available here. This example was run on Julia v1.7.2.
```julia
u(t) = sin(t)+sin(0.51*t)+sin(0.22*t)+sin(0.1002*t)+sin(0.05343*t)
```
```
u (generic function with 1 method)
```
For this example, the type of prediction will be one step ahead. The metric used to assess the quality of the prediction will be the normalized root-mean-square deviation `rmsd` from StatsBase. As in the other examples, the first step is to gather the data:
```julia
train_len = 3000
predict_len = 2000
shift = 1

data = u.(collect(0.0:0.01:500))
training_input = reduce(hcat, data[shift:shift+train_len-1])
training_target = reduce(hcat, data[shift+1:shift+train_len])
testing_input = reduce(hcat, data[shift+train_len:shift+train_len+predict_len-1])
testing_target = reduce(hcat, data[shift+train_len+1:shift+train_len+predict_len])
```
```
1×2000 Matrix{Float64}:
 0.852897 0.850969 0.849127 0.847371 … -1.4348 -1.4206 -1.40638
```
To follow the paper more closely, it is necessary to define a couple of activation functions. Their numbering follows the one in the paper. Of course, one can also use any custom-defined function, any function available in the base language, or any activation function from NNlib.
```julia
f2(x) = (1-exp(-x))/(2*(1+exp(-x)))
f3(x) = (2/pi)*atan((pi/2)*x)
f4(x) = x/sqrt(1+x*x)
```
```
f4 (generic function with 1 method)
```
It is now possible to build different drivers, using the parameters suggested by the paper. Also, in this instance, the numbering follows the test cases of the paper. In the end, a simple for loop is implemented to compare the different drivers and activation functions.
```julia
using ReservoirComputing, Random, StatsBase

#fix seed for reproducibility
Random.seed!(42)

#baseline case with RNN() driver. Parameters given as args
base_case = RNN(tanh, 0.85)

#MRNN() test cases
#Parameters given as kwargs
case3 = MRNN(activation_function=[tanh, f2],
    leaky_coefficient=0.85,
    scaling_factor=[0.5, 0.3])

#Parameters given as kwargs
case4 = MRNN(activation_function=[tanh, f3],
    leaky_coefficient=0.9,
    scaling_factor=[0.45, 0.35])

#Parameters given as args
case5 = MRNN([tanh, f4], 0.9, [0.43, 0.13])

#tests
test_cases = [base_case, case3, case4, case5]
for case in test_cases
    esn = ESN(training_input,
        input_layer = WeightedLayer(scaling=0.3),
        reservoir = RandSparseReservoir(100, radius=0.4),
        reservoir_driver = case,
        states_type = ExtendedStates())
    wout = train(esn, training_target, StandardRidge(10e-6))
    output = esn(Predictive(testing_input), wout)
    println(rmsd(testing_target, output, normalize=true))
end
```
```
1.2856015653433233e-5
8.911449874766619e-5
0.00013300857184562025
0.00014733027560825
```
In this example, it is also possible to observe how parameters are passed to the methods `RNN()` and `MRNN()`, both by argument and by keyword argument.
Gated Recurrent Units (GRUs) have been proposed in more recent years with the intent of limiting notable problems of RNNs, such as the vanishing gradient. This change in the underlying equations can easily be transported into the Reservoir Computing paradigm by replacing the RNN equations in the reservoir with the GRU equations. This approach has been explored in the literature, and different variations of the GRU have been proposed; this section is subdivided into parts that go into detail about the governing equations and their implementation in ReservoirComputing.jl. As before, to access the GRU reservoir driver, it suffices to change the `reservoir_driver` keyword argument of `ESN` to `GRU()`. All the variations that will be presented can be used in this package by leveraging the keyword argument `variant` in the method `GRU()` and specifying the chosen variant: `FullyGated()` or `Minimal()`. Other variations are possible by modifying the inner layers and reservoirs. The default is set to the standard version `FullyGated()`. The first part will go into more detail about the defaults of the `GRU()` method, and the following ones will refer to it to minimize repetition. This example was run on Julia v1.7.2.
The equations for the standard GRU are as follows:
\[\mathbf{r}(t) = \sigma (\mathbf{W}^r_{\text{in}}\mathbf{u}(t)+\mathbf{W}^r\mathbf{x}(t-1)+\mathbf{b}_r) \\
\mathbf{z}(t) = \sigma (\mathbf{W}^z_{\text{in}}\mathbf{u}(t)+\mathbf{W}^z\mathbf{x}(t-1)+\mathbf{b}_z) \\
\tilde{\mathbf{x}}(t) = \text{tanh}(\mathbf{W}_{\text{in}}\mathbf{u}(t)+\mathbf{W}(\mathbf{r}(t) \odot \mathbf{x}(t-1))+\mathbf{b}) \\
\mathbf{x}(t) = \mathbf{z}(t) \odot \mathbf{x}(t-1)+(1-\mathbf{z}(t)) \odot \tilde{\mathbf{x}}(t)\]
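As a reading aid for these equations, the following is a plain-Julia sketch of a single fully gated update. It is not the code used inside the `GRU()` driver; the matrices and biases are stand-ins for whatever the chosen input layers, reservoirs, and biases produce.

```julia
# Sketch of one fully gated GRU reservoir update (illustrative only).
logistic(v) = 1 / (1 + exp(-v))                   # σ in the equations above

function gru_step(x, u, Win_r, Wr, br, Win_z, Wz, bz, Win, W, b)
    r = logistic.(Win_r*u .+ Wr*x .+ br)          # reset gate r(t)
    z = logistic.(Win_z*u .+ Wz*x .+ bz)          # update gate z(t)
    x_tilde = tanh.(Win*u .+ W*(r .* x) .+ b)     # candidate state x̃(t)
    z .* x .+ (1 .- z) .* x_tilde                 # new state x(t)
end
```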
Going over the `GRU()` keyword arguments, it will be explained how to feed the desired input to the model:

- `activation_function` is a vector with default values `[NNlib.sigmoid, NNlib.sigmoid, tanh]`. This argument controls the activation functions of the GRU, going from top to bottom. Changing the first element corresponds to changing the activation function for $\mathbf{r}(t)$ and so on.
- `inner_layer` is a vector with default value `fill(DenseLayer(), 2)`. This keyword argument controls the $\mathbf{W}_{\text{in}}$s going from top to bottom like before.
- `reservoir` is a vector with default value `fill(RandSparseReservoir(), 2)`. In a similar fashion to `inner_layer`, this keyword argument controls the reservoir matrix construction in a top to bottom order.
- `bias` is again a vector with default value `fill(DenseLayer(), 2)`. It is meant to control the $\mathbf{b}$s, going as usual from top to bottom.
- `variant` controls the GRU variant. The default value is set to `FullyGated()`.
It is important to notice that `inner_layer` and `reservoir` control every layer except $\mathbf{W}_{\text{in}}$, $\mathbf{W}$, and $\mathbf{b}$. These are given as input to the `ESN()` call as `input_layer`, `reservoir`, and `bias` respectively.
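Putting these defaults together, a fully explicit construction of the standard driver would look roughly like the following sketch, which simply spells out the default values listed above (calling `GRU()` with no arguments should be equivalent).

```julia
using ReservoirComputing, NNlib

# the documented defaults of the fully gated GRU driver, written out explicitly
gru_driver = GRU(activation_function = [NNlib.sigmoid, NNlib.sigmoid, tanh],
    inner_layer = fill(DenseLayer(), 2),          # gate input matrices W^r_in, W^z_in
    reservoir = fill(RandSparseReservoir(), 2),   # gate reservoir matrices W^r, W^z
    bias = fill(DenseLayer(), 2),                 # gate biases b_r, b_z
    variant = FullyGated())
```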
The following sections are going to illustrate the variations of the GRU architecture and how to obtain them in ReservoirComputing.jl.
The first variation of the GRU is dependent only on the previous hidden state and the bias:
\[\mathbf{r}(t) = \sigma (\mathbf{W}^r\mathbf{x}(t-1)+\mathbf{b}_r) \\
\mathbf{z}(t) = \sigma (\mathbf{W}^z\mathbf{x}(t-1)+\mathbf{b}_z)\]
To obtain this variation, it will suffice to set `inner_layer = fill(NullLayer(), 2)` and leave `variant = FullyGated()`.
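As a quick sketch, such a driver could be built by switching off the gate input layers with `NullLayer()`:

```julia
# GRU driven only by the previous hidden state and the bias (sketch)
gru_variant1 = GRU(inner_layer = fill(NullLayer(), 2),
    variant = FullyGated())
```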
The second variation only depends on the previous hidden state:
\[\mathbf{r}(t) = \sigma (\mathbf{W}^r\mathbf{x}(t-1)) \\
\mathbf{z}(t) = \sigma (\mathbf{W}^z\mathbf{x}(t-1))\]
Similarly to before, to obtain this variation, it is only required to set `inner_layer = fill(NullLayer(), 2)` and `bias = fill(NullLayer(), 2)` while keeping `variant = FullyGated()`.
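A corresponding sketch, with both the gate input layers and the gate biases set to `NullLayer()`:

```julia
# GRU driven only by the previous hidden state (sketch)
gru_variant2 = GRU(inner_layer = fill(NullLayer(), 2),
    bias = fill(NullLayer(), 2),
    variant = FullyGated())
```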
The final variation, before the minimal one, depends only on the biases:
\[\mathbf{r}(t) = \sigma (\mathbf{b}_r) \\
\mathbf{z}(t) = \sigma (\mathbf{b}_z)\]
This means that it is only needed to set `inner_layer = fill(NullLayer(), 2)` and `reservoir = fill(NullReservoir(), 2)` while keeping `variant = FullyGated()`.
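A corresponding sketch, with the gate input layers replaced by `NullLayer()` and the gate reservoirs by `NullReservoir()`:

```julia
# GRU gates depending only on the biases (sketch)
gru_variant3 = GRU(inner_layer = fill(NullLayer(), 2),
    reservoir = fill(NullReservoir(), 2),
    variant = FullyGated())
```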
The minimal GRU variation merges two gates into one:
\[\mathbf{f}(t) = \sigma (\mathbf{W}^f_{\text{in}}\mathbf{u}(t)+\mathbf{W}^f\mathbf{x}(t-1)+\mathbf{b}_f) \\
\tilde{\mathbf{x}}(t) = \text{tanh}(\mathbf{W}_{\text{in}}\mathbf{u}(t)+\mathbf{W}(\mathbf{f}(t) \odot \mathbf{x}(t-1))+\mathbf{b}) \\
\mathbf{x}(t) = (1-\mathbf{f}(t)) \odot \mathbf{x}(t-1) + \mathbf{f}(t) \odot \tilde{\mathbf{x}}(t)\]
This variation can be obtained by setting `variant = Minimal()`. The `inner_layer`, `reservoir` and `bias` kwargs are not vectors this time, but must be given as single values, for example `inner_layer = DenseLayer()` or `reservoir = SparseDenseReservoir()`.
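A sketch of such a driver, reusing constructors already shown in this section (the particular choices of `DenseLayer()` and `RandSparseReservoir()` here are only illustrative):

```julia
# minimal GRU variant: a single gate, so single (non-vector) kwargs (sketch)
gru_minimal = GRU(variant = Minimal(),
    inner_layer = DenseLayer(),
    reservoir = RandSparseReservoir(),
    bias = DenseLayer())
```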
To showcase the use of the `GRU()` method, this section will only illustrate the standard `FullyGated()` version. The full script for this example with the data can be found here.
The data used for this example is the Santa Fe laser dataset retrieved from here. The data is split to account for a next step prediction.
```julia
using DelimitedFiles

data = reduce(hcat, readdlm("./data/santafe_laser.txt"))

train_len = 5000
predict_len = 2000

training_input = data[:, 1:train_len]
training_target = data[:, 2:train_len+1]
testing_input = data[:, train_len+1:train_len+predict_len]
testing_target = data[:, train_len+2:train_len+predict_len+1]
```
```
1×2000 Matrix{Float64}:
 49.0 107.0 126.0 67.0 31.0 22.0 … 9.0 7.0 6.0 6.0 17.0 95.0
```
The construction of the ESN proceeds as usual.
```julia
using ReservoirComputing, Random

res_size = 300
res_radius = 1.4

Random.seed!(42)
esn = ESN(training_input;
    reservoir = RandSparseReservoir(res_size, radius=res_radius),
    reservoir_driver = GRU())
```
ESN{Int64, Matrix{Float64}, Default, NLADefault, Matrix{Float64}, GRUParams{Vector{Function}, FullyGated, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}, StandardStates, Int64, Matrix{Float64}}(300, [86.0 141.0 … 24.0 18.0], Default(), NLADefault(), [-0.06528508484109852; -0.03566767616843869; … ; -0.09396288999215839; -0.08413322211053186;;], GRUParams{Vector{Function}, FullyGated, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}(Function[NNlib.σ, NNlib.σ, tanh], FullyGated(), [0.02230523660537638; -0.0034398719354316737; … ; -0.025897677257287074; -0.00039030445216325926;;], [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [-0.04216170957861583; -0.012797573671249937; … ; 0.09076249696989327; -0.03380397472919812;;], [-0.033430712234970586; 0.0559194931860528; … ; -0.07255435557240826; 0.02173815511365254;;], [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; -0.0862620639146491 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [0.09375072756642064; 0.0858822093946496; … ; -0.06776967230017705; -0.007077497647141856;;]), [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [0.0; 0.0; … ; 0.0; 0.0;;], StandardStates(), 0, [-0.8671381765889121 -0.9912595796764871 … -0.9401215374667762 -0.890617924591633; -0.4216195345774637 -0.7382105109506079 … -0.36449808737714867 -0.2397818007536538; … ; -0.10560536195803504 -0.1485932104374928 … -0.9317141112776713 -0.9584885231913787; -0.48316332628078706 -0.8283549589603598 … -0.9771679034729063 -0.9239246126227301])
The default inner reservoir and input layer for the GRU are the same as the defaults for the `reservoir` and `input_layer` of the ESN. One can use the explicit call if they choose to.
```julia
gru = GRU(reservoir=[RandSparseReservoir(res_size),
        RandSparseReservoir(res_size)],
    inner_layer=[DenseLayer(), DenseLayer()])
esn = ESN(training_input;
    reservoir = RandSparseReservoir(res_size, radius=res_radius),
    reservoir_driver = gru)
```
ESN{Int64, Matrix{Float64}, Default, NLADefault, Matrix{Float64}, GRUParams{Vector{Function}, FullyGated, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}, StandardStates, Int64, Matrix{Float64}}(300, [86.0 141.0 … 24.0 18.0], Default(), NLADefault(), [0.015163939076367122; -0.09466832324644306; … ; 0.026143706544396322; 0.060665587968020546;;], GRUParams{Vector{Function}, FullyGated, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}(Function[NNlib.σ, NNlib.σ, tanh], FullyGated(), [-0.09438998233267762; 0.01516583940820862; … ; -0.05392412031316565; -0.09195105803106994;;], [0.0 0.24842966892116178 … 0.0 -0.09229664658055516; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … -0.21311971790003834 0.0; 0.0 0.0 … 0.0 0.0], [0.0553432511334184; 0.015480700825181631; … ; 0.0450005489737193; -0.03664697123487497;;], [-0.04681630562295354; -0.036838788436552417; … ; 0.04350366683086737; 0.033658276780241025;;], [0.0 0.0 … 0.0 0.15850905618330302; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.23254593096266057 … 0.0 0.0], [-0.0416942599602977; 0.03480966465862739; … ; 0.07482380591116372; -0.02395585723292859;;]), [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 -0.2690454240338555 … -0.17437433122896112 0.0], [0.0; 0.0; … ; 0.0; 0.0;;], StandardStates(), 0, [0.00027189020765169147 0.0002728286770121699 … 0.23957116678319149 0.23258298180768663; -0.789138793085977 -0.9799074681652455 … -0.9786897598402031 -0.9382922765933875; … ; 0.009805198301563412 0.010368220355595625 … 0.0998726568273561 0.07149824981688994; 0.0003544825880106766 0.00035529265906558385 … 0.1513058069338646 0.17832374112063185])
The training and prediction can proceed as usual:
```julia
training_method = StandardRidge(0.0)
output_layer = train(esn, training_target, training_method)
output = esn(Predictive(testing_input), output_layer)
```
```
1×2000 Matrix{Float64}:
 49.3531 105.759 120.853 60.2022 … 5.948 6.53765 13.9241 95.4178
```
The results can be plotted using Plots.jl:
```julia
using Plots

plot([testing_target' output'], label=["actual" "predicted"],
    plot_title="Santa Fe Laser",
    titlefontsize=20,
    legendfontsize=12,
    linewidth=2.5,
    xtickfontsize = 12,
    ytickfontsize = 12,
    size=(1080, 720))
```
It is interesting to compare the GRU-driven ESN and the standard RNN-driven ESN. Using the same parameters defined before, it is possible to do the following:
```julia
using StatsBase

esn_rnn = ESN(training_input;
    reservoir = RandSparseReservoir(res_size, radius=res_radius),
    reservoir_driver = RNN())

output_layer = train(esn_rnn, training_target, training_method)
output_rnn = esn_rnn(Predictive(testing_input), output_layer)

println(msd(testing_target, output))
println(msd(testing_target, output_rnn))
```
```
12.665328620306422
20.96408330872688
```