
Solving nonlinear equation to find two constants using an observational dataset #5

Open
LucieVerm opened this issue Jan 4, 2017 · 3 comments


@LucieVerm

Hello all,

I have an observational dataset of how the survival (Fs) of an organism depends on a number of variables (t, UV, z and DOC). Based on process knowledge, we have come up with the following equation to relate these variables:

     log(Fs) = -t/c * UV * exp(-k * DOC * z)

This equation has two constants (c and k) that I want to estimate, using the data.
The observational data are as follows, for 16 observations:

Fs<-c(2.0E-04,5.0E-05,3.4E-03,7.7E-07,4.5E-06,1.1E-14,3.2E-03,8.4E-04,1.7E-02,2.3E-02,1.8E-01,8.1E-07,1.0E-05,3.3E-06,6.4E-03,1.4E-01)
t<- c(6.67,7.32,6.43,5.50,5.35,5.58,8.00,8.00,8.00,8.00,8.00,5.48,5.48,5.48,5.48,5.48)
UV<-c(1.18,0.99,1.55,2.99,3.49,2.79,0.49,0.49,0.49,0.49,0.49,3.57,3.57,3.57,3.57,3.57)
z<-c(0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05)
DOC<-c(2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.6,3.6,8,12.3,2.8,2.6,3.6,8,12.3)
data<-data.frame(t,UV,Fs,z,DOC)

I am unsure how to go about estimating c and k.

I have tried the rootSolve package by defining 16 equations, one for each observation:

equations<- function(c){
f1<- -data$t[1]/c[1]*data$UV[1]*exp(-c[2]*data$DOC[1]*data$z[1]) - log(data$Fs[1])
f2<- -data$t[2]/c[1]*data$UV[2]*exp(-c[2]*data$DOC[2]*data$z[2]) - log(data$Fs[2])
f3<- -data$t[3]/c[1]*data$UV[3]*exp(-c[2]*data$DOC[3]*data$z[3]) - log(data$Fs[3])
f4<- -data$t[4]/c[1]*data$UV[4]*exp(-c[2]*data$DOC[4]*data$z[4]) - log(data$Fs[4])
f5<- -data$t[5]/c[1]*data$UV[5]*exp(-c[2]*data$DOC[5]*data$z[5]) - log(data$Fs[5])
f6<- -data$t[6]/c[1]*data$UV[6]*exp(-c[2]*data$DOC[6]*data$z[6]) - log(data$Fs[6])
f7<- -data$t[7]/c[1]*data$UV[7]*exp(-c[2]*data$DOC[7]*data$z[7]) - log(data$Fs[7])
f8<- -data$t[8]/c[1]*data$UV[8]*exp(-c[2]*data$DOC[8]*data$z[8]) - log(data$Fs[8])
f9<- -data$t[9]/c[1]*data$UV[9]*exp(-c[2]*data$DOC[9]*data$z[9]) - log(data$Fs[9])
f10<- -data$t[10]/c[1]*data$UV[10]*exp(-c[2]*data$DOC[10]*data$z[10]) - log(data$Fs[10])
f11<- -data$t[11]/c[1]*data$UV[11]*exp(-c[2]*data$DOC[11]*data$z[11]) - log(data$Fs[11])
f12<- -data$t[12]/c[1]*data$UV[12]*exp(-c[2]*data$DOC[12]*data$z[12]) - log(data$Fs[12])
f13<- -data$t[13]/c[1]*data$UV[13]*exp(-c[2]*data$DOC[13]*data$z[13]) - log(data$Fs[13])
f14<- -data$t[14]/c[1]*data$UV[14]*exp(-c[2]*data$DOC[14]*data$z[14]) - log(data$Fs[14])
f15<- -data$t[15]/c[1]*data$UV[15]*exp(-c[2]*data$DOC[15]*data$z[15]) - log(data$Fs[15])
f16<- -data$t[16]/c[1]*data$UV[16]*exp(-c[2]*data$DOC[16]*data$z[16]) - log(data$Fs[16])
c(f1=f1, f2=f2,f3=f3,f4=f4,f5=f5,f6=f6,f7=f7,f8=f8,f9=f9,f10=f10,f11=f11,f12=f12,f13=f13,f14=f14,f15=f15,f16=f16 )
}

ss<- multiroot(f=equations, start=c(rep(1,16)))
ss

I have tried this for multiple starting values, but I always get these warning messages:
Warning messages:
1: In stode(y, times, func, parms = parms, ...) :
error during factorisation of matrix (dgefa); singular matrix
2: In stode(y, times, func, parms = parms, ...) : steady-state not reached

I have also tried the nleqslv package, in a similar way:

equations2<- function(c){
y<-numeric(16)
y[1]<- -data$t[1]/c[1]*data$UV[1]*exp(-c[2]*data$DOC[1]*data$z[1]) - log(data$Fs[1])
y[2]<- -data$t[2]/c[1]*data$UV[2]*exp(-c[2]*data$DOC[2]*data$z[2]) - log(data$Fs[2])
y[3]<- -data$t[3]/c[1]*data$UV[3]*exp(-c[2]*data$DOC[3]*data$z[3]) - log(data$Fs[3])
y[4]<- -data$t[4]/c[1]*data$UV[4]*exp(-c[2]*data$DOC[4]*data$z[4]) - log(data$Fs[4])
y[5]<- -data$t[5]/c[1]*data$UV[5]*exp(-c[2]*data$DOC[5]*data$z[5]) - log(data$Fs[5])
y[6]<- -data$t[6]/c[1]*data$UV[6]*exp(-c[2]*data$DOC[6]*data$z[6]) - log(data$Fs[6])
y[7]<- -data$t[7]/c[1]*data$UV[7]*exp(-c[2]*data$DOC[7]*data$z[7]) - log(data$Fs[7])
y[8]<- -data$t[8]/c[1]*data$UV[8]*exp(-c[2]*data$DOC[8]*data$z[8]) - log(data$Fs[8])
y[9]<- -data$t[9]/c[1]*data$UV[9]*exp(-c[2]*data$DOC[9]*data$z[9]) - log(data$Fs[9])
y[10]<- -data$t[10]/c[1]*data$UV[10]*exp(-c[2]*data$DOC[10]*data$z[10]) - log(data$Fs[10])
y[11]<- -data$t[11]/c[1]*data$UV[11]*exp(-c[2]*data$DOC[11]*data$z[11]) - log(data$Fs[11])
y[12]<- -data$t[12]/c[1]*data$UV[12]*exp(-c[2]*data$DOC[12]*data$z[12]) - log(data$Fs[12])
y[13]<- -data$t[13]/c[1]*data$UV[13]*exp(-c[2]*data$DOC[13]*data$z[13]) - log(data$Fs[13])
y[14]<- -data$t[14]/c[1]*data$UV[14]*exp(-c[2]*data$DOC[14]*data$z[14]) - log(data$Fs[14])
y[15]<- -data$t[15]/c[1]*data$UV[15]*exp(-c[2]*data$DOC[15]*data$z[15]) - log(data$Fs[15])
y[16]<- -data$t[16]/c[1]*data$UV[16]*exp(-c[2]*data$DOC[16]*data$z[16]) - log(data$Fs[16])
y
}

cstart<-c(rep(1,16))
fstart <- equations2(cstart)

nleqslv(cstart, equations2)

I always seem to get back exactly the starting values I put in, so I have the impression that I am not approaching this problem in the right way. There is probably no exact solution for c and k that fits all 16 observations, so I am really looking for some kind of best-fit or average solution. Can anyone give me any clues?

Best regards,
Lucie

@bbrede
Contributor

bbrede commented Jan 4, 2017

I like to use optim (from the stats package), which is nice for general-purpose optimisation.
I would also recommend using R's vectorisation and wrapping the problem in functions.

# This returns predicted Fs, i.e. the model you want to fit
predict_Fs <- function(c, k, t, UV, z, DOC) {
  exp(-t / c * UV * exp(-k * DOC * z))
}

# Function that calculates the error (in this case RMSE) for a given set of parameters (c, k)
# par = c(c, k) -> optim requires all parameters in one vector
# note: Fs is taken from the global environment here
error_function <- function(par, t, UV, z, DOC) {
  # predict with the model function
  pred <- predict_Fs(par[1], par[2], t, UV, z, DOC)
  # calculate the RMSE
  sqrt(mean(Fs - pred) ^ 2)
}

Fs <- c(2.0E-04,5.0E-05,3.4E-03,7.7E-07,4.5E-06,1.1E-14,3.2E-03,8.4E-04,1.7E-02,2.3E-02,1.8E-01,8.1E-07,1.0E-05,3.3E-06,6.4E-03,1.4E-01)
t <- c(6.67,7.32,6.43,5.50,5.35,5.58,8.00,8.00,8.00,8.00,8.00,5.48,5.48,5.48,5.48,5.48)
UV <- c(1.18,0.99,1.55,2.99,3.49,2.79,0.49,0.49,0.49,0.49,0.49,3.57,3.57,3.57,3.57,3.57)
z <- c(0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05)
DOC <- c(2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.6,3.6,8,12.3,2.8,2.6,3.6,8,12.3)

# par gives initial estimates for the parameters; here you should have an idea
# of the region in which the parameters lie
opt <- optim(par = c(1, 1),
             fn = error_function,
             t = t, UV = UV, z = z, DOC = DOC)

optim returns a list that contains the results; opt$par is the best estimate.
I ended up with c = 1.078247 and k = 1.031125, but I don't know the plausible range of your parameters. You can also limit the range the parameters can take with the lower/upper arguments and method = "L-BFGS-B". Have a look at the optim help.
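For illustration, a bounded call could look roughly like the sketch below; the lower and upper values are only placeholders, not suggestions for your actual parameter ranges:

# Sketch of a bounded fit with L-BFGS-B; the bounds are purely illustrative
opt_bounded <- optim(par = c(1, 1),
                     fn = error_function,
                     t = t, UV = UV, z = z, DOC = DOC,
                     method = "L-BFGS-B",
                     lower = c(0.01, 0.01),
                     upper = c(100, 100))
opt_bounded$par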

Hope that helps!
Ben

@LucieVerm
Author

Hi Ben, thanks a lot for your response! This seems much better than what I was trying. I will think about the likely range of my parameters, because I do indeed get different results with different starting values.
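One possible way to narrow that range down: taking the log of both sides twice linearises the model, log(-log(Fs)) = log(t*UV) - log(c) - k*DOC*z, so a quick lm() fit can give rough starting values for c and k. A minimal sketch, using the vectors already defined above:

# Rough starting values via linearisation; assumes Fs, t, UV, z, DOC as defined above
y <- log(-log(Fs)) - log(t * UV)   # left-hand side after taking logs twice
x <- DOC * z
fit <- lm(y ~ x)                   # intercept = -log(c), slope = -k
c_start <- exp(-coef(fit)[[1]])
k_start <- -coef(fit)[[2]]
c(c_start, k_start)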

One question about the RMSE: shouldn't that take an extra set of brackets?
So instead of what you wrote, sqrt(mean(Fs - pred) ^ 2),
maybe it should be sqrt(mean((Fs - pred) ^ 2)), because otherwise the square and the root just cancel each other out?

Doing so leads to a different outcome.
Best,
Lucie

@bbrede
Contributor

bbrede commented Jan 5, 2017

Yes, true. My version only takes the absolute value of the mean error (the bias), so the square and the root do cancel out. You can also use other error functions, but RMSE is the most widely used/accepted one, I would think.
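For clarity, the two variants side by side might look something like this; both keep the same signature as error_function above, so either can be passed to optim via the fn argument (Fs is again taken from the global environment):

# RMSE: penalises large individual errors more strongly
rmse_function <- function(par, t, UV, z, DOC) {
  pred <- predict_Fs(par[1], par[2], t, UV, z, DOC)
  sqrt(mean((Fs - pred)^2))
}

# MAE: mean of the absolute errors
mae_function <- function(par, t, UV, z, DOC) {
  pred <- predict_Fs(par[1], par[2], t, UV, z, DOC)
  mean(abs(Fs - pred))
}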
