
Solving nonlinear equation to find two constants using an observational dataset #5

Open
LucieVerm opened this issue Jan 4, 2017 · 3 comments


@LucieVerm

Hello all,

I have an observational dataset of how the survival (Fs) of an organism depends on a number of variables (t, UV, z and DOC). Based on process knowledge, we have come up with the following equation to relate these variables:

     log(Fs) = -t/c * UV * exp(-k * DOC * z)

This equation has two constants (c and k) that I want to estimate, using the data.
The observational data are as follows, for 16 observations:

Fs<-c(2.0E-04,5.0E-05,3.4E-03,7.7E-07,4.5E-06,1.1E-14,3.2E-03,8.4E-04,1.7E-02,2.3E-02,1.8E-01,8.1E-07,1.0E-05,3.3E-06,6.4E-03,1.4E-01)
t<- c(6.67,7.32,6.43,5.50,5.35,5.58,8.00,8.00,8.00,8.00,8.00,5.48,5.48,5.48,5.48,5.48)
UV<-c(1.18,0.99,1.55,2.99,3.49,2.79,0.49,0.49,0.49,0.49,0.49,3.57,3.57,3.57,3.57,3.57)
z<-c(0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05)
DOC<-c(2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.6,3.6,8,12.3,2.8,2.6,3.6,8,12.3)
data<-data.frame(t,UV,Fs,z,DOC)

I am unsure how to go about estimating c and k.

I have tried the rootSolve package by defining 16 equations, one for each observation:

equations<- function(c){
f1<- -data$t[1]/c[1]*data$UV[1]*exp(-c[2]*data$DOC[1]*data$z[1]) - log(data$Fs[1])
f2<- -data$t[2]/c[1]*data$UV[2]*exp(-c[2]*data$DOC[2]*data$z[2]) - log(data$Fs[2])
f3<- -data$t[3]/c[1]*data$UV[3]*exp(-c[2]*data$DOC[3]*data$z[3]) - log(data$Fs[3])
f4<- -data$t[4]/c[1]*data$UV[4]*exp(-c[2]*data$DOC[4]*data$z[4]) - log(data$Fs[4])
f5<- -data$t[5]/c[1]*data$UV[5]*exp(-c[2]*data$DOC[5]*data$z[5]) - log(data$Fs[5])
f6<- -data$t[6]/c[1]*data$UV[6]*exp(-c[2]*data$DOC[6]*data$z[6]) - log(data$Fs[6])
f7<- -data$t[7]/c[1]*data$UV[7]*exp(-c[2]*data$DOC[7]*data$z[7]) - log(data$Fs[7])
f8<- -data$t[8]/c[1]*data$UV[8]*exp(-c[2]*data$DOC[8]*data$z[8]) - log(data$Fs[8])
f9<- -data$t[9]/c[1]*data$UV[9]*exp(-c[2]*data$DOC[9]*data$z[9]) - log(data$Fs[9])
f10<- -data$t[10]/c[1]*data$UV[10]*exp(-c[2]*data$DOC[10]*data$z[10]) - log(data$Fs[10])
f11<- -data$t[11]/c[1]*data$UV[11]*exp(-c[2]*data$DOC[11]*data$z[11]) - log(data$Fs[11])
f12<- -data$t[12]/c[1]*data$UV[12]*exp(-c[2]*data$DOC[12]*data$z[12]) - log(data$Fs[12])
f13<- -data$t[13]/c[1]*data$UV[13]*exp(-c[2]*data$DOC[13]*data$z[13]) - log(data$Fs[13])
f14<- -data$t[14]/c[1]*data$UV[14]*exp(-c[2]*data$DOC[14]*data$z[14]) - log(data$Fs[14])
f15<- -data$t[15]/c[1]*data$UV[15]*exp(-c[2]*data$DOC[15]*data$z[15]) - log(data$Fs[15])
f16<- -data$t[16]/c[1]*data$UV[16]*exp(-c[2]*data$DOC[16]*data$z[16]) - log(data$Fs[16])
c(f1=f1, f2=f2,f3=f3,f4=f4,f5=f5,f6=f6,f7=f7,f8=f8,f9=f9,f10=f10,f11=f11,f12=f12,f13=f13,f14=f14,f15=f15,f16=f16 )
}

ss<- multiroot(f=equations, start=c(rep(1,16)))
ss

I have tried this for multiple starting values, but I always get these warning messages:
Warning messages:
1: In stode(y, times, func, parms = parms, ...) :
error during factorisation of matrix (dgefa); singular matrix
2: In stode(y, times, func, parms = parms, ...) : steady-state not reached

I have also tried the nleqslv package, in a similar way:

equations2<- function(c){
y<-numeric(16)
y[1]<- -data$t[1]/c[1]*data$UV[1]*exp(-c[2]*data$DOC[1]*data$z[1]) - log(data$Fs[1])
y[2]<- -data$t[2]/c[1]*data$UV[2]*exp(-c[2]*data$DOC[2]*data$z[2]) - log(data$Fs[2])
y[3]<- -data$t[3]/c[1]*data$UV[3]*exp(-c[2]*data$DOC[3]*data$z[3]) - log(data$Fs[3])
y[4]<- -data$t[4]/c[1]*data$UV[4]*exp(-c[2]*data$DOC[4]*data$z[4]) - log(data$Fs[4])
y[5]<- -data$t[5]/c[1]*data$UV[5]*exp(-c[2]*data$DOC[5]*data$z[5]) - log(data$Fs[5])
y[6]<- -data$t[6]/c[1]*data$UV[6]*exp(-c[2]*data$DOC[6]*data$z[6]) - log(data$Fs[6])
y[7]<- -data$t[7]/c[1]*data$UV[7]*exp(-c[2]*data$DOC[7]*data$z[7]) - log(data$Fs[7])
y[8]<- -data$t[8]/c[1]*data$UV[8]*exp(-c[2]*data$DOC[8]*data$z[8]) - log(data$Fs[8])
y[9]<- -data$t[9]/c[1]*data$UV[9]*exp(-c[2]*data$DOC[9]*data$z[9]) - log(data$Fs[9])
y[10]<- -data$t[10]/c[1]*data$UV[10]*exp(-c[2]*data$DOC[10]*data$z[10]) - log(data$Fs[10])
y[11]<- -data$t[11]/c[1]*data$UV[11]*exp(-c[2]*data$DOC[11]*data$z[11]) - log(data$Fs[11])
y[12]<- -data$t[12]/c[1]*data$UV[12]*exp(-c[2]*data$DOC[12]*data$z[12]) - log(data$Fs[12])
y[13]<- -data$t[13]/c[1]*data$UV[13]*exp(-c[2]*data$DOC[13]*data$z[13]) - log(data$Fs[13])
y[14]<- -data$t[14]/c[1]*data$UV[14]*exp(-c[2]*data$DOC[14]*data$z[14]) - log(data$Fs[14])
y[15]<- -data$t[15]/c[1]*data$UV[15]*exp(-c[2]*data$DOC[15]*data$z[15]) - log(data$Fs[15])
y[16]<- -data$t[16]/c[1]*data$UV[16]*exp(-c[2]*data$DOC[16]*data$z[16]) - log(data$Fs[16])
y
}

cstart<-c(rep(1,16))
fstart <- equations2(cstart)

nleqslv(cstart, equations2)

I always seem to get back exactly the starting values I put in, so I have the impression that I am not approaching this problem in the right way. There is probably no exact solution for c and k that fits all 16 observations, so I am really looking for some kind of best-fit or average solution. Can anyone give me any clues?

Best regards,
Lucie

@bbrede
Contributor

bbrede commented Jan 4, 2017

I like to use optim (from the stats package), which is nice for general-purpose optimisation.
I would also recommend using R's vectorisation and wrapping the problem in functions.

# This returns predicted Fs, i.e. the model you want to fit
predict_Fs <- function(c, k, t, UV, z, DOC) {
  exp(-t / c * UV * exp(-k * DOC * z))
}

# Function that calculates the error (in this case RMSE) for a given set of parameters (c, k)
# par = c(c, k) -> optim requires all parameters in one vector
# note: Fs is taken from the global environment here
error_function <- function(par, t, UV, z, DOC) {
  # predict with the model function
  pred <- predict_Fs(par[1], par[2], t, UV, z, DOC)
  # calculate the RMSE
  sqrt(mean(Fs - pred) ^ 2)
}

Fs <- c(2.0E-04,5.0E-05,3.4E-03,7.7E-07,4.5E-06,1.1E-14,3.2E-03,8.4E-04,1.7E-02,2.3E-02,1.8E-01,8.1E-07,1.0E-05,3.3E-06,6.4E-03,1.4E-01)
t <- c(6.67,7.32,6.43,5.50,5.35,5.58,8.00,8.00,8.00,8.00,8.00,5.48,5.48,5.48,5.48,5.48)
UV <- c(1.18,0.99,1.55,2.99,3.49,2.79,0.49,0.49,0.49,0.49,0.49,3.57,3.57,3.57,3.57,3.57)
z <- c(0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05)
DOC <- c(2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.6,3.6,8,12.3,2.8,2.6,3.6,8,12.3)

# par gives initial estimates for the parameters; here you should have an idea
# of the region in which the parameters lie
opt <- optim(par = c(1, 1),
             fn = error_function,
             t = t, UV = UV, z = z, DOC = DOC)

optim returns a list that contains the results; opt$par is the best estimate.
I ended up with c = 1.078247 and k = 1.031125, but I don't know the plausible range of your parameters. You can also limit the range the parameters can take with the lower/upper arguments and method = "L-BFGS-B". Have a look at the optim help.
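For illustration, a bounded call could look roughly like the sketch below; the lower and upper values are only placeholders, not suggestions for your actual parameter ranges:

# Sketch of a bounded fit with L-BFGS-B; the bounds are purely illustrative
opt_bounded <- optim(par = c(1, 1),
                     fn = error_function,
                     t = t, UV = UV, z = z, DOC = DOC,
                     method = "L-BFGS-B",
                     lower = c(0.01, 0.01),
                     upper = c(100, 100))
opt_bounded$par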

Hope that helps!
Ben

@LucieVerm
Author

Hi Ben, thanks a lot for your response! This seems much better than what I was trying. I will think about the likely range of my parameters, because I do indeed get different results with different starting values.
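One possible way to narrow that range down: taking the log of both sides twice linearises the model, log(-log(Fs)) = log(t*UV) - log(c) - k*DOC*z, so a quick lm() fit can give rough starting values for c and k. A minimal sketch, using the vectors already defined above:

# Rough starting values via linearisation; assumes Fs, t, UV, z, DOC as defined above
y <- log(-log(Fs)) - log(t * UV)   # left-hand side after taking logs twice
x <- DOC * z
fit <- lm(y ~ x)                   # intercept = -log(c), slope = -k
c_start <- exp(-coef(fit)[[1]])
k_start <- -coef(fit)[[2]]
c(c_start, k_start)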

One question about the RMSE: shouldn't that take an extra set of brackets?
So instead of what you wrote, sqrt(mean(Fs - pred) ^ 2),
maybe it should be sqrt(mean((Fs - pred) ^ 2)), because otherwise the square and the root just cancel each other out?

Doing so leads to a different outcome.
Best,
Lucie

@bbrede
Contributor

bbrede commented Jan 5, 2017

Yes, true. My version only takes the absolute value of the mean error (the bias), so the square and the root do cancel out. You can also use other error functions, but RMSE is the most widely used/accepted one, I would think.
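For clarity, the two variants side by side might look something like this; both keep the same signature as error_function above, so either can be passed to optim via the fn argument (Fs is again taken from the global environment):

# RMSE: penalises large individual errors more strongly
rmse_function <- function(par, t, UV, z, DOC) {
  pred <- predict_Fs(par[1], par[2], t, UV, z, DOC)
  sqrt(mean((Fs - pred)^2))
}

# MAE: mean of the absolute errors
mae_function <- function(par, t, UV, z, DOC) {
  pred <- predict_Fs(par[1], par[2], t, UV, z, DOC)
  mean(abs(Fs - pred))
}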
