Dear all,
I am trying to evaluate the optimisation behaviour of a function. Originally
I have optimised a model with real data and got a set of parameters. Now I
am creating simulated data sets based on these estimates. With these
simulations I am estimating the parameters again to see how variable the
estimation is. To this end I have written a loop which should generate a new
simulated data set each time. However, the optimisation algorithm, which
works fine if only one data set is used, does not recognise the simulated
data in the loop. Can anyone tell me where the error is? The code is below.
Thanks for your help
Simon
# The loop in which nothing works anymore. The NLL in the optim function
appears not to recognise the "new" data set. The functions used by
this loop
are given below.
sim.estim=function(r)
{
res=matrix(nrow=r,ncol=5)
for (s in 1:r)
{
new=new.set()
Min=optim(c(0.5,0.5,0.01,0.1),NLL,method="L-BFGS-B",lower=c(0,0,0,0))
res[s,1]=Min$par[1]
res[s,2]=Min$par[2]
res[s,3]=Min$par[3]
res[s,4]=Min$par[4]
res[s,5]=Min$value[1]
}
return(res)
}
# function to create a new data set. Works fine on its own
new.set=function()
{
tmp=matrix(nrow=510,ncol=2)
v=numeric()
sim=Model_1(1.2761707, 0.1953354, 2.7351930, 0.1032929)
tmp[,1]=sample(sim[,1],510,replace=T)
v=runif(510,0,1)
for (k in 1 : 510)
{
for (n in 1 : length(sim[,1]))
{
if (tmp[k,1]==sim[n,1] &&
v[k]<=sim[n,2]){tmp[k,2]=1}
if(tmp[k,1]==sim[n,1] && v[k]>sim[n,2])
{tmp[k,2]=0}
}
}
return(tmp)
}
# The model to be fitted
Model_1=function(beta0_mod,mu_mod,k_mod,I_mod)
{
parms = c(beta0=beta0_mod, mu=mu_mod, k=k_mod)
my.atol = 1e-6
times1=c(0,0.002884615,0.003846154,0.005769231,0.009615385,0.01,0.019230769)
times2=c(0.028846154,0.038461538,0.057692308,
0.076923077,0.083333333,0.096153846)
times3=c(0.115384615,0.125,0.134615385,0.166666667,0.25,0.416666667,0.5)
times4=c(0.58333333,0.666666667,0.75,0.83333333,0.916666667,1)
times5=c( 1.166666667,1.25,1.5,1.6,1.75,2,2.5)
times6=seq(3,20)
times = c(times1,times2,times3,times4,times5,times6)
ODE_sys = function(t, x, w)
#w=c("beta0","mu","k")
{
I=x[1]
dI=-w["mu"]*I+w["beta0"]*exp(-w["k"]*t)*(1-I)
list(c(dI),c(S=1-I))
}
output = lsoda(c(I_mod),times,ODE_sys, parms, rtol=1e-6, atolmy.atol)
return(output)
}
# The negative log likelihood of this model given a data set "new".
This
works fine if it is used in the optim() routine with only one data set.
NLL=function(coef)
{
beta0=coef[1]
mu=coef[2]
k=coef[3]
I=coef[4]
data_sim=Model_1(beta0,mu,k,I)
LLsum=0 #log likelihood sum
for (j in 1:length(data_sim[,1]))
{
if (data_sim[j,2]<0 || data_sim[j,3]<0)
{return(1500)}
for (i in 1:length(new[,1]))
{
if (new[i,1]==data_sim[j,1] && is.na(new[i,1])==F
&&
is.na(data_sim[j,1])==F)
# if age of real data = age of simulated model and not
equal to NA
{
if (is.na(new[i,2])==F &&
is.na(data_sim[j,2])==F &&
is.na(data_sim[j,3])==F)
# if state of real data and state of simulated model
are not equal to NA
{
tmp1=new[i,2]*log(data_sim[j,2])
tmp2=(1-new[i,2])*log(data_sim[j,3])
if (tmp1=="-Inf" ||
tmp1=="NaN"){tmp1=-700}
if (tmp2=="-Inf" ||
tmp2=="NaN"){tmp2=-700}
LLsum=LLsum+tmp1+tmp2
}
}
}
}
return(-LLsum)
}
********************************************************************
Simon Ruegg
Dr.med.vet., PhD student
Institute for Parasitology
Winterthurstr. 266a
8057 Zurich
Switzerland
phone: +41 44 635 85 93
fax: +41 44 635 89 07
e-mail: s.ruegg@access.unizh.ch
[[alternative HTML version deleted]]
I think it's a scoping problem. Your function NLL() looks for
"new" in
the environment in which NLL() was defined, but you generate your
simulated datasets in a different environment (local to sim.estim()).
There are a number of ways to deal with this:
- pass the dataset as a parameter to NLL()
- manipulate environments with "assign()" or "<<-"
- put "new" in its own environment and use "assign()" and
"get()"
- define NLL() within sim.estim() after generating "new", like
"NLL <-
negative.log.likelihood(new)" where negative.log.likelihood() is a
function like
negative.log.likelihood <- function(new) {
NLL <- function(coef) {
... (your definition here)...
}
NLL
}
This defines a new NLL() each time, using the dataset "new" just
generated.
Reid Huntsinger
Simon Ruegg wrote:
>Dear all,
>
>
>
>I am trying to evaluate the optimisation behaviour of a function. Originally
>I have optimised a model with real data and got a set of parameters. Now I
>am creating simulated data sets based on these estimates. With these
>simulations I am estimating the parameters again to see how variable the
>estimation is. To this end I have written a loop which should generate a new
>simulated data set each time. However, the optimisation algorithm, which
>works fine if only one data set is used, does not recognise the simulated
>data in the loop. Can anyone tell me where the error is? The code is below.
>
>
>
>Thanks for your help
>
>
>
>Simon
>
>
>
>
>
># The loop in which nothing works anymore. The NLL in the optim function
>appears not to recognise the "new" data set. The functions used by
this loop
>are given below.
>
>
>
>sim.estim=function(r)
>
>{
>
> res=matrix(nrow=r,ncol=5)
>
> for (s in 1:r)
>
> {
>
> new=new.set()
>
>
>Min=optim(c(0.5,0.5,0.01,0.1),NLL,method="L-BFGS-B",lower=c(0,0,0,0))
>
> res[s,1]=Min$par[1]
>
> res[s,2]=Min$par[2]
>
> res[s,3]=Min$par[3]
>
> res[s,4]=Min$par[4]
>
> res[s,5]=Min$value[1]
>
> }
>
>return(res)
>
>}
>
>
>
>
>
>
>
># function to create a new data set. Works fine on its own
>
>
>
>new.set=function()
>
>{
>
> tmp=matrix(nrow=510,ncol=2)
>
> v=numeric()
>
> sim=Model_1(1.2761707, 0.1953354, 2.7351930, 0.1032929)
>
>
>
> tmp[,1]=sample(sim[,1],510,replace=T)
>
> v=runif(510,0,1)
>
>
>
> for (k in 1 : 510)
>
> {
>
> for (n in 1 : length(sim[,1]))
>
> {
>
> if (tmp[k,1]==sim[n,1] &&
v[k]<=sim[n,2]){tmp[k,2]=1}
>
> if(tmp[k,1]==sim[n,1] && v[k]>sim[n,2])
{tmp[k,2]=0}
>
> }
>
> }
>
>return(tmp)
>
>}
>
>
>
>
>
># The model to be fitted
>
>
>
>Model_1=function(beta0_mod,mu_mod,k_mod,I_mod)
>
>{
>
> parms = c(beta0=beta0_mod, mu=mu_mod, k=k_mod)
>
> my.atol = 1e-6
>
>
>times1=c(0,0.002884615,0.003846154,0.005769231,0.009615385,0.01,0.019230769)
>
>
> times2=c(0.028846154,0.038461538,0.057692308,
>0.076923077,0.083333333,0.096153846)
>
>
>times3=c(0.115384615,0.125,0.134615385,0.166666667,0.25,0.416666667,0.5)
>
> times4=c(0.58333333,0.666666667,0.75,0.83333333,0.916666667,1)
>
> times5=c( 1.166666667,1.25,1.5,1.6,1.75,2,2.5)
>
> times6=seq(3,20)
>
> times = c(times1,times2,times3,times4,times5,times6)
>
> ODE_sys = function(t, x, w)
#w=c("beta0","mu","k")
>
> {
>
> I=x[1]
>
>
>
>
dI=-w["mu"]*I+w["beta0"]*exp(-w["k"]*t)*(1-I)
>
>
>
> list(c(dI),c(S=1-I))
>
> }
>
>
>
> output = lsoda(c(I_mod),times,ODE_sys, parms, rtol=1e-6,
atol>my.atol)
>
> return(output)
>
>}
>
>
>
># The negative log likelihood of this model given a data set
"new". This
>works fine if it is used in the optim() routine with only one data set.
>
>
>
>NLL=function(coef)
>
>{
>
> beta0=coef[1]
>
> mu=coef[2]
>
> k=coef[3]
>
> I=coef[4]
>
>
>
> data_sim=Model_1(beta0,mu,k,I)
>
>
>
> LLsum=0 #log likelihood sum
>
>
>
> for (j in 1:length(data_sim[,1]))
>
> {
>
> if (data_sim[j,2]<0 || data_sim[j,3]<0)
>
> {return(1500)}
>
>
>
> for (i in 1:length(new[,1]))
>
> {
>
> if (new[i,1]==data_sim[j,1] && is.na(new[i,1])==F
&&
>is.na(data_sim[j,1])==F)
>
> # if age of real data = age of simulated model and not
>equal to NA
>
> {
>
> if (is.na(new[i,2])==F &&
is.na(data_sim[j,2])==F &&
>is.na(data_sim[j,3])==F)
>
> # if state of real data and state of simulated model
>are not equal to NA
>
> {
>
> tmp1=new[i,2]*log(data_sim[j,2])
>
> tmp2=(1-new[i,2])*log(data_sim[j,3])
>
>
>
> if (tmp1=="-Inf" ||
tmp1=="NaN"){tmp1=-700}
>
> if (tmp2=="-Inf" ||
tmp2=="NaN"){tmp2=-700}
>
> LLsum=LLsum+tmp1+tmp2
>
> }
>
> }
>
>
>
> }
>
> }
>
>
>
>return(-LLsum)
>
>}
>
>
>
>
>
>
>
>
>
>
>
>********************************************************************
>
>Simon Ruegg
>
>Dr.med.vet., PhD student
>
>Institute for Parasitology
>
>Winterthurstr. 266a
>
>8057 Zurich
>
>Switzerland
>
>
>
>phone: +41 44 635 85 93
>
>fax: +41 44 635 89 07
>
>e-mail: s.ruegg at access.unizh.ch
>
>
>
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>
>