Hertzog Gladys
2013-Jun-07 12:00 UTC
[R] estimation of covariance matrix of a bivariate normal distribution using maximization of the log-likelihood
Dear all, I?m new in R and I?m trying to estimate the covariance matrix of a bivariate normal distribution by maximizing the log-likelihood. The maximization really has to be performed with the non-linear minimization routine (nlm). The 2 means of the distribution are known and equal to 1. I?ve already tried 2 different ways to compute this covariance but for each of them I obtained a lot of warnings and illogical values for the covariance matrix. In the first one, I defined the bivariate normal distribution with the command dmvnorm: x<-rnorm(6000, 2.4, 0.6) x <- matrix(c(x), ncol=1) y<-rlnorm(6000, 1.3,0.1) y <- matrix(c(y), ncol=1) XY <- cbind(x,y) L <- function(par,x,y) { return (-sum(log(par[4]*dmvnorm(XY, mean=c(1,1), sigma= matrix(c(par[1], par[1]*par[2]*par[3],par[1]*par[2]*par[3], par[2] ),nrow=2, ncol=2)) ))) } par.start<- c(0.5, 0.5 ,0.5 ,0.5) result<-nlm(L,par.start,y=y,x=x, hessian=TRUE) par.hat <- result$estimate par.hatIl y a eu 32 avis (utilisez warnings() pour les visionner)> par.hat <- result$estimate > par.hat[1] 5.149919e+01 2.520721e+02 8.734212e-03 3.996771e+02> warnings()Messages d'avis : 1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 2: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 3: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 4: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 5: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 6: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 7: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 8: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 9: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 10: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 11: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 12: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 13: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN In the second one, I wrote step by step the bivariate normal distribution in order to have each parameter separately (not in a matrix) but it didn?t work as well: x<-rnorm(6000, 2.4, 0.6) y<-rlnorm(6000, 1.3,0.1) L <- function(par,x,y) { return (-sum(log((1-par[4])*( (1/(2*pi*par[1]*par[2]*sqrt(1-par[3])))*exp( (-1/2*(1-par[3]^2))* ((y-1)/par[2])^2 +((x-1)/par[1])^2 - 2*(y-1)*(x-1)/(par[2]*par[1]) )) ))) } #par [1]= sigma_x , par [2]= sigma_y par [3]= rho_xy par[4] is a mixing parameter. The final step of my calculation will be to have a mixture of bivariate normal and log-normal distributions. par.start<- c(0.5, 0.5 ,0.5 ,0.5) result<-nlm(L,par.start,y=y,x=x, hessian=T) par.hat <- result$estimate par.ha When I run this script, I get always 50 advices like those below: Messages d'avis : 1: In sqrt(1 - par[3]) : production de NaN 2: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 3: In sqrt(1 - par[3]) : production de NaN 4: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 5: In sqrt(1 - par[3]) : production de NaN 6: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 7: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN 8: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 9: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN 10: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 11: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN 12: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value ?? Does one of you know how to use the nlm method to estimate the covariance matrix (and mixing parameter ) of a bivariate normal distribution? Thank you in advance for your help and answers. Best regards, Gladys Hertzog Master student in environmental engineering at ETH Zurich -------------- next part -------------- A non-text attachment was scrubbed... Name: message_to_mailing_list.pdf Type: application/pdf Size: 200565 bytes Desc: message_to_mailing_list.pdf URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130607/3fcf8ef9/attachment.pdf>
Rolf Turner
2013-Jun-08 00:21 UTC
[R] estimation of covariance matrix of a bivariate normal distribution using maximization of the log-likelihood
See in line below. On 08/06/13 00:00, Hertzog Gladys wrote:> Dear all, > I?m new in R and I?m trying to estimate the covariance matrix of a bivariate normal distribution by maximizing the log-likelihood. The maximization really has to be performed with the non-linear minimization routine (nlm).Why?> The 2 means of the distribution are known and equal to 1.But you simulate your data using means 2.4 and 1.3.> I?ve already tried 2 different ways to compute this covariance but for each of them I obtained a lot of warnings and illogical values for the covariance matrix. > > In the first one, I defined the bivariate normal distribution with the command dmvnorm: > > x<-rnorm(6000, 2.4, 0.6) > x <- matrix(c(x), ncol=1)Why do you use matrix() and c() here? Totally unnecessary (and confusing).> y<-rlnorm(6000, 1.3,0.1) > y <- matrix(c(y), ncol=1) > XY <- cbind(x,y)To estimate the covariance matrix underlying the data XY you could simply do: M <- var(XY). If you want to impose the constraint that the true mean vector is c(1,1) you could do: mu <- c(1,1) xbar <- apply(XY,2,mean) M <- (5999/6000)*var(XY) + (xbar-mu)%*%t(xbar-mu) It would of course make more sense in terms of your simulated data to use: mu <- c(2.4,1.3) I haven't looked at your code below any further. Too much of a mess. cheers, Rolf Turner> > L <- function(par,x,y) { > return (-sum(log(par[4]*dmvnorm(XY, mean=c(1,1), sigma= matrix(c(par[1], par[1]*par[2]*par[3],par[1]*par[2]*par[3], par[2] ),nrow=2, ncol=2)) ))) > } > par.start<- c(0.5, 0.5 ,0.5 ,0.5) > result<-nlm(L,par.start,y=y,x=x, hessian=TRUE) > par.hat <- result$estimate > > par.hatIl y a eu 32 avis (utilisez warnings() pour les visionner) >> par.hat <- result$estimate >> par.hat > [1] 5.149919e+01 2.520721e+02 8.734212e-03 3.996771e+02 >> warnings() > Messages d'avis : > 1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : > production de NaN > 2: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : > NA/Inf replaced by maximum positive value > 3: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : > production de NaN > 4: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : > NA/Inf replaced by maximum positive value > 5: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : > production de NaN > 6: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : > NA/Inf replaced by maximum positive value > 7: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : > production de NaN > 8: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : > NA/Inf replaced by maximum positive value > 9: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : > production de NaN > 10: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : > NA/Inf replaced by maximum positive value > 11: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : > production de NaN > 12: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : > NA/Inf replaced by maximum positive value > 13: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : > production de NaN > > In the second one, I wrote step by step the bivariate normal distribution in order to have each parameter separately (not in a matrix) but it didn?t work as well: > > x<-rnorm(6000, 2.4, 0.6) > y<-rlnorm(6000, 1.3,0.1) > L <- function(par,x,y) { > return (-sum(log((1-par[4])*( (1/(2*pi*par[1]*par[2]*sqrt(1-par[3])))*exp( (-1/2*(1-par[3]^2))* ((y-1)/par[2])^2 +((x-1)/par[1])^2 - 2*(y-1)*(x-1)/(par[2]*par[1]) )) ))) > } > #par [1]= sigma_x , par [2]= sigma_y par [3]= rho_xy par[4] is a mixing parameter. The final step of my calculation will be to have a mixture of bivariate normal and log-normal distributions. > par.start<- c(0.5, 0.5 ,0.5 ,0.5) > result<-nlm(L,par.start,y=y,x=x, hessian=T) > par.hat <- result$estimate > par.ha > > When I run this script, I get always 50 advices like those below: > Messages d'avis : > 1: In sqrt(1 - par[3]) : production de NaN > 2: In nlm(L, par.start, y = y, x = x, hessian = T) : > NA/Inf replaced by maximum positive value > 3: In sqrt(1 - par[3]) : production de NaN > 4: In nlm(L, par.start, y = y, x = x, hessian = T) : > NA/Inf replaced by maximum positive value > 5: In sqrt(1 - par[3]) : production de NaN > 6: In nlm(L, par.start, y = y, x = x, hessian = T) : > NA/Inf replaced by maximum positive value > 7: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN > 8: In nlm(L, par.start, y = y, x = x, hessian = T) : > NA/Inf replaced by maximum positive value > 9: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN > 10: In nlm(L, par.start, y = y, x = x, hessian = T) : > NA/Inf replaced by maximum positive value > 11: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN > 12: In nlm(L, par.start, y = y, x = x, hessian = T) : > NA/Inf replaced by maximum positive value > ?? > Does one of you know how to use the nlm method to estimate the covariance matrix (and mixing parameter ) of a bivariate normal distribution? > > Thank you in advance for your help and answers.