Hi, R users
I am trying to use the EM algorithm to estimate a latent class model of discrete
choice. The basic model is a logit model with two variables X = (X1, X2), and the
utility is defined as v = Beta*X. There are 3 classes of individuals, each with its
own Beta values, so Beta is a 3*2 matrix. Each of the 1000 individuals faces two
randomly generated choice alternatives and must choose one of them. The class
proportions are set to (0.3, 0.5, 0.2). With this setup, the estimated proportion
parameters alpha always end up with one of them going to 0 during the EM iterations.
I have checked the code and cannot find the problem. This may be a bit off topic for
R-help, but the code is written in R, so I hope someone here can spot the issue.
For what it is worth, I have previously obtained successful EM results for a normal
mixture model.
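For clarity, the updates the code below is meant to implement are (writing L[i,k]
for the logit probability of individual i's observed choice under the Beta of class
k, and h[i,k] for the posterior probability that individual i belongs to class k):

E-step:  h[i,k] = alpha[k]*L[i,k] / sum_j( alpha[j]*L[i,j] )
M-step:  alpha[k] = (1/1000) * sum_i h[i,k], and Beta maximises
         Q2 = sum_i sum_k h[i,k]*log L[i,k]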
Thanks in advance.
Following is the code:
#========================
X = matrix(NA, nrow = 1000, ncol = 2)   #attributes of the first alternative
b = c(2.3,0.7,0.3,0.7,0.1,0.8)          #true Beta values, two coefficients per class
alpha = c(0.3,0.5,0.2)                  #true class proportions
X[,1] = runif(1000,min=5,max=20)
X[,2] = runif(1000,min=5,max=20)
Y = matrix(NA, nrow = 1000, ncol = 2)   #attributes of the second alternative
Y[,1] = runif(1000,min=5,max=20)
Y[,2] = runif(1000,min=5,max=20)
#systematic utilities: rows 1:300 are class 1, 301:800 class 2, 801:1000 class 3
V11 = X[1:300,]%*%b[1:2]
V12 = Y[1:300,]%*%b[1:2]
V21 = X[301:800,]%*%b[3:4]
V22 = Y[301:800,]%*%b[3:4]
V31 = X[801:1000,]%*%b[5:6]
V32 = Y[801:1000,]%*%b[5:6]
V1 = rbind(V11,V21,V31)+rnorm(1000)   #utility of the first alternative, with noise
V2 = rbind(V12,V22,V32)+rnorm(1000)   #utility of the second alternative, with noise
oo = exp(V1)+exp(V2)
P1 = exp(V1)/oo                       #logit probability of the first alternative
P2 = exp(V2)/oo                       #logit probability of the second alternative
D = ifelse(V1>V2,0,1)                 #observed choice: 0 = first, 1 = second alternative
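One remark I should add (an aside, not something the estimation code below relies
on): the choices D are generated by comparing utilities with normal noise, whereas
the likelihood below is a logit, which assumes extreme-value errors. If the intent
is for the simulated choices to follow the logit model exactly, one possible
alternative is to drop the added noise and draw each choice from the logit
probability, for example:
#optional alternative (sketch): simulate choices directly from logit probabilities
#V1s = rbind(V11,V21,V31)   #systematic utilities, no added noise
#V2s = rbind(V12,V22,V32)
#D   = rbinom(1000, size = 1, prob = exp(V2s)/(exp(V1s)+exp(V2s)))   #1 = second alternative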
#second part of the Q function: expected complete-data log-likelihood in Beta
Q2 = function(Beta,H){
probs = logProbInd(Beta)   #1000 x 3 log-probabilities of the observed choices
li = sum(H*probs)          #weight each individual/class term by the posterior H
return(li)
}
logProbInd=function(Beta){#X, Y, D values taken as is from the environment
dim(Beta) = c(2,3)         #column i holds the two coefficients of class i
probs = matrix(NA,nrow = 1000, ncol = 3)
for(i in 1:3){
v1 = X%*%Beta[,i]
v2 = Y%*%Beta[,i]
p1 = exp(v1)/(exp(v1)+exp(v2))
p2 = exp(v2)/(exp(v1)+exp(v2))
probs[,i] = ifelse (D==0,log(p1),log(p2))
}
return (probs)
}
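As a quick sanity check on logProbInd (a small addition, not part of the estimation
itself), I can evaluate it at the true b and average the log-probabilities within
each simulated class; each group should score best under its own class's
coefficients:
#sanity check: average log-probability of each true class under each class's Beta
lp.true = logProbInd(b)
grp = rep(1:3, times = c(300,500,200))      #true class of each simulated individual
print(rowsum(lp.true, grp)/c(300,500,200))  #row k = true class k, column j = class-j Beta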
#H [individuals][class]: posterior probability that each individual belongs to each class
E_step = function(alpha,Beta){#calc posterior of H
tmpH = matrix(NA,nrow = 1000,ncol =3)
lprobs = logProbInd(Beta)
for(i in 1:3){#classes
tmpH[,i] = alpha[i]*exp(lprobs[,i])   #prior weight times class likelihood
}
H = tmpH/rowSums(tmpH)                #Bayes rule: normalise each individual's row
return(H)
}
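Similarly, the E-step output can be checked directly (again just a debugging
sketch): each row of H must sum to 1, and the column means are the soft class
shares that feed the alpha update.
#sanity check on the E-step at equal prior weights and the true Beta
H0 = E_step(alpha = c(1,1,1)/3, Beta = b)
print(range(rowSums(H0)))   #both numbers should be essentially 1
print(colMeans(H0))         #soft class shares implied by this H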
M_step = function(H,Beta){
#closed-form update of the class proportions
aita = colSums(H)/1000
#numerical maximisation of Q2 over all six Beta values (fnscale = -1 makes optim maximise)
opt.c = optim(Beta,Q2,H=H,method="BFGS",control = list(fnscale = -1))
lik = opt.c$value
return(c(aita,opt.c$par,lik))   #3 proportions, 6 coefficients, maximised Q value
}
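Because Q2 is additive across classes, an alternative M-step (a sketch I am adding
for comparison; M_step_sep is not in the original script) can maximise each class's
two coefficients separately, which is sometimes more stable than one six-parameter
BFGS run and returns the same layout as M_step:
M_step_sep = function(H,Beta){
aita = colSums(H)/1000                #closed-form update of the class proportions
Bmat = matrix(Beta, nrow = 2)         #column k = the two coefficients of class k
lik = 0
for(k in 1:3){
Qk = function(bk){                    #class-k share of Q2
v1 = X%*%bk
v2 = Y%*%bk
lp = ifelse(D==0, v1, v2) - log(exp(v1)+exp(v2))   #log-prob of the observed choice
sum(H[,k]*lp)
}
ok = optim(Bmat[,k], Qk, method="BFGS", control = list(fnscale = -1))
Bmat[,k] = ok$par
lik = lik + ok$value
}
return(c(aita, as.vector(Bmat), lik)) #same layout as M_step's return value
}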
#EM loop
alf = c(0.33,0.33,0.33)        #starting class proportions
Bt = seq(0.1,0.6,by=0.1)       #starting Beta values (6 coefficients)
old.lik = -Inf
i=1
while(i <= 500){               #hard cap so the loop cannot run forever
H= E_step(alpha=alf,Beta=Bt)
theta = M_step(H=H,Beta=Bt)
print(theta)
alf = theta[1:3]
Bt = theta[4:9]
#check convergence: stop when the maximised Q value barely changes
if(abs(theta[10] - old.lik) < 0.0005) break
old.lik = theta[10]
i=i+1
}
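After the loop, it may help to compare the fitted shares with the simulation
settings and to monitor the observed-data log-likelihood of the mixture, which is a
safer convergence measure than the Q value returned by optim (again, a small added
sketch):
#post-run diagnostics
print(rbind(estimated = alf, true = alpha))   #compare fitted and true class proportions
lp = logProbInd(Bt)                           #log-probabilities at the fitted Beta
print(sum(log(exp(lp) %*% alf)))              #observed-data (mixture) log-likelihood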