As part of carrying out a complicated maximum likelihood estimation, I am trying to learn to program likelihoods in R. I started with a simple probit model but am unable to get the code to work. Any help or suggestions are most welcome. I give my code below: ************************************ mlogl <- function(mu, y, X) { n <- nrow(X) zeta <- X%*%mu llik <- 0 for (i in 1:n) { if (y[i]==1) llik <- llik + log(pnorm(zeta[i,], mean=0, sd=1)) else llik <- llik + log(1-pnorm(zeta[i,], mean=0, sd=1)) } return(-llik) } women <- read.table("~/R/Examples/Women13.txt", header=TRUE) # DATA # THE DATA SET CAN BE ACCESSED HERE # women <- read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", header=TRUE) # I HAVE CHANGED THE NAMES OF THE VARIABLES # J is changed to "work" # M is changed to "mar" # S is changed to "school" attach(women) # THE VARIABLES OF USE ARE # work: binary dependent variable # mar: whether married or not # school: years of schooling mu.start <- c(3, -1.5, 10) data <- cbind(1, mar, school) out <- nlm(mlogl, mu.start, y=work, X=data) cat("Results", "\n") out$estimate detach(women) ************************************* When I try to run the code, this is what I get:> source("probit.R")Results Warning messages: 1: NA/Inf replaced by maximum positive value 2: NA/Inf replaced by maximum positive value 3: NA/Inf replaced by maximum positive value 4: NA/Inf replaced by maximum positive value Thanks in advance. Deepankar
Deepankar, Some general advice from a non-expert: Write your likelihoods without a for loop. This is important because the likelihood is evaluated multiple times in the maximization process and you don't want to be looping looping looping ... Always try multiple starting values Sometimes it helps to try different optimization functions (e.g., optim) Make sure your likelihood is correct. Check it against existing software if possible If necessary, simplify your model, to a single parameter even, and build it up from there Generate data under the model and see if your estimates are getting close to the truth Good luck, Stephen Rochester, Minn. USA On 4/18/07, Deepankar Basu <basu.15 at osu.edu> wrote:> As part of carrying out a complicated maximum likelihood estimation, I > am trying to learn to program likelihoods in R. I started with a simple > probit model but am unable to get the code to work. Any help or > suggestions are most welcome. I give my code below: > > ************************************ > mlogl <- function(mu, y, X) { > n <- nrow(X) > zeta <- X%*%mu > llik <- 0 > for (i in 1:n) { > if (y[i]==1) > llik <- llik + log(pnorm(zeta[i,], mean=0, sd=1)) > else > llik <- llik + log(1-pnorm(zeta[i,], mean=0, sd=1)) > } > return(-llik) > } > > women <- read.table("~/R/Examples/Women13.txt", header=TRUE) # DATA > > # THE DATA SET CAN BE ACCESSED HERE > # women <- > read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", header=TRUE) > # I HAVE CHANGED THE NAMES OF THE VARIABLES > # J is changed to "work" > # M is changed to "mar" > # S is changed to "school" > > attach(women) > > # THE VARIABLES OF USE ARE > # work: binary dependent variable > # mar: whether married or not > # school: years of schooling > > mu.start <- c(3, -1.5, 10) > data <- cbind(1, mar, school) > out <- nlm(mlogl, mu.start, y=work, X=data) > cat("Results", "\n") > out$estimate > > detach(women) > > ************************************* > > When I try to run the code, this is what I get: > > > source("probit.R") > Results > Warning messages: > 1: NA/Inf replaced by maximum positive value > 2: NA/Inf replaced by maximum positive value > 3: NA/Inf replaced by maximum positive value > 4: NA/Inf replaced by maximum positive value > > Thanks in advance. > Deepankar > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
try the following: mlogl <- function (mu, y, X) { zeta <- as.vector(X %*% mu) y.logic <- as.logical(y) lgLik <- numeric(length(y)) lgLik[y.logic] <- pnorm(zeta[y.logic], log.p = TRUE) lgLik[!y.logic] <- pnorm(zeta[!y.logic], lower.tail = FALSE, log.p = TRUE) -sum(lgLik) } women <- read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", header=TRUE) mu.start <- c(0, -1.5, 0.01) out <- optim(mu.start, mlogl, method = "BFGS", y = women$J, X = cbind(1, women$M, women$S)) out glm.fit(x = cbind(1, women$M, women$S), y = women$J, family = binomial(link = "probit"))$coefficients I hope it helps. Best, Dimitris ---- Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm ----- Original Message ----- From: "Deepankar Basu" <basu.15 at osu.edu> To: <r-help at stat.math.ethz.ch> Sent: Thursday, April 19, 2007 12:38 AM Subject: [R] Problems in programming a simple likelihood> As part of carrying out a complicated maximum likelihood estimation, > I > am trying to learn to program likelihoods in R. I started with a > simple > probit model but am unable to get the code to work. Any help or > suggestions are most welcome. I give my code below: > > ************************************ > mlogl <- function(mu, y, X) { > n <- nrow(X) > zeta <- X%*%mu > llik <- 0 > for (i in 1:n) { > if (y[i]==1) > llik <- llik + log(pnorm(zeta[i,], mean=0, sd=1)) > else > llik <- llik + log(1-pnorm(zeta[i,], mean=0, sd=1)) > } > return(-llik) > } > > women <- read.table("~/R/Examples/Women13.txt", header=TRUE) # DATA > > # THE DATA SET CAN BE ACCESSED HERE > # women <- > read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", > header=TRUE) > # I HAVE CHANGED THE NAMES OF THE VARIABLES > # J is changed to "work" > # M is changed to "mar" > # S is changed to "school" > > attach(women) > > # THE VARIABLES OF USE ARE > # work: binary dependent variable > # mar: whether married or not > # school: years of schooling > > mu.start <- c(3, -1.5, 10) > data <- cbind(1, mar, school) > out <- nlm(mlogl, mu.start, y=work, X=data) > cat("Results", "\n") > out$estimate > > detach(women) > > ************************************* > > When I try to run the code, this is what I get: > >> source("probit.R") > Results > Warning messages: > 1: NA/Inf replaced by maximum positive value > 2: NA/Inf replaced by maximum positive value > 3: NA/Inf replaced by maximum positive value > 4: NA/Inf replaced by maximum positive value > > Thanks in advance. > Deepankar > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
Dimitris, Thanks a lot for your suggestion and also for suggestions that others have provided. I am learning fast and with the help of the R community will be able to get this going pretty soon. Of course, right now I am just trying to learn the language; so I am trying to program a standard probit model for which I know the answers. I could easily get the estimates with "glm". But I want to program the probit model to make sure I understand the subtleties of R. I tried running the alternate script that you provided, but it is still not working. Am I making some mistake? Here is what I get when I run your script (which shows that the maximum number of iterations was reached without convergence):> source("probit1.R") > summary(out)Length Class Mode par 3 -none- numeric value 1 -none- numeric counts 2 -none- numeric convergence 1 -none- numeric message 0 -none- NULL Here is the script (exactly what you had suggested): mlogl <- function (mu, y, X) { zeta <- as.vector(X %*% mu) y.logic <- as.logical(y) lgLik <- numeric(length(y)) lgLik[y.logic] <- pnorm(zeta[y.logic], log.p = TRUE) lgLik[!y.logic] <- pnorm(zeta[!y.logic], lower.tail = FALSE, log.p TRUE) -sum(lgLik) } women <- read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", header=TRUE) mu.start <- c(-3, -1.5, 0.5) out <- optim(mu.start, mlogl, method = "BFGS", y = women$J, X = cbind(1, women$M, women$S)) out glm.fit(x = cbind(1, women$M, women$S), y = women$J, family binomial(link = "probit"))$coefficients Thanks. Deepankar On Thu, 2007-04-19 at 09:26 +0200, Dimitris Rizopoulos wrote:> try the following: > > mlogl <- function (mu, y, X) { > zeta <- as.vector(X %*% mu) > y.logic <- as.logical(y) > lgLik <- numeric(length(y)) > lgLik[y.logic] <- pnorm(zeta[y.logic], log.p = TRUE) > lgLik[!y.logic] <- pnorm(zeta[!y.logic], lower.tail = FALSE, log.p > = TRUE) > -sum(lgLik) > } > > women <- > read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", > header=TRUE) > > mu.start <- c(0, -1.5, 0.01) > out <- optim(mu.start, mlogl, method = "BFGS", y = women$J, X = > cbind(1, women$M, women$S)) > out > > glm.fit(x = cbind(1, women$M, women$S), y = women$J, family = > binomial(link = "probit"))$coefficients > > > I hope it helps. > > Best, > Dimitris > > ---- > Dimitris Rizopoulos > Ph.D. Student > Biostatistical Centre > School of Public Health > Catholic University of Leuven > > Address: Kapucijnenvoer 35, Leuven, Belgium > Tel: +32/(0)16/336899 > Fax: +32/(0)16/337015 > Web: http://med.kuleuven.be/biostat/ > http://www.student.kuleuven.be/~m0390867/dimitris.htm > > > ----- Original Message ----- > From: "Deepankar Basu" <basu.15 at osu.edu> > To: <r-help at stat.math.ethz.ch> > Sent: Thursday, April 19, 2007 12:38 AM > Subject: [R] Problems in programming a simple likelihood > > > > As part of carrying out a complicated maximum likelihood estimation, > > I > > am trying to learn to program likelihoods in R. I started with a > > simple > > probit model but am unable to get the code to work. Any help or > > suggestions are most welcome. I give my code below: > > > > ************************************ > > mlogl <- function(mu, y, X) { > > n <- nrow(X) > > zeta <- X%*%mu > > llik <- 0 > > for (i in 1:n) { > > if (y[i]==1) > > llik <- llik + log(pnorm(zeta[i,], mean=0, sd=1)) > > else > > llik <- llik + log(1-pnorm(zeta[i,], mean=0, sd=1)) > > } > > return(-llik) > > } > > > > women <- read.table("~/R/Examples/Women13.txt", header=TRUE) # DATA > > > > # THE DATA SET CAN BE ACCESSED HERE > > # women <- > > read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt", > > header=TRUE) > > # I HAVE CHANGED THE NAMES OF THE VARIABLES > > # J is changed to "work" > > # M is changed to "mar" > > # S is changed to "school" > > > > attach(women) > > > > # THE VARIABLES OF USE ARE > > # work: binary dependent variable > > # mar: whether married or not > > # school: years of schooling > > > > mu.start <- c(3, -1.5, 10) > > data <- cbind(1, mar, school) > > out <- nlm(mlogl, mu.start, y=work, X=data) > > cat("Results", "\n") > > out$estimate > > > > detach(women) > > > > ************************************* > > > > When I try to run the code, this is what I get: > > > >> source("probit.R") > > Results > > Warning messages: > > 1: NA/Inf replaced by maximum positive value > > 2: NA/Inf replaced by maximum positive value > > 3: NA/Inf replaced by maximum positive value > > 4: NA/Inf replaced by maximum positive value > > > > Thanks in advance. > > Deepankar > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm >