Xu Jun
2012-Aug-01 00:57 UTC
[R] optim() for ordered logit model with parallel regression assumption
Dear R listers, I am learning the MLE utility optim() in R to program ordered logit models just as an exercise. See below I have three independent variables, x1, x2, and x3. Y is coded as ordinal from 1 to 4. Y is not yet a factor variable here. The ordered logit model satisfies the parallel regression assumption. The following codes can run through, but results were totally different from what I got using the polr function from the MASS package. I think it might be due to the way how the p is constructed in the ologit.lf function. I am relatively new to R, and here I would guess probably something related to the class type (numeric vs. matrix) or something along that line among those if conditions. Thanks in advance for any suggestion. Jun Xu, PhD Assistant Professor Department of Sociology Ball State University Muncie, IN 47306 #################################################################### library(foreign) readin <- read.dta("ordfile.dta", convert.factors=FALSE) myvars <- c("depvar", "x1", "x2", "x3") mydta <- readin[myvars] # remove all missings mydta <- na.omit(mydta) # theta is the parameter vector ologit.lf <- function(theta, y, X) { n <- nrow(X) k <- ncol(X) # b is the coefficient vector for independent variables b <- theta[1:k] # tau1 is cut-point 1 tau1 <- theta [k+1] # tau2 is cut-point 2 tau2 <- theta [k+2] # tau3 is cut-point 1 tau3 <- theta [k+3] if (y == 1){ p <- (1/(1+exp( - tau1 + X %*% b))) } if (y == 2) { p <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b))) } if (y == 3) { p <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b))) } if (y == 4) { p <- 1 - (1/(1+exp( - tau3 + X %*% b))) } - sum(p) } y <- as.numeric(mydta$depvar) X <- cbind (mydta$x1, mydta$x2, mydta$x3) runopt <- optim(rep(0, ncol(X)+4), ologit.lf, method="BFGS", hessian=T, y=y, X=X) There were 50 or more warnings (use warnings() to see the first 50)> warnings()1: In if (y == 1) { ... : the condition has length > 1 and only the first element will be used 2: In if (y == 2) { ... : the condition has length > 1 and only the first element will be used
R. Michael Weylandt
2012-Aug-01 02:07 UTC
[R] optim() for ordered logit model with parallel regression assumption
On Tue, Jul 31, 2012 at 7:57 PM, Xu Jun <junxu.r at gmail.com> wrote:> Dear R listers, > > I am learning the MLE utility optim() in R to program ordered logit > models just as an exercise. See below I have three independent > variables, x1, x2, and x3. Y is coded as ordinal from 1 to 4. Y is not > yet a factor variable here. The ordered logit model satisfies the > parallel regression assumption. The following codes can run through, > but results were totally different from what I got using the polr > function from the MASS package. I think it might be due to the way how > the p is constructed in the ologit.lf function. I am relatively new to > R, and here I would guess probably something related to the class type > (numeric vs. matrix) or something along that line among those if > conditions. Thanks in advance for any suggestion. > > Jun Xu, PhD > Assistant Professor > Department of Sociology > Ball State University > Muncie, IN 47306 > > > #################################################################### > > library(foreign) > readin <- read.dta("ordfile.dta", convert.factors=FALSE) > myvars <- c("depvar", "x1", "x2", "x3") > mydta <- readin[myvars] > # remove all missings > mydta <- na.omit(mydta) > > # theta is the parameter vector > ologit.lf <- function(theta, y, X) { > n <- nrow(X) > k <- ncol(X) > # b is the coefficient vector for independent variables > b <- theta[1:k] > # tau1 is cut-point 1 > tau1 <- theta [k+1] > # tau2 is cut-point 2 > tau2 <- theta [k+2] > # tau3 is cut-point 1 > tau3 <- theta [k+3] > > if (y == 1){ > p <- (1/(1+exp( - tau1 + X %*% b))) > } > if (y == 2) { > p <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b))) > } > if (y == 3) { > p <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b))) > } > if (y == 4) { > p <- 1 - (1/(1+exp( - tau3 + X %*% b))) > } > - sum(p) > } > > y <- as.numeric(mydta$depvar) > X <- cbind (mydta$x1, mydta$x2, mydta$x3) > runopt <- optim(rep(0, ncol(X)+4), ologit.lf, method="BFGS", > hessian=T, y=y, X=X) > > > There were 50 or more warnings (use warnings() to see the first 50) >> warnings() > > 1: In if (y == 1) { ... : > the condition has length > 1 and only the first element will be used > 2: In if (y == 2) { ... : > the condition has length > 1 and only the first element will be usedIt looks like you've got a fundamental problem in your if/else statements. if and else are control structures and so they operate on the whole program flow -- I think you want the ifelse() function here instead. Take a look at this example: x <- c(1, 5, 9) if(x < 3) {y <- x^2} else {y <- 2} z <- ifelse(x < 3, x^2, 2) print(x) print(y) print(z) See also ?ifelse Best, Michael> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.