thr3ads.net - R help - [R] bugs and misfeatures in polr(MASS).... fixed! [Nov 2010]

If this information is useful, please help other people find it:
Share via:

Mr Timothy James BENHAM

2010-Nov-03 00:03 UTC

[R] bugs and misfeatures in polr(MASS).... fixed!

In polr.R the (several) functions gmin and fmin contain the code
>         theta <- beta[pc + 1L:q]
>         gamm <- c(-100, cumsum(c(theta[1L], exp(theta[-1L]))), 100)
That's bad. There's no reason to suppose beta[pc+1L] is larger than
-100 or that the cumulative sum is smaller than 100. For practical
datasets those assumptions are frequently violated, causing the
optimization to fail. A work-around is to center the explanatory
variables. This helps keep the zetas small.

The correct approach is to use the values -Inf and Inf as the first
and last cut points. The functions plogis, dnorm, etc all behave
correctly when the input is one of these values. The dgumbel function
does not, returning NaN for -Inf. Correct this as follows

dgumbel <- function (x, loc = 0, scale = 1, log = FALSE)
{
    x <- (x - loc)/scale
    d <- log(1/scale) - x - exp(-x)
    d[is.nan(d)] <- -Inf                # -tjb
    if (!log) exp(d) else d
}

The documentation states
>    start: initial values for the parameters.  This is in the format
>           'c(coefficients, zeta)': see the Values section. 
The relevant code is
>       s0 <- if(pc > 0) c(start[seq_len(pc+1)],
diff(start[-seq_len(pc)]))
>       else c(start[1L], diff(start))
This doesn't take the logs of the differences as required to repose
the zetas into the form used in the optimization. The fix is
obvious. polr.fit has the same problem which is responsible for
summary() frequently failing when the Hessian is not provided.

I'm not convinced the t values reported by summary() are
reliable. I've noticed that a one dimensional linear transformation
the independent variables can cause the reported t values to change by
a factor of more than 100, which doesn't seem right.

        --tjb

tjb

2010-Nov-06 22:00 UTC

head link

[R] bugs and misfeatures in polr(MASS).... fixed!

The start value generation code in polr is also known to fail quite
frequently. For example, against the Iris data as recently posted to this
list by blackscorpio (	Sep 6, 2010).
> polr(Species~Sepal_Length+Sepal_Width+Petal_Length+Petal_Width,data=iris)Error in polr(Species ~ Sepal_Length + Sepal_Width + Petal_Length +
Petal_Width,  : 
  attempt to find suitable starting values failed
In addition: Warning messages:
1: In glm.fit(X, y1, wt, family = binomial(), offset = offset) :
  algorithm did not converge
2: In glm.fit(X, y1, wt, family = binomial(), offset = offset) :
  fitted probabilities numerically 0 or 1 occurred

I suggest that simply setting the coefficients beta to zero and the
cutpoints zeta to sensible values will always produce a feasible starting
point for non-pathological data. Here is my code that does this:

    if(missing(start)) {
      # try something that should always work -tjb
      u <- as.integer(table(y))
      u <- (cumsum(u)/sum(u))[1:q]
      zetas <-
         switch(method,
                "logistic"= qlogis(u),
                "probit"=   qnorm(u),
                "cauchit"=  qcauchy(u),
                "cloglog"=  -log(-log(u)) )
      s0 <- c(rep(0,pc),zetas[1],log(diff(zetas)))

Using this start code the problem is not manifested. 
> source('fixed-polr.R')
> polr(Species~Sepal_Length+Sepal_Width+Petal_Length+Petal_Width,data=iris)Call:
polr(formula = Species ~ Sepal_Length + Sepal_Width + Petal_Length + 
    Petal_Width, data = iris)

Coefficients:
Sepal_Length  Sepal_Width Petal_Length  Petal_Width 
   -2.466724    -6.671515     9.431689    18.270058 

Intercepts:
   setosa|versicolor versicolor|virginica 
            4.080189            42.639320 

Residual Deviance: 11.89857 
AIC: 23.89857 

My change would also likely fix the problem reported by Kevin Coombes on May
6, 2010.
-- 
View this message in context:
http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p3030405.html
Sent from the R help mailing list archive at Nabble.com.

tjb

2010-Nov-06 22:12 UTC

head link

[R] bugs and misfeatures in polr(MASS).... fixed!

Note that that the enhancements in my original post solve the unresolved
problem of Chaehyung Ahn (22 Mar 2005) whose data I reproduce:

y,x,lx
0,3.2e-02,-1.49485
0,3.2e-02,-1.49485
0,1.0e-01,-1.00000
0,1.0e-01,-1.00000
0,3.2e-01,-0.49485
0,3.2e-01,-0.49485
1,1.0e+00,0.00000
0,1.0e+00,0.00000
1,3.2e+00,0.50515
1,3.2e+00,0.50515
0,1.0e+01,1.00000
1,1.0e+01,1.00000
1,3.2e+01,1.50515
2,3.2e+01,1.50515
2,1.0e+02,2.00000
1,1.0e+02,2.00000
2,3.2e+02,2.50515
1,3.2e+02,2.50515
2,1.0e+03,3.00000
2,1.0e+03,3.00000

Using the MASS version we get
> ahn$y<-as.factor(ahn$y)
> summary(polr(y~lx,data=ahn))
Re-fitting to get Hessian

Error in optim(s0, fmin, gmin, method = "BFGS", hessian = Hess, ...) :
  initial value in 'vmmin' is not finite

Whereas,
> source('fixed-polr.R')
> summary(polr(y~lx,data=ahn))
Re-fitting to get Hessian

Call:
polr(formula = y ~ lx, data = ahn)

Coefficients:
   Value Std. Error t value
lx 2.421     0.8146   2.971

Intercepts:
    Value  Std. Error t value
0|1 0.5865 0.8118     0.7224 
1|2 4.8966 1.7422     2.8106 

Residual Deviance: 20.43286 
AIC: 26.43286 

-- 
View this message in context:
http://r.789695.n4.nabble.com/bugs-and-misfeatures-in-polr-MASS-fixed-tp3024677p3030411.html
Sent from the R help mailing list archive at Nabble.com.

Possibly Parallel Threads

Search for more possibly parallel threads

R help - Nov 2010 - bugs and misfeatures in polr(MASS).... fixed!

[R] bugs and misfeatures in polr(MASS).... fixed!

[R] bugs and misfeatures in polr(MASS).... fixed!

[R] bugs and misfeatures in polr(MASS).... fixed!

Possibly Parallel Threads