Charlotte Whitham
2014-Nov-25 16:21 UTC
[R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function
Dear list,

I have used the 'polr' function in the MASS package to run an ordinal logistic regression for an ordinal categorical response variable with 15 continuous explanatory variables. I have used the code shown below to check that my model meets the proportional odds assumption, following the advice provided at http://www.ats.ucla.edu/stat/r/dae/ologit.htm, which has been extremely helpful - thank you to the authors! However, I'm a little worried that the output implies the coefficients across the various cutpoints are not merely similar but exactly the same (see the attached graphic).

Here is the code I used (and see attached for the output graphic):

library(MASS)   ## for polr()
library(Hmisc)  ## the summary() formula method with fun= used below is from Hmisc

FGV1b <- data.frame(FG1_val_cat = factor(FGV1b[, "FG1_val_cat"]),
                    scale(FGV1[, c("X", "Y", "Slope", "Ele", "Aspect",
                                   "Prox_to_for_FG", "Prox_to_for_mL",
                                   "Prox_to_nat_border", "Prox_to_village",
                                   "Prox_to_roads", "Prox_to_rivers",
                                   "Prox_to_waterFG", "Prox_to_watermL",
                                   "Prox_to_core", "Prox_to_NR",
                                   "PCA1", "PCA2", "PCA3")]))

b <- polr(FG1_val_cat ~ X + Y + Slope + Ele + Aspect + Prox_to_for_FG +
            Prox_to_for_mL + Prox_to_nat_border + Prox_to_village +
            Prox_to_roads + Prox_to_rivers + Prox_to_waterFG +
            Prox_to_watermL + Prox_to_core + Prox_to_NR,
          data = FGV1b, Hess = TRUE)

#Checking the assumption. The following code estimates the values to be graphed.
#First it shows us the logit transformations of the probabilities of being greater
#than or equal to each value of the target variable.

FGV1b$FG1_val_cat <- as.numeric(FGV1b$FG1_val_cat)

sf <- function(y) {
  c('VC>=1' = qlogis(mean(FGV1b$FG1_val_cat >= 1)),
    'VC>=2' = qlogis(mean(FGV1b$FG1_val_cat >= 2)),
    'VC>=3' = qlogis(mean(FGV1b$FG1_val_cat >= 3)),
    'VC>=4' = qlogis(mean(FGV1b$FG1_val_cat >= 4)),
    'VC>=5' = qlogis(mean(FGV1b$FG1_val_cat >= 5)),
    'VC>=6' = qlogis(mean(FGV1b$FG1_val_cat >= 6)),
    'VC>=7' = qlogis(mean(FGV1b$FG1_val_cat >= 7)),
    'VC>=8' = qlogis(mean(FGV1b$FG1_val_cat >= 8)))
}

(t <- with(FGV1b, summary(as.numeric(FG1_val_cat) ~ X + Y + Slope + Ele + Aspect +
                            Prox_to_for_FG + Prox_to_for_mL + Prox_to_nat_border +
                            Prox_to_village + Prox_to_roads + Prox_to_rivers +
                            Prox_to_waterFG + Prox_to_watermL + Prox_to_core +
                            Prox_to_NR, fun = sf)))

#The table displays the (linear) predicted values we would get if we regressed our
#dependent variable on our predictor variables one at a time, without the parallel
#slopes assumption. So now, we can run a series of binary logistic regressions with
#varying cutpoints on the dependent variable to check the equality of coefficients
#across cutpoints.

par(mfrow = c(1, 1))
plot(t, which = 1:8, pch = 1:8, xlab = 'logit', main = ' ', xlim = range(t[, 7:8]))
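(One thing I notice on re-reading the code: my 'sf' function never actually uses its argument 'y' - every entry is computed from the full FGV1b$FG1_val_cat, so every row of the table, and hence every symbol in the plot, would come out identical by construction. If I have read the UCLA page correctly, their version of the function uses 'y' itself, something like the sketch below - perhaps that is the source of the identical coefficients?)

sf <- function(y) {
  ## use the argument supplied by summary(), not the global column
  c('VC>=1' = qlogis(mean(y >= 1)),
    'VC>=2' = qlogis(mean(y >= 2)),
    'VC>=3' = qlogis(mean(y >= 3)),
    'VC>=4' = qlogis(mean(y >= 4)),
    'VC>=5' = qlogis(mean(y >= 5)),
    'VC>=6' = qlogis(mean(y >= 6)),
    'VC>=7' = qlogis(mean(y >= 7)),
    'VC>=8' = qlogis(mean(y >= 8)))
}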
Apologies that I am no statistics expert and perhaps I am missing something obvious here. However, I have spent a long time trying to figure out whether there is a problem in how I tested the model assumption, and also trying to find other ways to run the same kind of model.

For example, I read in many help mailing lists that others use the vglm function (in the VGAM package) and the lrm function (in the rms package) (for example, see here: http://stats.stackexchange.com/questions/25988/proportional-odds-assumption-in-ordinal-logistic-regression-in-r-with-the-packag). I have tried to run the same models but keep coming up against warnings and errors. For example, when I try to fit the vglm model with the 'parallel=FALSE' argument (which the previous link says is important for testing the proportional odds assumption), I encounter the following error:

Error in lm.fit(X.vlm, y = z.vlm, ...) : NA/NaN/Inf in 'y'
In addition: Warning message:
In Deviance.categorical.data.vgam(mu = mu, y = y, w = w, residuals = residuals, :
  fitted values close to 0 or 1

And after many searches for help, I can't seem to find a way to fix this problem.

I would like to ask whether there is anyone who might understand, and be able to explain to me, why the graph I produced above looks as it does. If it means that something isn't right, could you please help me find a way to test the proportional odds assumption using just the polr function? If that is not possible, I will resort to the vglm function, but would then need some help to understand why I keep getting the error above.

I hope this is clear. Please do let me know if I should provide more information that would help address this query.

NOTE: As background, there are 1000 datapoints here, which are location points across a study area. I am looking for relationships between the categorical response variable and the 15 explanatory variables, all of which are spatial characteristics (for example, elevation, x-y coordinates, proximity to forest, etc.). The 1000 datapoints were randomly allocated using a GIS, but with a stratified sampling approach: 125 points were randomly chosen within each of the 8 categorical response levels. I hope this information is also helpful.

I am extremely grateful to anyone who could give me some guidance with this.

Thank you very much for your time,

Charlie
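P.S. For anyone trying to reproduce the vglm problem, this is the general shape of what I ran (a sketch from memory, with placeholder names 'dat', 'y', 'x1', 'x2', not my exact call):

library(VGAM)

## Proportional-odds model: one common slope per predictor
fit_po <- vglm(y ~ x1 + x2, family = cumulative(parallel = TRUE), data = dat)

## Non-proportional model: a separate slope at each of the 7 cutpoints.
## With 8 response levels and 15 predictors this estimates 7 x 15 slopes,
## and it is this fit that fails for me with 'NA/NaN/Inf in y' plus the
## warning about fitted values close to 0 or 1.
fit_npo <- vglm(y ~ x1 + x2, family = cumulative(parallel = FALSE), data = dat)

## If both models fit, the proportional odds assumption can be assessed
## with a likelihood-ratio test:
lrtest(fit_npo, fit_po)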
Rune Haubo
2014-Nov-26 14:08 UTC
[R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function
Dear Charlie,

I admit that I haven't read your email closely, but here is a way to test for non-proportional odds using the ordinal package (warning: self-promotion), illustrated with the wine data set, also from the ordinal package. There is more information in the package vignettes.

> library(ordinal)
> ## Fit model:
> fm <- clm(rating ~ temp + contact, data=wine)
> summary(fm)
formula: rating ~ temp + contact
data:    wine

 link  threshold nobs logLik AIC    niter max.grad cond.H
 logit flexible  72   -86.49 184.98 6(0)  4.64e-15 2.7e+01

Coefficients:
           Estimate Std. Error z value Pr(>|z|)
tempwarm     2.5031     0.5287   4.735 2.19e-06 ***
contactyes   1.5278     0.4766   3.205  0.00135 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -1.3444     0.5171  -2.600
2|3   1.2508     0.4379   2.857
3|4   3.4669     0.5978   5.800
4|5   5.0064     0.7309   6.850
> ## Model with non-proportional odds for contact:
> fm2 <- clm(rating ~ temp, nominal=~contact, data=wine)
> ## Likelihood ratio test of non-proportional odds:
> anova(fm, fm2)
Likelihood ratio tests of cumulative link models:

    formula:                 nominal: link:  threshold:
fm  rating ~ temp + contact  ~1       logit  flexible
fm2 rating ~ temp            ~contact logit  flexible

    no.par    AIC  logLik LR.stat df Pr(>Chisq)
fm       6 184.98 -86.492
fm2      9 190.42 -86.209  0.5667  3      0.904
> ## Automatic tests of non-proportional odds for all variables:
> nominal_test(fm)
Tests of nominal effects

formula: rating ~ temp + contact
        Df  logLik    AIC    LRT Pr(>Chi)
<none>     -86.492 184.98
temp     3 -84.904 187.81 3.1750   0.3654
contact  3 -86.209 190.42 0.5667   0.9040

Hope this is something you can use.
Cheers,
Rune
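P.S. Applied to the model in your post, the same tests could look roughly like this (an untested sketch using a subset of your variable names; note that clm() needs the response as a factor):

library(ordinal)

## Proportional-odds fit:
fm  <- clm(FG1_val_cat ~ X + Y + Slope + Ele + Aspect, data = FGV1b)

## Relax proportional odds for one suspect predictor, e.g. Slope:
fm2 <- clm(FG1_val_cat ~ X + Y + Ele + Aspect, nominal = ~ Slope, data = FGV1b)

## Likelihood ratio test of non-proportional odds for Slope:
anova(fm, fm2)

## Or screen all predictors in one go:
nominal_test(fm)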
On 25 November 2014 at 17:21, Charlotte Whitham <charlotte.whitham at gmail.com> wrote:
> [quoted message snipped]
Charlotte Whitham
2014-Nov-26 16:55 UTC
[R] Checking the proportional odds assumption holds in an ordinal logistic regression using polr function
Dear Rune,

Thank you for your prompt reply - it looks like the ordinal package could be the answer I was looking for! If you don't mind, I'd also like to ask what to do if the tests show the proportional odds assumption is NOT met. (Unfortunately, I see effects from almost all variables that breach the proportional odds assumption in my dataset.) Would you recommend a multinomial logistic model? Or re-scaling of the data?
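For example, would a partial proportional odds model be sensible - keeping the ordinal structure but moving the violating predictors into the nominal part? A rough sketch of what I mean (untested; 'Slope' and 'Ele' just stand in for whichever of my predictors fail the test):

library(ordinal)

## Keep proportional odds for most terms, relax it for the offenders:
fm_ppo <- clm(FG1_val_cat ~ X + Y + Aspect, nominal = ~ Slope + Ele,
              data = FGV1b)

## Or drop the ordering altogether and fit a multinomial model:
library(nnet)
fm_mn <- multinom(FG1_val_cat ~ X + Y + Slope + Ele + Aspect, data = FGV1b)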
Thank you for your time,

Best wishes,
Charlie

On 26 Nov 2014, at 14:08, Rune Haubo <rune.haubo at gmail.com> wrote:
> [quoted message snipped]