Hi, I am trying to test the proportional-odds model using the "polr" function in the MASS library with the dataset of "housing" contained in the MASS book ("Sat" (factor: low, medium, high) is the dependent variable, "Infl" (low, medium, high), "Type" (tower, apartment, atrium, terrace) and "Cont" (low, high) are the predictor variables (factors). And I have some questions, hope someone could help me out. The following commands are from the MASS book as well. > house.plr<-polr(Sat~Infl+Type+Cont,data=housing,weights=Freq)> summary(house.plr)Re-fitting to get HessianCall:polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)Coefficients: Value Std. Error t value InflMedium 0.5663922 0.10465276 5.412109 InflHigh 1.2888137 0.12715609 10.135682 TypeApartment -0.5723552 0.11923800 -4.800107 TypeAtrium -0.3661907 0.15517331 -2.359882 TypeTerrace -1.0910073 0.15148595 -7.202036 ContHigh 0.3602803 0.09553577 3.771156 Intercepts: Value Std. Error t value Low|Medium -0.4961 0.1248 -3.9740 Medium|High 0.6907 0.1255 5.5049 Residual Deviance: 3479.149 AIC: 3495.149 I also tried to predict the probabilities of the 3 categories of "Sat" using the predict function: > hnames<-lapply(housing[,-5],levels)> house.pr1<-predict(house.plr,expand.grid(hnames[-1]),type="probs")> cbind(expand.grid(hnames[-1]),round(house.pr1,2))Infl Type Cont Low Medium High 1 Low Tower Low 0.38 0.29 0.33 2 Medium Tower Low 0.26 0.27 0.47 3 High Tower Low 0.14 0.21 0.65 4 Low Apartment Low 0.52 0.26 0.22 5 Medium Apartment Low 0.38 0.29 0.33 6 High Apartment Low 0.23 0.26 0.51 7 Low Atrium Low 0.47 0.27 0.26 8 Medium Atrium Low 0.33 0.29 0.38 9 High Atrium Low 0.19 0.25 0.56 10 Low Terrace Low 0.64 0.21 0.14 11 Medium Terrace Low 0.51 0.26 0.23 12 High Terrace Low 0.33 0.29 0.38 13 Low Tower High 0.30 0.28 0.42 14 Medium Tower High 0.19 0.25 0.56 15 High Tower High 0.10 0.17 0.72 16 Low Apartment High 0.43 0.28 0.29 17 Medium Apartment High 0.30 0.28 0.42 18 High Apartment High 0.17 0.23 0.60 19 Low Atrium High 0.38 0.29 0.33 20 Medium Atrium High 0.26 0.27 0.47 21 High Atrium High 0.14 0.21 0.64 22 Low Terrace High 0.56 0.25 0.19 23 Medium Terrace High 0.42 0.28 0.30 24 High Terrace High 0.26 0.27 0.47 what I am confused is that when I tried to reproduce these predicted probabilities using the model coefficients, I can sometimes get different results: for example, for low Infl, Type tower, cont low, logit(P(low))=P(low)/(1-P(low))=-0.4961, solve for P(low)=exp(-0.4961)/(1+exp(-0.4961))=0.38;andlogit(P(low, medium))=P(low,medium)/(1-P(low,medium))=0.6907, solve for P(low,medium)=exp(0.6907)/(1+exp(0.6907))=0.67, which is the sum of 0.38 plus 0.29 The above 2 examples showed that I can reproduce the predicted probabilities when using the intercept alone. However, for other combinations of the predictor variables, I can NOT repreduce the results. For example, for medium Infl, Type tower, cont low, logit(P(low))=P(low)/(1-P(low))=-0.4961+0.56639=0.07029, solve for P(low)=exp(0.07029)/(1+exp(0.07029))=0.52; but the probability using "predict" function is 0.26.andlogit(P(low, medium))=P(low,medium)/(1-P(low,medium))=0.6907+0.56639=1.25709, solve for P(low,medium)=exp(1.25709)/(1+exp(1.25709))=0.78, which certainly is NOT the sum of 0.26 plus 0.27, and is not the sum of 0.52 and 0.27. Did I misunderstand something here? Or the the way I used to reproduce the predicted probabilities is not correct? Thanks --------------------------------- [[alternate HTML version deleted]]
Hi, is anyone using R for DNA sequence analyses / population genetics? Thanks, Dan ________________________________________________________________________________ Daniel Davison email: davison at uchicago.edu tel.: +1 312 665 7010 fax: +1 312 665 7754 http://home.uchicago.edu/~davison/ Committee on Evolutionary Biology & Division of Birds University of Chicago Field Museum of Natural History 1025 E. 57th Street 1400 S. Lake Shore Drive Culver Hall 402 Chicago, IL 60605-2496 Chicago, IL 60637 U.S.A. U.S.A.
--- Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:> On Mon, 5 May 2003, array chip wrote: > > > I misunderstand something here? Or the the way I > used to reproduce the > > predicted probabilities is not correct? Thanks > > Yes: > > 1) How to wrap lines in emails > > 2) How to post plain text to mailing lists. > > 3) What the model actually is. > > -- > Brian D. Ripley,Sorry that I didn't realize the formatting problem. Hopefully this time it works: Hi, I am trying to use the proportional-odds model using the "polr" function in the MASS library. I tried with the dataset of "housing" contained in the MASS book ("Sat" (factor: low, medium, high) is the dependent variable, "Infl" (low, medium, high), "Type" (tower, apartment, atrium, terrace) and "Cont" (low, high) are the predictor variables (factors). And I have some questions, hope someone could help me out. The following commands are from the MASS book as well.>house.plr<-polr(Sat~Infl+Type+Cont,data=housing,weights=Freq)> summary(house.plr)Call: polr(formula = Sat ~ Infl + Type + Cont, data housing, weights = Freq) Coefficients: Value Std. Error t value InflMedium 0.5663922 0.10465276 5.412109 InflHigh 1.2888137 0.12715609 10.135682 TypeApartment -0.5723552 0.11923800 -4.800107 TypeAtrium -0.3661907 0.15517331 -2.359882 TypeTerrace -1.0910073 0.15148595 -7.202036 ContHigh 0.3602803 0.09553577 3.771156 Intercepts: Value Std. Error t value Low|Medium -0.4961 0.1248 -3.9740 Medium|High 0.6907 0.1255 5.5049 Residual Deviance: 3479.149 AIC: 3495.149 I also tried to predict the probabilities of the 3 categories of "Sat" using the predict function:> hnames<-lapply(housing[,-5],levels) >house.pr1<-predict(house.plr,expand.grid(hnames[-1]),type="probs")> cbind(expand.grid(hnames[-1]),round(house.pr1,2)) Infl Type Cont Low Medium High 1 Low Tower Low 0.38 0.29 0.33 2 Medium Tower Low 0.26 0.27 0.47 3 High Tower Low 0.14 0.21 0.65 4 Low Apartment Low 0.52 0.26 0.22 5 Medium Apartment Low 0.38 0.29 0.33 6 High Apartment Low 0.23 0.26 0.51 7 Low Atrium Low 0.47 0.27 0.26 8 Medium Atrium Low 0.33 0.29 0.38 9 High Atrium Low 0.19 0.25 0.56 10 Low Terrace Low 0.64 0.21 0.14 11 Medium Terrace Low 0.51 0.26 0.23 12 High Terrace Low 0.33 0.29 0.38 13 Low Tower High 0.30 0.28 0.42 14 Medium Tower High 0.19 0.25 0.56 15 High Tower High 0.10 0.17 0.72 16 Low Apartment High 0.43 0.28 0.29 17 Medium Apartment High 0.30 0.28 0.42 18 High Apartment High 0.17 0.23 0.60 19 Low Atrium High 0.38 0.29 0.33 20 Medium Atrium High 0.26 0.27 0.47 21 High Atrium High 0.14 0.21 0.64 22 Low Terrace High 0.56 0.25 0.19 23 Medium Terrace High 0.42 0.28 0.30 24 High Terrace High 0.26 0.27 0.47 what I am confused is that when I tried to reproduce these predicted probabilities using the model coefficients, I can sometimes get different results: for example, for low Infl, Type tower, cont low, logit(P(low))=P(low)/(1-P(low))=-0.4961, solve for P(low)=exp(-0.4961)/(1+exp(-0.4961))=0.38; and logit(P(low, medium))=P(low,medium)/(1-P(low,medium))=0.6907, solve for P(low,medium)=exp(0.6907)/(1+exp(0.6907))=0.67, which is the sum of 0.38 plus 0.29. The above 2 examples showed that I can reproduce the predicted probabilities when using the intercept alone. However, for other combinations of the predictor variables, I can NOT repreduce the results. For example, for medium Infl, Type tower, cont low, logit(P(low))=P(low)/(1-P(low))=-0.4961+0.56639=0.07029, solve for P(low)=exp(0.07029)/(1+exp(0.07029))=0.52; but the probability using "predict" function is 0.26. and logit(P(low, medium))=P(low,medium)/(1-P(low,medium))=0.6907+0.56639=1.25709, solve for P(low,medium)=exp(1.25709)/(1+exp(1.25709))=0.78, which certainly is NOT the sum of 0.26 plus 0.27, and is not the sum of 0.52 and 0.27. Did I misunderstand something here? Or the the way I used to reproduce the predicted probabilities is not correct? Thanks very much!
Hi, It seems my last post had formatting problem, so hopefully this time it works: I am trying to use the proportional-odds model (cumulative logistic regression) using the "polr" function in the MASS library of R. I tried with the dataset of "housing" contained in the MASS book where "Sat" (3 levels: low, medium, high) is the dependent variable, "Infl" (low, medium, high), "Type" (tower, apartment, atrium, terrace) and "Cont" (low, high) are the predictor variables. And I have some questions; hope someone could help me out. The following commands are taken from the MASS book as well. If you know R, it might be better to understand my problem, if not, I hope it's still ok - my problem is actually a statistically problem, not a programming problem, ignore the codes, but just look at the output. For the dataset, the proportional odds model tried to run cumulative logistic regression of the dependent variable "Sat" on the 3 predictor variables "Infl", "Type", and "Cont" without interactions. In R, the lowest level of each variable is treated as the reference level.> house.plr<-polr(Sat~Infl+Type+Cont,data=housing,weights=Freq)> summary(house.plr)Call: polr(formula = Sat ~ Infl + Type + Cont, data housing, weights = Freq) Coefficients: Value Std. Error t value InflMedium 0.5663922 0.10465276 5.412109 InflHigh 1.2888137 0.12715609 10.135682 TypeApartment -0.5723552 0.11923800 -4.800107 TypeAtrium -0.3661907 0.15517331 -2.359882 TypeTerrace -1.0910073 0.15148595 -7.202036 ContHigh 0.3602803 0.09553577 3.771156 Intercepts: Value Std. Error t value Low|Medium -0.4961 0.1248 -3.9740 Medium|High 0.6907 0.1255 5.5049 Residual Deviance: 3479.149 AIC: 3495.149 I also tried to predict the probabilities of the 3 categories of the dependent variable "Sat" for every combination of the 3 predictor variables using the predict function in R:> hnames<-lapply(housing[,-5],levels) >house.pr1<-predict(house.plr,expand.grid(hnames[-1]),type="probs")> cbind(expand.grid(hnames[-1]),round(house.pr1,2))Infl Type Cont Low Medium High 1 Low Tower Low 0.38 0.29 0.33 2 Medium Tower Low 0.26 0.27 0.47 3 High Tower Low 0.14 0.21 0.65 4 Low Apartment Low 0.52 0.26 0.22 5 Medium Apartment Low 0.38 0.29 0.33 6 High Apartment Low 0.23 0.26 0.51 7 Low Atrium Low 0.47 0.27 0.26 8 Medium Atrium Low 0.33 0.29 0.38 9 High Atrium Low 0.19 0.25 0.56 10 Low Terrace Low 0.64 0.21 0.14 11 Medium Terrace Low 0.51 0.26 0.23 12 High Terrace Low 0.33 0.29 0.38 13 Low Tower High 0.30 0.28 0.42 14 Medium Tower High 0.19 0.25 0.56 15 High Tower High 0.10 0.17 0.72 16 Low Apartment High 0.43 0.28 0.29 17 Medium Apartment High 0.30 0.28 0.42 18 High Apartment High 0.17 0.23 0.60 19 Low Atrium High 0.38 0.29 0.33 20 Medium Atrium High 0.26 0.27 0.47 21 High Atrium High 0.14 0.21 0.64 22 Low Terrace High 0.56 0.25 0.19 23 Medium Terrace High 0.42 0.28 0.30 24 High Terrace High 0.26 0.27 0.47 What I am confused is that when I tried to reproduce these predicted probabilities manually using the model coefficients, I can sometimes get different results: According to cumulative logistic regression, the model is: logit(P(Y<=k | x1,x2,x3) = a + b1*x1 + b2*x2 + b3*x3 for k=1,2,3 For example, for low Infl, Type tower, Cont low (all on reference level), logit(P(Sat=low))=P(Sat=low)/(1-P(Sat=low))=-0.4961, solve for P(Sat=low)=exp(-0.4961)/(1+exp(-0.4961))=0.38; and logit(P(Sat=low, medium)) P(Sat=low,medium)/(1-P(Sat=low,medium)) = 0.6907, solve for P(Sat=low,medium) exp(0.6907)/(1+exp(0.6907))=0.67, which is the sum of 0.38 plus 0.29. The above 2 examples showed that I can reproduce the predicted probabilities when using the intercept alone. However, for other combinations of the predictor variables, I can NOT reproduce the results. For example, for medium Infl, Type tower, cont low, logit(P(Sat=low))=P(Sat=low)/(1-P(Sat=low))=-0.4961+0.56639=0.07029, solve for P(Sat=low)=exp(0.07029)/(1+exp(0.07029))=0.52; but the probability using "predict" function is 0.26. and logit(P(Sat=low, medium)) P(Sat=low,medium)/(1-P(Sat=low,medium)) 0.6907+0.56639=1.25709, solve for P(Sat=low,medium) exp(1.25709)/(1+exp(1.25709))=0.78, which certainly is NOT the sum of 0.26 plus 0.27, and is not the sum of 0.52 and 0.27. Did I misunderstand something here? Or the way I used to reproduce the predicted probabilities is not correct? Thanks very much!