In Venables \& Ripley 3rd edition (p. 231) the proportional odds model is described as: logit(p<=k) = zeta_k + eta but polr apparently thinks there is a minus in front of eta, as is apprent below. Is this a bug og a feature I have overlooked? Here is the naked code for reproduction, below the results. ------------------------------------------------------------------------ --- version library( MASS ) data( housing ) hnames <- lapply( housing[,-5], levels ) house.plr <- polr( Sat ~ Infl + Type + Cont, data=housing, weights=Freq ) summary( house.plr ) newdat <- expand.grid( hnames[-1] )[1:5,] cbind( newdat, predict( house.plr, newdat, type="probs" ) ) # Baseline probs: diff( c(0,tigol( c(-0.4961,0.6907) ), 1) ) # First level of Infl: diff( c(0,tigol( c(-0.4961,0.6907) + 0.5663922 ), 1) ) # But the change of sign for eta is needed to reproduce the fitted values: # Line 2: diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 ), 1) ) # Line 5: diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 + 0.5723552 ), 1) ) ------------------------------------------------------------------------ --- Here is the resulting output: ------------------------------------------------------------------------ ---> version_ platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 1 minor 8.1 year 2003 month 11 day 21 language R> library( MASS ) > data( housing ) > hnames <- lapply( housing[,-5], levels ) > house.plr <- polr( Sat ~ Infl + Type + Cont, data=housing,weights=Freq )> summary( house.plr )Re-fitting to get Hessian Call: polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq) Coefficients: Value Std. Error t value InflMedium 0.5663922 0.10465276 5.412109 InflHigh 1.2888137 0.12715609 10.135682 TypeApartment -0.5723552 0.11923800 -4.800107 TypeAtrium -0.3661908 0.15517331 -2.359882 TypeTerrace -1.0910073 0.15148595 -7.202036 ContHigh 0.3602803 0.09553577 3.771156 Intercepts: Value Std. Error t value Low|Medium -0.4961 0.1248 -3.9740 Medium|High 0.6907 0.1255 5.5049 Residual Deviance: 3479.149 AIC: 3495.149> newdat <- expand.grid( hnames[-1] )[1:5,] > cbind( newdat, predict( house.plr, newdat, type="probs" ) )Infl Type Cont Low Medium High 1 Low Tower Low 0.3784485 0.2876755 0.3338760 2 Medium Tower Low 0.2568261 0.2742125 0.4689614 3 High Tower Low 0.1436927 0.2110841 0.6452232 4 Low Apartment Low 0.5190450 0.2605077 0.2204473 5 Medium Apartment Low 0.3798522 0.2875967 0.3325511> # Baseline probs: > diff( c(0,tigol( c(-0.4961,0.6907) ), 1) )[1] 0.3784576 0.2876650 0.3338774> # First level of Infl: > diff( c(0,tigol( c(-0.4961,0.6907) + 0.5663922 ), 1) )[1] 0.5175658 0.2609593 0.2214749> # But the change of sign for eta is needed to reproduce the fittedvalues:> # Line 2: > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 ), 1) )[1] 0.2568335 0.2742035 0.4689630> # Line 5: > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 + 0.5723552 ), 1) )[1] 0.3798613 0.2875862 0.3325525>------------------------------------------------------------------------ ----- ---------------------- Bendix Carstensen Senior Statistician Steno Diabetes Center Niels Steensens Vej 2 DK-2820 Gentofte Denmark tel: +45 44 43 87 38 mob: +45 30 75 87 38 fax: +45 44 43 07 06 bxc at steno.dk www.biostat.ku.dk/~bxc
On Thu, 8 Jan 2004, BXC (Bendix Carstensen) wrote:> In Venables \& Ripley 3rd edition (p. 231) the proportional odds model > is described as: > > logit(p<=k) = zeta_k + eta > > but polr apparently thinks there is a minus in front of eta, > as is apprent below. > > Is this a bug og a feature I have overlooked?If there is really a bug I would guess that it was in the book rather than the code. This is not an unusual parametrisation for this model. It is the parametrisation that reduces to logistic regression for binary data, and makes the regression coefficients positive when the association is positive. -thomas> > Here is the naked code for reproduction, below the results. > ------------------------------------------------------------------------ > --- > version > library( MASS ) > data( housing ) > hnames <- lapply( housing[,-5], levels ) > house.plr <- polr( Sat ~ Infl + Type + Cont, data=housing, weights=Freq > ) > summary( house.plr ) > newdat <- expand.grid( hnames[-1] )[1:5,] > cbind( newdat, predict( house.plr, newdat, type="probs" ) ) > # Baseline probs: > diff( c(0,tigol( c(-0.4961,0.6907) ), 1) ) > # First level of Infl: > diff( c(0,tigol( c(-0.4961,0.6907) + 0.5663922 ), 1) ) > # But the change of sign for eta is needed to reproduce the fitted > values: > # Line 2: > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 ), 1) ) > # Line 5: > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 + 0.5723552 ), 1) ) > ------------------------------------------------------------------------ > --- > > Here is the resulting output: > ------------------------------------------------------------------------ > --- > > version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 1 > minor 8.1 > year 2003 > month 11 > day 21 > language R > > library( MASS ) > > data( housing ) > > hnames <- lapply( housing[,-5], levels ) > > house.plr <- polr( Sat ~ Infl + Type + Cont, data=housing, > weights=Freq ) > > summary( house.plr ) > > Re-fitting to get Hessian > > Call: > polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq) > > Coefficients: > Value Std. Error t value > InflMedium 0.5663922 0.10465276 5.412109 > InflHigh 1.2888137 0.12715609 10.135682 > TypeApartment -0.5723552 0.11923800 -4.800107 > TypeAtrium -0.3661908 0.15517331 -2.359882 > TypeTerrace -1.0910073 0.15148595 -7.202036 > ContHigh 0.3602803 0.09553577 3.771156 > > Intercepts: > Value Std. Error t value > Low|Medium -0.4961 0.1248 -3.9740 > Medium|High 0.6907 0.1255 5.5049 > > Residual Deviance: 3479.149 > AIC: 3495.149 > > newdat <- expand.grid( hnames[-1] )[1:5,] > > cbind( newdat, predict( house.plr, newdat, type="probs" ) ) > Infl Type Cont Low Medium High > 1 Low Tower Low 0.3784485 0.2876755 0.3338760 > 2 Medium Tower Low 0.2568261 0.2742125 0.4689614 > 3 High Tower Low 0.1436927 0.2110841 0.6452232 > 4 Low Apartment Low 0.5190450 0.2605077 0.2204473 > 5 Medium Apartment Low 0.3798522 0.2875967 0.3325511 > > # Baseline probs: > > diff( c(0,tigol( c(-0.4961,0.6907) ), 1) ) > [1] 0.3784576 0.2876650 0.3338774 > > # First level of Infl: > > diff( c(0,tigol( c(-0.4961,0.6907) + 0.5663922 ), 1) ) > [1] 0.5175658 0.2609593 0.2214749 > > # But the change of sign for eta is needed to reproduce the fitted > values: > > # Line 2: > > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 ), 1) ) > [1] 0.2568335 0.2742035 0.4689630 > > # Line 5: > > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 + 0.5723552 ), 1) ) > [1] 0.3798613 0.2875862 0.3325525 > > > ------------------------------------------------------------------------ > ----- > > ---------------------- > Bendix Carstensen > Senior Statistician > Steno Diabetes Center > Niels Steensens Vej 2 > DK-2820 Gentofte > Denmark > tel: +45 44 43 87 38 > mob: +45 30 75 87 38 > fax: +45 44 43 07 06 > bxc at steno.dk > www.biostat.ku.dk/~bxc > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle
"BXC (Bendix Carstensen)" <bxc at steno.dk> writes:> Here is the naked code for reproduction, below the results.That would be easier to reproduce if you had remembered to define logit <- function(p)log(p/(1-p)) tigol <- function(x)exp(x)/(1+exp(x)) #inverse logit first... Also, beware the line-breaking Jabberwock, my friend: The line with "values:" is syntactically incomplete. -p> ------------------------------------------------------------------------ > --- > version > library( MASS ) > data( housing ) > hnames <- lapply( housing[,-5], levels ) > house.plr <- polr( Sat ~ Infl + Type + Cont, data=housing, weights=Freq > ) > summary( house.plr ) > newdat <- expand.grid( hnames[-1] )[1:5,] > cbind( newdat, predict( house.plr, newdat, type="probs" ) ) > # Baseline probs: > diff( c(0,tigol( c(-0.4961,0.6907) ), 1) ) > # First level of Infl: > diff( c(0,tigol( c(-0.4961,0.6907) + 0.5663922 ), 1) ) > # But the change of sign for eta is needed to reproduce the fitted > values: > # Line 2: > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 ), 1) ) > # Line 5: > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 + 0.5723552 ), 1) ) > ------------------------------------------------------------------------ > --- > > Here is the resulting output: > ------------------------------------------------------------------------ > --- > > version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 1 > minor 8.1 > year 2003 > month 11 > day 21 > language R > > library( MASS ) > > data( housing ) > > hnames <- lapply( housing[,-5], levels ) > > house.plr <- polr( Sat ~ Infl + Type + Cont, data=housing, > weights=Freq ) > > summary( house.plr ) > > Re-fitting to get Hessian > > Call: > polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq) > > Coefficients: > Value Std. Error t value > InflMedium 0.5663922 0.10465276 5.412109 > InflHigh 1.2888137 0.12715609 10.135682 > TypeApartment -0.5723552 0.11923800 -4.800107 > TypeAtrium -0.3661908 0.15517331 -2.359882 > TypeTerrace -1.0910073 0.15148595 -7.202036 > ContHigh 0.3602803 0.09553577 3.771156 > > Intercepts: > Value Std. Error t value > Low|Medium -0.4961 0.1248 -3.9740 > Medium|High 0.6907 0.1255 5.5049 > > Residual Deviance: 3479.149 > AIC: 3495.149 > > newdat <- expand.grid( hnames[-1] )[1:5,] > > cbind( newdat, predict( house.plr, newdat, type="probs" ) ) > Infl Type Cont Low Medium High > 1 Low Tower Low 0.3784485 0.2876755 0.3338760 > 2 Medium Tower Low 0.2568261 0.2742125 0.4689614 > 3 High Tower Low 0.1436927 0.2110841 0.6452232 > 4 Low Apartment Low 0.5190450 0.2605077 0.2204473 > 5 Medium Apartment Low 0.3798522 0.2875967 0.3325511 > > # Baseline probs: > > diff( c(0,tigol( c(-0.4961,0.6907) ), 1) ) > [1] 0.3784576 0.2876650 0.3338774 > > # First level of Infl: > > diff( c(0,tigol( c(-0.4961,0.6907) + 0.5663922 ), 1) ) > [1] 0.5175658 0.2609593 0.2214749 > > # But the change of sign for eta is needed to reproduce the fitted > values: > > # Line 2: > > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 ), 1) ) > [1] 0.2568335 0.2742035 0.4689630 > > # Line 5: > > diff( c(0,tigol( c(-0.4961,0.6907) - 0.5663922 + 0.5723552 ), 1) ) > [1] 0.3798613 0.2875862 0.3325525 > > > ------------------------------------------------------------------------ > ----- > > ---------------------- > Bendix Carstensen > Senior Statistician > Steno Diabetes Center > Niels Steensens Vej 2 > DK-2820 Gentofte > Denmark > tel: +45 44 43 87 38 > mob: +45 30 75 87 38 > fax: +45 44 43 07 06 > bxc at steno.dk > www.biostat.ku.dk/~bxc > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >-- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
The problem is not in the book nor the code, but in Mr Carstensen not looking up the actual reference given in ?polr. There was a + in early printings of MASS3, and that difference is in the on-line Errata. But both the DESCRIPTION file and ?polr are explicitly to the fourth edition. As Thomas says, the minus seemed a more natural parametrization and so we changed to it. On Thu, 8 Jan 2004, Thomas Lumley wrote:> On Thu, 8 Jan 2004, BXC (Bendix Carstensen) wrote: > > > In Venables \& Ripley 3rd edition (p. 231) the proportional odds model > > is described as: > > > > logit(p<=k) = zeta_k + eta > > > > but polr apparently thinks there is a minus in front of eta, > > as is apprent below. > > > > Is this a bug og a feature I have overlooked? > > If there is really a bug I would guess that it was in the book rather than > the code. This is not an unusual parametrisation for this model. It is > the parametrisation that reduces to logistic regression for binary data, > and makes the regression coefficients positive when the association is > positive.-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595