Paul Sanfilippo
2016-Oct-03 22:30 UTC
[R] Multivariable Wald to test equality of multinomial coefficients
Hi, I am trying to replicate a test in the Hosmer - Applied Logistic regression text (pp 289, 3rd ed) that uses a Multivariable Wald test to test the equality of coefficients across the 2 logits of a 3 category response multinomial model. I?d like to see whether (from a ?statistical standpoint) it is acceptable to collapse the 2 response categories and then simply use a binary logistic regression. The idea is that if the coefficients across the 2 logits are similar (non-significant p value with Wald test), then it is reasonable to pool the categories. There does not seem to be a built in way to do this in R? Using the mtcars dataset as an example (for the sake of the example, using cyl as a 3-factor response), does anyone have any ideas how to do this library(nnet) data(mtcars) mtcars$cyl <- as.factor(mtcars$cyl) ? mtcars$am <- as.factor(mtcars$am) ? mod <- multinom(cyl ~ am + hp, data=mtcars) summary(mod)> summary(mod)Call: multinom(formula = cyl ~ am + hp, data = mtcars) Coefficients: ? (Intercept) ? ? ? am1 ? ? ? ?hp 6 ? -42.03847 ?-3.77398 0.4147498 8 ? -92.30944 -26.27554 0.7836576 So, I want to simultaneously test whether the 3 coefficients across the 2 logits are similar. Thank you. [[alternative HTML version deleted]]
Bert Gunter
2016-Oct-03 23:08 UTC
[R] Multivariable Wald to test equality of multinomial coefficients
See inline. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Oct 3, 2016 at 3:30 PM, Paul Sanfilippo <prseye at gmail.com> wrote:> Hi, > > I am trying to replicate a test in the Hosmer - Applied Logistic regression text (pp 289, 3rd ed) that uses a Multivariable Wald test to test the equality of coefficients across the 2 logits of a 3 category response multinomial model. I?d like to see whether (from a statistical standpoint) it is acceptable to collapse the 2 response categories and then simply use a binary logistic regression."The idea is that if the coefficients across the 2 logits are similar (non-significant p value with Wald test), then it is reasonable to pool the categories." IMHO, this is a bad idea. See http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503. Significance or lack of it is not a legitimate criterion on which to base scientific decisions.> > There does not seem to be a built in way to do this in R? > > Using the mtcars dataset as an example (for the sake of the example, using cyl as a 3-factor response), does anyone have any ideas how to do this > > library(nnet) > data(mtcars) > mtcars$cyl <- as.factor(mtcars$cyl) > mtcars$am <- as.factor(mtcars$am) > mod <- multinom(cyl ~ am + hp, data=mtcars) > summary(mod) > >> summary(mod) > Call: > multinom(formula = cyl ~ am + hp, data = mtcars) > > Coefficients: > (Intercept) am1 hp > 6 -42.03847 -3.77398 0.4147498 > 8 -92.30944 -26.27554 0.7836576 > > So, I want to simultaneously test whether the 3 coefficients across the 2 logits are similar. > > Thank you. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Paul Sanfilippo
2016-Oct-03 23:15 UTC
[R] Multivariable Wald to test equality of multinomial coefficients
Thanks Bert, I realise that and am not distilling it down to the p value. I am primarily considering the issue of collapsing down in the larger context of the added value/information that the extra categories give. However, I am curious to see (as I said from a statistical standpoint) whether the coefficients are sufficiently dissimilar. Regards, Paul Sanfilippo? On 4 October 2016 at 10:08:12 am, Bert Gunter (bgunter.4567 at gmail.com) wrote: See inline. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Oct 3, 2016 at 3:30 PM, Paul Sanfilippo <prseye at gmail.com> wrote:> Hi, > > I am trying to replicate a test in the Hosmer - Applied Logistic regression text (pp 289, 3rd ed) that uses a Multivariable Wald test to test the equality of coefficients across the 2 logits of a 3 category response multinomial model. I?d like to see whether (from a statistical standpoint) it is acceptable to collapse the 2 response categories and then simply use a binary logistic regression."The idea is that if the coefficients across the 2 logits are similar (non-significant p value with Wald test), then it is reasonable to pool the categories." IMHO, this is a bad idea. See http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503. Significance or lack of it is not a legitimate criterion on which to base scientific decisions.> > There does not seem to be a built in way to do this in R? > > Using the mtcars dataset as an example (for the sake of the example, using cyl as a 3-factor response), does anyone have any ideas how to do this > > library(nnet) > data(mtcars) > mtcars$cyl <- as.factor(mtcars$cyl) > mtcars$am <- as.factor(mtcars$am) > mod <- multinom(cyl ~ am + hp, data=mtcars) > summary(mod) > >> summary(mod) > Call: > multinom(formula = cyl ~ am + hp, data = mtcars) > > Coefficients: > (Intercept) am1 hp > 6 -42.03847 -3.77398 0.4147498 > 8 -92.30944 -26.27554 0.7836576 > > So, I want to simultaneously test whether the 3 coefficients across the 2 logits are similar. > > Thank you. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.[[alternative HTML version deleted]]
peter dalgaard
2016-Oct-03 23:23 UTC
[R] Multivariable Wald to test equality of multinomial coefficients
> On 04 Oct 2016, at 00:30 , Paul Sanfilippo <prseye at gmail.com> wrote: > > Hi, > > I am trying to replicate a test in the Hosmer - Applied Logistic regression text (pp 289, 3rd ed) that uses a Multivariable Wald test to test the equality of coefficients across the 2 logits of a 3 category response multinomial model. I?d like to see whether (from a statistical standpoint) it is acceptable to collapse the 2 response categories and then simply use a binary logistic regression. The idea is that if the coefficients across the 2 logits are similar (non-significant p value with Wald test), then it is reasonable to pool the categories. > > There does not seem to be a built in way to do this in R? > > Using the mtcars dataset as an example (for the sake of the example, using cyl as a 3-factor response), does anyone have any ideas how to do this > > library(nnet) > data(mtcars) > mtcars$cyl <- as.factor(mtcars$cyl) > mtcars$am <- as.factor(mtcars$am) > mod <- multinom(cyl ~ am + hp, data=mtcars) > summary(mod) > >> summary(mod) > Call: > multinom(formula = cyl ~ am + hp, data = mtcars) > > Coefficients: > (Intercept) am1 hp > 6 -42.03847 -3.77398 0.4147498 > 8 -92.30944 -26.27554 0.7836576 > > So, I want to simultaneously test whether the 3 coefficients across the 2 logits are similar.R and R-packages do not always produce every single test that someone have thought up. Sometimes, you just get the tools to roll your own, and are expected to know enough theory to do so. The generic technique for a Wald test would be to (a) extract the variance-covariance matrix (V) of the coefficients (beta) (b) write the linear relation that you wish to test in matrix form A beta = 0 (c) the variance-covariance matrix of A beta will be A V A' (d) the Wald test is (A beta)' (A V A')^{-1} (A beta) For the case of multinom(), a little extra care is needed because it presents the coefficients as a matrix, and the variance covariance matrix is ordered as if the coefficients were organized as a vector _by row_:> vcov(mod)6:(Intercept) 6:am1 6:hp 8:(Intercept) 8:am1 6:(Intercept) 771.682250 69.0782649 -7.61945359 168.212743 -463.676388 6:am1 69.078265 10.6015542 -0.71221686 13.069175 -38.850954 6:hp -7.619454 -0.7122169 0.07550636 -1.647338 4.560144 8:(Intercept) 168.212743 13.0691754 -1.64733776 1019.860307 -1473.837691 8:am1 -463.676388 -38.8509537 4.56014436 -1473.837691 2195.306673 8:hp -3.169803 -0.2992327 0.03147124 -7.719801 11.719345 8:hp 6:(Intercept) -3.16980299 6:am1 -0.29923273 6:hp 0.03147124 8:(Intercept) -7.71980097 8:am1 11.71934471 8:hp 0.06548745> coef(mod)(Intercept) am1 hp 6 -42.03847 -3.77398 0.4147498 8 -92.30944 -26.27554 0.7836576 So the thing to do would be roughly like this (the code can surely be improved):> beta <- as.vector(t(coef(mod))) > A <- rbind(c(1,0,0,-1,0,0), c(0,1,0,0,-1,0), c(0,0,1,0,0,-1)) > A[,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 0 0 -1 0 0 [2,] 0 1 0 0 -1 0 [3,] 0 0 1 0 0 -1> A %*% beta[,1] [1,] 50.2709610 [2,] 22.5015565 [3,] -0.3689079> t(A %*% beta) %*% solve(A %*% vcov(mod) %*% t(A), A %*% beta)[,1] [1,] 3.592326 And then get the p-value from the asymptotic chi-square on 3-df> pchisq(3.59, 3, lower=FALSE)[1] 0.3092756 -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Paul Sanfilippo
2016-Oct-04 00:03 UTC
[R] Multivariable Wald to test equality of multinomial coefficients
Thank you very much Peter - very enlightening to do it from first principles! Regards, Paul On 4 October 2016 at 10:23:52 am, peter dalgaard (pdalgd at gmail.com) wrote:> On 04 Oct 2016, at 00:30 , Paul Sanfilippo <prseye at gmail.com> wrote: > > Hi, > > I am trying to replicate a test in the Hosmer - Applied Logistic regression text (pp 289, 3rd ed) that uses a Multivariable Wald test to test the equality of coefficients across the 2 logits of a 3 category response multinomial model. I?d like to see whether (from a statistical standpoint) it is acceptable to collapse the 2 response categories and then simply use a binary logistic regression. The idea is that if the coefficients across the 2 logits are similar (non-significant p value with Wald test), then it is reasonable to pool the categories. > > There does not seem to be a built in way to do this in R? > > Using the mtcars dataset as an example (for the sake of the example, using cyl as a 3-factor response), does anyone have any ideas how to do this > > library(nnet) > data(mtcars) > mtcars$cyl <- as.factor(mtcars$cyl) > mtcars$am <- as.factor(mtcars$am) > mod <- multinom(cyl ~ am + hp, data=mtcars) > summary(mod) > >> summary(mod) > Call: > multinom(formula = cyl ~ am + hp, data = mtcars) > > Coefficients: > (Intercept) am1 hp > 6 -42.03847 -3.77398 0.4147498 > 8 -92.30944 -26.27554 0.7836576 > > So, I want to simultaneously test whether the 3 coefficients across the 2 logits are similar.R and R-packages do not always produce every single test that someone have thought up. Sometimes, you just get the tools to roll your own, and are expected to know enough theory to do so. The generic technique for a Wald test would be to (a) extract the variance-covariance matrix (V) of the coefficients (beta) (b) write the linear relation that you wish to test in matrix form A beta = 0 (c) the variance-covariance matrix of A beta will be A V A' (d) the Wald test is (A beta)' (A V A')^{-1} (A beta) For the case of multinom(), a little extra care is needed because it presents the coefficients as a matrix, and the variance covariance matrix is ordered as if the coefficients were organized as a vector _by row_:> vcov(mod)6:(Intercept) 6:am1 6:hp 8:(Intercept) 8:am1 6:(Intercept) 771.682250 69.0782649 -7.61945359 168.212743 -463.676388 6:am1 69.078265 10.6015542 -0.71221686 13.069175 -38.850954 6:hp -7.619454 -0.7122169 0.07550636 -1.647338 4.560144 8:(Intercept) 168.212743 13.0691754 -1.64733776 1019.860307 -1473.837691 8:am1 -463.676388 -38.8509537 4.56014436 -1473.837691 2195.306673 8:hp -3.169803 -0.2992327 0.03147124 -7.719801 11.719345 8:hp 6:(Intercept) -3.16980299 6:am1 -0.29923273 6:hp 0.03147124 8:(Intercept) -7.71980097 8:am1 11.71934471 8:hp 0.06548745> coef(mod)(Intercept) am1 hp 6 -42.03847 -3.77398 0.4147498 8 -92.30944 -26.27554 0.7836576 So the thing to do would be roughly like this (the code can surely be improved):> beta <- as.vector(t(coef(mod))) > A <- rbind(c(1,0,0,-1,0,0), c(0,1,0,0,-1,0), c(0,0,1,0,0,-1)) > A[,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 0 0 -1 0 0 [2,] 0 1 0 0 -1 0 [3,] 0 0 1 0 0 -1> A %*% beta[,1] [1,] 50.2709610 [2,] 22.5015565 [3,] -0.3689079> t(A %*% beta) %*% solve(A %*% vcov(mod) %*% t(A), A %*% beta)[,1] [1,] 3.592326 And then get the p-value from the asymptotic chi-square on 3-df> pchisq(3.59, 3, lower=FALSE)[1] 0.3092756 -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com [[alternative HTML version deleted]]