John Fox
2023-Sep-26 13:49 UTC
[R] car::deltaMethod() fails when a particular combination of categorical variables is not present
Dear Michael, You're testing a linear hypothesis, so there's no need to use the delta method, but the linearHypothesis() function in the car package also fails in your case: > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0") Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent = 0") : there are aliased coefficients in the model. One work-around is to ravel the two factors into a single factor with 5 levels: > df$bc <- factor(with(df, paste(b, c, sep=":"))) > df$bc [1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent t2:unsent [7] t2:unsent t1:sent t2:unsent t2:unsent t2:other t2:unsent [13] t1:unsent t1:sent t2:unsent t2:other t1:unsent t2:sent [19] t2:sent t2:unsent Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent > m <- lm(a ~ bc, data=df) > summary(m) Call: lm(formula = a ~ bc, data = df) Residuals: Min 1Q Median 3Q Max -57.455 -11.750 0.439 14.011 37.545 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 20.50 17.57 1.166 0.2617 bct1:unsent 37.50 24.85 1.509 0.1521 bct2:other 32.00 24.85 1.287 0.2174 bct2:sent 17.17 22.69 0.757 0.4610 bct2:unsent 38.95 19.11 2.039 0.0595 Residual standard error: 24.85 on 15 degrees of freedom Multiple R-squared: 0.2613, Adjusted R-squared: 0.06437 F-statistic: 1.327 on 4 and 15 DF, p-value: 0.3052 Then the hypothesis is tested directly by the t-value for the coefficient bct2:sent. I hope that this helps, John -- John Fox, Professor Emeritus McMaster University Hamilton, Ontario, Canada web: https://www.john-fox.ca/ On 2023-09-26 1:12 a.m., Michael Cohn wrote:> Caution: External email. > > > I'm running a linear regression with two categorical predictors and their > interaction. One combination of levels does not occur in the data, and as > expected, no parameter is estimated for it. I now want to significance test > a particular combination of levels that does occur in the data (ie, I want > to get a confidence interval for the total prediction at given levels of > each variable). > > In the past I've done this using car::deltaMethod() but in this dataset > that does not work, as shown in the example below: The regression model > gives the expected output, but deltaMethod() gives this error: > > error in t(gd) %*% vcov. : non-conformable arguments > > I believe this is because there is no parameter estimate for when the > predictors have the values 't1' and 'other'. In the df_fixed dataframe, > putting one person into that combination of categories causes deltaMethod() > to work as expected. > > I don't know of any theoretical reason that missing one interaction > parameter estimate should prevent getting a confidence interval for a > different combination of predictors. Is there a way to use deltaMethod() or > some other function to do this without changing my data? > > Thank you, > > - Michael Cohn > Vote Rev (http://voterev.org) > > > Demonstration: > ------ > > library(car) > # create dataset with outcome and two categorical predictors > outcomes <- c(91,2,60,53,38,78,48,33,97,41,64,84,64,8,66,41,52,18,57,34) > persontype <- > c("t2","t2","t2","t2","t2","t2","t2","t1","t2","t2","t2","t2","t1","t1","t2","t2","t1","t2","t2","t2") > arm_letter <- > c("unsent","unsent","unsent","unsent","sent","unsent","unsent","sent","unsent","unsent","other","unsent","unsent","sent","unsent","other","unsent","sent","sent","unsent") > df <- data.frame(a = outcomes, b=persontype, c=arm_letter) > > # note: there are no records with the combination 't1' + 'other' > table(df$b,df$c) > > > #regression works as expected > minimal_formula <- formula("a ~ b*c") > minimal_model <- lm(minimal_formula, data=df) > summary(minimal_model) > > #use deltaMethod() to get a prediction for individuals with the combination > 'b2' and 'sent' > # deltaMethod() fails with "error in t(gd) %*% vcov. : non-conformable > arguments." > deltaMethod(minimal_model, "bt2 + csent + `bt2:csent`", rhs=0) > > # duplicate the dataset and change one record to be in the previously empty > cell > df_fixed <- df > df_fixed[c(13),"c"] <- 'other' > table(df_fixed$b,df_fixed$c) > > #deltaMethod() now works > minimal_model_fixed <- lm(minimal_formula, data=df_fixed) > deltaMethod(minimal_model_fixed, "bt2 + csent + `bt2:csent`", rhs=0) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
John Fox
2023-Sep-26 15:39 UTC
[R] car::deltaMethod() fails when a particular combination of categorical variables is not present
Dear Michael, My previous response was inaccurate: First, linearHypothesis() *is* able to accommodate aliased coefficients by setting the argument singular.ok = TRUE: > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0", + singular.ok=TRUE) Linear hypothesis test: bt2 + csent + bt2:csent = 0 Model 1: restricted model Model 2: a ~ b * c Res.Df RSS Df Sum of Sq F Pr(>F) 1 16 9392.1 2 15 9266.4 1 125.67 0.2034 0.6584 Moreover, when there is an empty cell, this F-test is (for a reason that I haven't worked out, but is almost surely due to how the rank-deficient model is parametrized) *not* equivalent to the t-test for the corresponding coefficient in the raveled version of the two factors: > df$bc <- factor(with(df, paste(b, c, sep=":"))) > m <- lm(a ~ bc, data=df) > summary(m) Call: lm(formula = a ~ bc, data = df) Residuals: Min 1Q Median 3Q Max -57.455 -11.750 0.439 14.011 37.545 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 20.50 17.57 1.166 0.2617 bct1:unsent 37.50 24.85 1.509 0.1521 bct2:other 32.00 24.85 1.287 0.2174 bct2:sent 17.17 22.69 0.757 0.4610 <<< cf. F = 0.2034, p = 0.6584 bct2:unsent 38.95 19.11 2.039 0.0595 Residual standard error: 24.85 on 15 degrees of freedom Multiple R-squared: 0.2613, Adjusted R-squared: 0.06437 F-statistic: 1.327 on 4 and 15 DF, p-value: 0.3052 In the full-rank case, however, what I said is correct -- that is, the F-test for the 1 df hypothesis on the three coefficients is equivalent to the t-test for the corresponding coefficient when the two factors are raveled: > linearHypothesis(minimal_model_fixed, "bt2 + csent + bt2:csent = 0") Linear hypothesis test: bt2 + csent + bt2:csent = 0 Model 1: restricted model Model 2: a ~ b * c Res.Df RSS Df Sum of Sq F Pr(>F) 1 15 9714.5 2 14 9194.4 1 520.08 0.7919 0.3886 > df_fixed$bc <- factor(with(df_fixed, paste(b, c, sep=":"))) > m <- lm(a ~ bc, data=df_fixed) > summary(m) Call: lm(formula = a ~ bc, data = df_fixed) Residuals: Min 1Q Median 3Q Max -57.455 -11.750 0.167 14.011 37.545 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 64.000 25.627 2.497 0.0256 bct1:sent -43.500 31.387 -1.386 0.1874 bct1:unsent -12.000 36.242 -0.331 0.7455 bct2:other -11.500 31.387 -0.366 0.7195 bct2:sent -26.333 29.591 -0.890 0.3886 << cf. bct2:unsent -4.545 26.767 -0.170 0.8676 Residual standard error: 25.63 on 14 degrees of freedom Multiple R-squared: 0.2671, Adjusted R-squared: 0.005328 F-statistic: 1.02 on 5 and 14 DF, p-value: 0.4425 So, to summarize: (1) You can use linearHypothesis() with singular.ok=TRUE to test the hypothesis that you specified, though I suspect that this hypothesis probably isn't testing what you think in the rank-deficient case. I suspect that the hypothesis that you want to test is obtained by raveling the two factors. (2) There is no reason to use deltaMethod() for a linear hypothesis, but there is also no intrinsic reason that deltaMethod() shouldn't be able to handle a rank-deficient model. We'll probably fix that. My apologies for the confusion, John -- John Fox, Professor Emeritus McMaster University Hamilton, Ontario, Canada web: https://www.john-fox.ca/ On 2023-09-26 9:49 a.m., John Fox wrote:> Caution: External email. > > > Dear Michael, > > You're testing a linear hypothesis, so there's no need to use the delta > method, but the linearHypothesis() function in the car package also > fails in your case: > > > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0") > Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent = > 0") : > there are aliased coefficients in the model. > > One work-around is to ravel the two factors into a single factor with 5 > levels: > > > df$bc <- factor(with(df, paste(b, c, sep=":"))) > > df$bc > ?[1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent?? t2:unsent > ?[7] t2:unsent t1:sent?? t2:unsent t2:unsent t2:other? t2:unsent > [13] t1:unsent t1:sent?? t2:unsent t2:other? t1:unsent t2:sent > [19] t2:sent?? t2:unsent > Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent > > > m <- lm(a ~ bc, data=df) > > summary(m) > > Call: > lm(formula = a ~ bc, data = df) > > Residuals: > ??? Min????? 1Q? Median????? 3Q???? Max > -57.455 -11.750?? 0.439? 14.011? 37.545 > > Coefficients: > ??????????? Estimate Std. Error t value Pr(>|t|) > (Intercept)??? 20.50????? 17.57?? 1.166?? 0.2617 > bct1:unsent??? 37.50????? 24.85?? 1.509?? 0.1521 > bct2:other???? 32.00????? 24.85?? 1.287?? 0.2174 > bct2:sent????? 17.17????? 22.69?? 0.757?? 0.4610 > bct2:unsent??? 38.95????? 19.11?? 2.039?? 0.0595 > > Residual standard error: 24.85 on 15 degrees of freedom > Multiple R-squared:? 0.2613,??? Adjusted R-squared:? 0.06437 > F-statistic: 1.327 on 4 and 15 DF,? p-value: 0.3052 > > Then the hypothesis is tested directly by the t-value for the > coefficient bct2:sent. > > I hope that this helps, > ?John > > -- > John Fox, Professor Emeritus > McMaster University > Hamilton, Ontario, Canada > web: https://www.john-fox.ca/ > > On 2023-09-26 1:12 a.m., Michael Cohn wrote: >> Caution: External email. >> >> >> I'm running a linear regression with two categorical predictors and their >> interaction. One combination of levels does not occur in the data, and as >> expected, no parameter is estimated for it. I now want to significance >> test >> a particular combination of levels that does occur in the data (ie, I >> want >> to get a confidence interval for the total prediction at given levels of >> each variable). >> >> In the past I've done this using car::deltaMethod() but in this dataset >> that does not work, as shown in the example below: The regression model >> gives the expected output, but deltaMethod() gives this error: >> >> error in t(gd) %*% vcov. : non-conformable arguments >> >> I believe this is because there is no parameter estimate for when the >> predictors have the values 't1' and 'other'. In the df_fixed dataframe, >> putting one person into that combination of categories causes >> deltaMethod() >> to work as expected. >> >> I don't know of any theoretical reason that missing one interaction >> parameter estimate should prevent getting a confidence interval for a >> different combination of predictors. Is there a way to use >> deltaMethod() or >> some other function to do this without changing my data? >> >> Thank you, >> >> - Michael Cohn >> Vote Rev (http://voterev.org) >> >> >> Demonstration: >> ------ >> >> library(car) >> # create dataset with outcome and two categorical predictors >> outcomes <- c(91,2,60,53,38,78,48,33,97,41,64,84,64,8,66,41,52,18,57,34) >> persontype <- >> c("t2","t2","t2","t2","t2","t2","t2","t1","t2","t2","t2","t2","t1","t1","t2","t2","t1","t2","t2","t2") >> arm_letter <- >> c("unsent","unsent","unsent","unsent","sent","unsent","unsent","sent","unsent","unsent","other","unsent","unsent","sent","unsent","other","unsent","sent","sent","unsent") >> df <- data.frame(a = outcomes, b=persontype, c=arm_letter) >> >> # note: there are no records with the combination 't1' + 'other' >> table(df$b,df$c) >> >> >> #regression works as expected >> minimal_formula <- formula("a ~ b*c") >> minimal_model <- lm(minimal_formula, data=df) >> summary(minimal_model) >> >> #use deltaMethod() to get a prediction for individuals with the >> combination >> 'b2' and 'sent' >> # deltaMethod() fails with "error in t(gd) %*% vcov. : non-conformable >> arguments." >> deltaMethod(minimal_model, "bt2 + csent + `bt2:csent`", rhs=0) >> >> # duplicate the dataset and change one record to be in the previously >> empty >> cell >> df_fixed <- df >> df_fixed[c(13),"c"] <- 'other' >> table(df_fixed$b,df_fixed$c) >> >> #deltaMethod() now works >> minimal_model_fixed <- lm(minimal_formula, data=df_fixed) >> deltaMethod(minimal_model_fixed, "bt2 + csent + `bt2:csent`", rhs=0) >> >> ???????? [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.