On 06/06/17 18:08, Marc Girondot via R-help wrote:

> This is a question at the border between stats and R.
>
> When I do a glm with many potential effects and select a model using
> stepAIC, many independent variables are selected even if there is no
> relationship between the dependent variable and the effects (all are
> random numbers).
>
> Does someone have a solution to prevent this effect? Is it related to a
> Bonferroni correction?
>
> Is there a ratio of independent variables to number of observations
> that is safe for stepAIC?
>
> Thanks
>
> Marc
>
> Example code: when 2 independent variables are included, no effect is
> selected; when 11 are included, 7 to 8 are selected.
>
> library(MASS)   # stepAIC() comes from the MASS package
> x <- rnorm(15, 15, 2)
> A <- rnorm(15, 20, 5)
> B <- rnorm(15, 20, 5)
> C <- rnorm(15, 20, 5)
> D <- rnorm(15, 20, 5)
> E <- rnorm(15, 20, 5)
> F <- rnorm(15, 20, 5)
> G <- rnorm(15, 20, 5)
> H <- rnorm(15, 20, 5)
> I <- rnorm(15, 20, 5)
> J <- rnorm(15, 20, 5)
> K <- rnorm(15, 20, 5)
>
> df <- data.frame(x=x, A=A, B=B, C=C, D=D,
>                  E=E, F=F, G=G, H=H, I=I, J=J,
>                  K=K)
>
> G1 <- glm(formula = x ~ A + B,
>           data=df, family = gaussian(link = "identity"))
>
> g1 <- stepAIC(G1)
>
> summary(g1)
>
> G2 <- glm(formula = x ~ A + B + C + D + E + F + G + H + I + J + K,
>           data=df, family = gaussian(link = "identity"))
>
> g2 <- stepAIC(G2)
>
> summary(g2)
IMHO there's nothing much that you can do about this. Trying to get the
data to select a model is always fraught with peril.
The phenomenon that you have observed has been remarked on before; see
Alan Miller's book "Subset Selection in Regression" (Chapman and Hall,
1990), page 12 (first paragraph of Section 1.4).
However, you might find some of Miller's recommendations to be at least
a *bit* useful.
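
For what it's worth, a quick simulation along the lines of your example
makes the point concrete. This is only a sketch (the settings n.sim,
n.obs and n.pred are arbitrary choices of mine, not recommendations): it
refits an all-noise model many times and counts how many noise
predictors stepAIC() retains.

library(MASS)

set.seed(42)
n.sim  <- 200   # number of simulated data sets
n.obs  <- 15    # observations per data set, as in your example
n.pred <- 11    # number of pure-noise candidate predictors

n.kept <- replicate(n.sim, {
    dat <- as.data.frame(matrix(rnorm(n.obs * (n.pred + 1)), nrow = n.obs))
    names(dat) <- c("x", paste0("V", seq_len(n.pred)))
    fit  <- glm(x ~ ., data = dat, family = gaussian(link = "identity"))
    kept <- stepAIC(fit, trace = FALSE)
    length(attr(terms(kept), "term.labels"))  # noise predictors retained
})

table(n.kept)     # distribution of the number of retained noise predictors
mean(n.kept > 0)  # proportion of runs keeping at least one noise variable

With only 15 observations and 11 candidate predictors you will typically
see several noise variables survive the stepwise search, which is
exactly the behaviour you describe.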
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276