Dear list,

here is an example of stepAIC that I do not understand. The data set has n=42 observations; Lage is the only factor, and there are four other variables treated as continuous.

First you see the stepAIC forward solution (fs7). The strange thing here is that apparently not all interactions are tried for inclusion, but only WQ:Lage. In particular, I think that WFL:Lage should be tried in the last two steps, where WFL and Lage are already in the fit.

After fs7, I give the output of fs6 (backward), where all interactions are tried as I expected. (regsubsets works properly forward and backward.)

Do I misunderstand something, or is something strange going on in the forward fit?

(I don't want to discuss here whether the forward fit is a good thing to do from a data-analytic viewpoint. I agree that I should presumably not choose it. However, I want to understand what the algorithm does.)

Thank you,
Christian

> w6 <- lm(Preis~RW1+WFL+WQ+VD+Lage+Lage*WFL+Lage*WQ+Lage*VD,
+          data=wohnung)
> w7 <- lm(Preis~1, data=wohnung)
> fs7 <- stepAIC(w7, scope=list(upper=~RW1+WFL+WQ+VD+Lage+Lage*WFL+Lage*WQ+Lage*VD,
+                lower=~1), direction="forward")
Start:  AIC= 623.57
 Preis ~ 1

       Df Sum of Sq       RSS AIC
+ WQ    1  37219390  75101315 609
+ Lage  1  19029749  93290956 618
+ WFL   1  12506022  99814682 621
+ RW1   1   7299347 105021358 623
<none>              112320704 624
+ VD    1   5170556 107150149 624

Step:  AIC= 608.66
 Preis ~ WQ

       Df Sum of Sq      RSS AIC
+ Lage  1   4736613 70364702 608
<none>              75101315 609
+ WFL   1   1863992 73237323 610
+ VD    1    555800 74545515 610
+ RW1   1    462284 74639030 610

Step:  AIC= 607.92
 Preis ~ WQ + Lage

          Df Sum of Sq      RSS AIC
+ WFL      1   4721973 65642729 607
<none>                 70364702 608
+ WQ:Lage  1   2829768 67534934 608
+ RW1      1   2567408 67797294 608
+ VD       1    678458 69686244 610

Step:  AIC= 607.01
 Preis ~ WQ + Lage + WFL

          Df Sum of Sq      RSS AIC
+ WQ:Lage  1   5610596 60032132 605
+ RW1      1   3404796 62237933 607
<none>                 65642729 607
+ VD       1    925528 64717201 608

Step:  AIC= 605.25
 Preis ~ WQ + Lage + WFL + WQ:Lage

       Df Sum of Sq      RSS AIC
+ RW1   1   3492210 56539923 605
<none>              60032132 605
+ VD    1    355353 59676779 607

Step:  AIC= 604.74
 Preis ~ WQ + Lage + WFL + RW1 + WQ:Lage

       Df Sum of Sq      RSS AIC
<none>              56539923 605
+ VD    1     94023 56445900 607

Backward fit:

> stepAIC(w6)
Start:  AIC= 596.53
 Preis ~ RW1 + WFL + WQ + VD + Lage + Lage * WFL + Lage * WQ + Lage * VD

           Df Sum of Sq      RSS AIC
- WQ:Lage   1    190953 40507327 595
- RW1       1    865788 41182162 595
<none>                  40316374 597
- WFL:Lage  1   6491181 46807556 601
- VD:Lage   1  12307855 52624230 606

Step:  AIC= 594.73
 Preis ~ RW1 + WFL + WQ + VD + Lage + WFL:Lage + VD:Lage

           Df Sum of Sq      RSS AIC
- RW1       1    756790 41264117 594
- WQ        1   1910020 42417348 595
<none>                  40507327 595
- WFL:Lage  1  10302360 50809687 602
- VD:Lage   1  13222644 53729971 605

Step:  AIC= 593.51
 Preis ~ WFL + WQ + VD + Lage + WFL:Lage + VD:Lage

           Df Sum of Sq      RSS AIC
- WQ        1   1793962 43058080 593
<none>                  41264117 594
- WFL:Lage  1  12069383 53333500 602
- VD:Lage   1  13657842 54921959 604

Step:  AIC= 593.3
 Preis ~ WFL + VD + Lage + WFL:Lage + VD:Lage

           Df Sum of Sq      RSS AIC
<none>                  43058080 593
- WFL:Lage  1  14241342 57299422 603
- VD:Lage   1  19078878 62136957 607

Call:
lm(formula = Preis ~ WFL + VD + Lage + WFL:Lage + VD:Lage, data = wohnung)

Coefficients:
(Intercept)          WFL           VD        Lage2    WFL:Lage2     VD:Lage2
  -53269.15        55.92      8025.62     59259.63       -46.71     -8233.36

***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de

On 29 Mar 2004, at 19:07, Christian Hennig wrote:

> Dear list,
>
> First you see the stepAIC-forward solution (fs7). The strange thing here
> is that apparently not all interactions are tried for inclusion, but only
> WQ:Lage. In particular, I think that WFL:Lage should be tried in the last
> two steps, where WFL and Lage are already in the fit. After fs7, I give
> the output of fs6 (backward), where all interactions are tried as I have
> expected. (regsubsets works properly forward and backward.)

Bob O'Hara posted a message about the same issue just one week ago. This seems to be a limitation in identifying the names of interaction terms. When the scope is examined, A:B and B:A are regarded as different animals. R decides internally which way it arranges these names, and if the model being built up has a candidate B:A but the scope has A:B, then B:A is not regarded as being in the scope. The net result is that step (or stepAIC) can be used to build interactions only with good luck.

The extreme case is that R decides to include a term (say A:B) from the scope, but after inclusion R decides to rearrange its name as B:A. This is no longer in the scope, and step ends with an error message. I had hoped to work out a reproducible example of this, but haven't had time. However, this happens with the latest devel version of vegan if you use methods that you shouldn't use (that is, you step cca, which you cannot do).

The last step:

Step:  AIC= 125.58
 varespec ~ Al + P + K + Baresoil + P:K + P:Baresoil

              Df    AIC
+ K:Al         1 125.02
+ Zn           1 125.36
<none>           125.58
+ P:Al         1 125.60
+ Al:Baresoil  1 125.82
+ Humdepth     1 125.83
+ Mo           1 125.94
+ Mg           1 125.96
+ Mn           1 126.11
+ S            1 126.36
- P:Baresoil   1 126.43
+ N            1 126.52
+ Fe           1 126.72
+ K:Baresoil   1 126.80
+ pH           1 126.89
+ Ca           1 127.01
- P:K          1 127.14
- Al           1 128.01

Step:  AIC= 125.02
 varespec ~ Al + P + K + Baresoil + P:K + P:Baresoil + Al:K

Error in factor.scope(ffac, list(add = fadd, drop = fdrop)) :
        upper scope does not include model

Interpretation: K:Al was in the scope and it was included. However, after inclusion it changed into Al:K, which is not in the scope, so that R was able to produce a model where "upper scope does not include model". The secret is in the C routines which decide how to order terms in formulae.

cheers, jari oksanen (Oulu)
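
The name mismatch described above can be illustrated with a small sketch. It is not the code that step or stepAIC runs; it only compares term labels as character strings. The helper canon_label() is hypothetical (not part of R, MASS or vegan), and the two label vectors are an illustrative subset of the terms in the vegan output, typed in by hand.

```r
# Term labels as R's formula machinery names them. The two formulas differ
# only in the order in which Al and K first appear.
labels_a <- attr(terms(~ Al + K + K:Al), "term.labels")
labels_b <- attr(terms(~ K + Al + K:Al), "term.labels")
print(labels_a)
print(labels_b)

# canon_label() is a hypothetical helper: it sorts the variable names inside
# each label so that "K:Al" and "Al:K" compare equal.
canon_label <- function(labels) {
  vapply(strsplit(labels, ":", fixed = TRUE),
         function(parts) paste(sort(parts), collapse = ":"),
         character(1))
}

# Illustrative subset of the scope and of the final model from the output
# above (assumed for this sketch, not extracted from a fitted object).
scope_labels <- c("Al", "P", "K", "Baresoil", "P:K", "P:Baresoil", "K:Al")
model_labels <- c("Al", "P", "K", "Baresoil", "P:K", "P:Baresoil", "Al:K")

# Plain name matching reports the reordered interaction as outside the scope.
missing_raw <- setdiff(model_labels, scope_labels)
print(missing_raw)

# Matching on canonical labels finds every model term in the scope.
missing_canon <- setdiff(canon_label(model_labels), canon_label(scope_labels))
print(missing_canon)
```

Comparing the two printed label vectors shows whether the interaction is named differently depending on variable order in the R version at hand. Sorting the names inside each label is just one possible way to make such a comparison order-independent.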