Hi:
On Wed, Dec 15, 2010 at 4:24 AM, beatlebg <rhelpforum@gmail.com> wrote:
>
> I am trying to perform multiple linear regressions, one for each level of
> 'VARIABLE2'. I figured out that there are different ways, using the
> following code (data is given at the end of this message):
> reg <- lapply(split(TRY, TRY$VARIABLE2), function(X){lm(X2 ~ X3, data=X)})
> lapply(reg, summary)
>
> Which produces the following:
>
> $`1`
>
> Call:
> lm(formula = X2 ~ X3, data = X)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1.24233 -0.30028  0.03706  0.46170  1.12408
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   3.0705     0.2323  13.215 5.95e-15 ***
> X3            0.4744     0.2640   1.797   0.0813 .
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.5752 on 34 degrees of freedom
> Multiple R-squared: 0.08672, Adjusted R-squared: 0.05986
> F-statistic: 3.228 on 1 and 34 DF, p-value: 0.08126
> ^^^^^^^^^^^
>
> $`2`
>
> Call:
> lm(formula = X2 ~ X3, data = X)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -1.1358 -0.6403  0.2505  0.4055  1.2088
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   2.5859     0.2968   8.713 4.53e-10 ***
> X3            0.4957     0.3435   1.443    0.158
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.6765 on 33 degrees of freedom
> Multiple R-squared: 0.05937, Adjusted R-squared: 0.03086
> F-statistic: 2.083 on 1 and 33 DF, p-value: 0.1584
> ^^^^^^^^
>
> $`3`
>
> Call:
> lm(formula = X2 ~ X3, data = X)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1.70021 -0.66049 -0.00138  0.81210  1.26162
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.9473     0.3522   5.529 2.73e-06 ***
> X3            0.8515     0.3954   2.154   0.0378 *
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.8979 on 37 degrees of freedom
> Multiple R-squared: 0.1114, Adjusted R-squared: 0.08739
> F-statistic: 4.639 on 1 and 37 DF, p-value: 0.03784
> ^^^^^^^^
> It should also be possible to use the lmList function but, remarkably, I
> get the same estimates with different Std. Errors. I used the following
> code:
>
>
> modlst <- lmList(X2 ~ X3 | VARIABLE2, TRY)
> summary(modlst)
>
> Which produces
>
> Call:
> Model: X2 ~ X3 | VARIABLE2
> Data: TRY
>
> Coefficients:
>    (Intercept)
>   Estimate Std. Error   t value     Pr(>|t|)
> 1 3.070507  0.2969014 10.341841 0.000000e+00
> 2 2.585938  0.3224380  8.019952 1.665779e-12
> 3 1.947292  0.2882936  6.754546 8.454271e-10
>    X3
>    Estimate Std. Error  t value    Pr(>|t|)
> 1 0.4744112  0.3373931 1.406108 0.162672738
> 2 0.4957349  0.3731949 1.328354 0.186968753
> 3 0.8515270  0.3236325 2.631154 0.009803152
>
> Residual standard error: 0.7350239 on 104 degrees of freedom
>
^^^^^^^^^^^^^^^^^^^^^^^^^^
(33 + 34 + 37) = 104.
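The residual variance in lmList() is pooled across all the groups: the
residual sums of squares from the three fits are added together and divided
by the total residual degrees of freedom. That is why the coefficient
estimates match your lapply() results while the standard errors (and the
t and p values computed from them) do not. lmList() considers the groups to
be part of the same data frame. Read its help page carefully to understand
what it is meant to do.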
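
As a rough check of the pooling, here is a small sketch in R that uses only
the numbers printed in the two summaries quoted in this thread. Nothing is
refit; the vectors are typed in from that output (the variable names are
mine), so the results agree with lmList() only up to rounding.

```r
# Per-group residual standard errors and residual degrees of freedom,
# copied from the three separate lm() summaries.
sigma_g <- c(0.5752, 0.6765, 0.8979)
df_g    <- c(34, 33, 37)

# Pooled residual variance: total residual sum of squares over total df.
rss_g        <- sigma_g^2 * df_g
sigma_pooled <- sqrt(sum(rss_g) / sum(df_g))
sigma_pooled   # approximately 0.735, on sum(df_g) = 104 degrees of freedom

# Standard errors from the separate fits, also copied from the output.
se_lm_intercept <- c(0.2323, 0.2968, 0.3522)
se_lm_slope     <- c(0.2640, 0.3435, 0.3954)

# Rescaling each by sigma_pooled / sigma_g reproduces, up to rounding,
# the standard errors in the lmList() summary.
se_pooled_intercept <- se_lm_intercept * sigma_pooled / sigma_g
se_pooled_slope     <- se_lm_slope     * sigma_pooled / sigma_g
se_pooled_intercept
se_pooled_slope
```

The design matrix within each group is the same in both approaches, so only
the estimate of the residual standard deviation changes; the ratio
sigma_pooled / sigma_g is the whole difference between the two sets of
standard errors.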
> I do not understand the difference between these two methods or what
> causes the difference in Std. Errors. Which method is preferable? I
> checked the results with another software program, and those results
> corresponded with the first method...
>
Which is preferable depends on your goals. If you intend each subgroup of
data to be analyzed independently, with its own residual variance, then your
listwise (lapply) method is appropriate; if the groups are meant to be part
of the same data set (e.g., if you want to perform comparisons that involve
the different subgroups), then the lmList() approach would seem more
appropriate, at least with respect to the purpose for which lmList() is
intended. How you perceive the connections between the grouped data frames
matters.
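
To make the "same data set" view concrete, here is a minimal sketch on
made-up data. The data frame toy and its columns g, x and y are hypothetical
stand-ins for TRY, VARIABLE2, X3 and X2. It fits one lm() with a separate
intercept and slope per group. I am not claiming this is how lmList() works
internally, only that such a model gives the same estimates as the separate
fits while using a single pooled residual variance.

```r
set.seed(42)
# Hypothetical stand-in for TRY: three groups with different noise levels.
toy <- data.frame(g = factor(rep(1:3, each = 30)), x = runif(90))
toy$y <- 2 + 0.5 * toy$x + rnorm(90, sd = rep(c(0.5, 0.7, 0.9), each = 30))

# Separate fits: each group gets its own residual variance.
fit_sep <- lapply(split(toy, toy$g), function(d) lm(y ~ x, data = d))

# One joint fit with a separate intercept and slope per group:
# identical coefficient estimates, but one pooled residual variance.
fit_joint <- lm(y ~ 0 + g + g:x, data = toy)

sapply(fit_sep, coef)      # per-group intercepts and slopes
coef(fit_joint)            # same numbers: g1, g2, g3, then g1:x, g2:x, g3:x

sapply(fit_sep, sigma)     # three residual standard errors
sigma(fit_joint)           # one pooled residual standard error
df.residual(fit_joint)     # 90 - 6 = 84
```

If the lmList() you are using is the one in nlme (an assumption on my part),
both lmList() and its summary() method take a pool argument, so
summary(modlst, pool = FALSE) should report standard errors based on each
group's own residual variance.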
HTH,
Dennis
>
> I really hope someone can explain where I made a mistake. Thank you.
>
>
>
> data.frame: TRY:
>
> VARIABLE2 X2 X3
> 1 1 2.3025851 1.00000000
> 2 1 3.8286414 1.00000000
> 3 1 4.3820266 1.00000000
> 4 1 3.6375862 1.00000000
> 5 1 3.7841896 1.00000000
> 6 1 3.4965076 1.00000000
> 7 1 2.8332133 1.00000000
> 8 1 3.6375862 1.00000000
> 9 1 4.0775374 1.00000000
> 10 1 3.4339872 1.00000000
> 11 1 3.5263605 1.00000000
> 12 1 3.0445224 1.00000000
> 13 1 2.8332133 1.00000000
> 14 1 2.7725887 1.00000000
> 15 1 3.0910425 1.00000000
> 16 1 4.1108739 1.00000000
> 17 1 3.2958369 1.00000000
> 18 1 2.7080502 1.00000000
> 19 1 2.9957323 1.00000000
> 20 1 3.6375862 1.00000000
> 21 1 3.8918203 1.00000000
> 22 1 3.8712010 1.00000000
> 23 1 3.4011974 1.00000000
> 24 1 3.2958369 1.00000000
> 25 1 4.1271344 1.00000000
> 26 1 4.1588831 1.00000000
> 27 1 4.1271344 0.90476190
> 28 1 3.8712010 0.66666667
> 29 1 4.5108595 0.66666667
> 30 1 3.9120230 0.33333333
> 31 1 3.6375862 0.23809524
> 32 1 3.4339872 0.04761905
> 33 1 2.8903718 0.00000000
> 34 1 2.8903718 0.00000000
> 35 1 2.8332133 0.00000000
> 36 1 1.9459101 0.00000000
> 37 2 2.0794415 1.00000000
> 38 2 3.4657359 1.00000000
> 39 2 3.9889840 1.00000000
> 40 2 3.4339872 1.00000000
> 41 2 3.4011974 1.00000000
> 42 2 3.3322045 1.00000000
> 43 2 2.8903718 1.00000000
> 44 2 3.3672958 1.00000000
> 45 2 3.3322045 1.00000000
> 46 2 3.4339872 1.00000000
> 47 2 3.4011974 1.00000000
> 48 2 3.2958369 1.00000000
> 49 2 2.8332133 1.00000000
> 50 2 3.3322045 1.00000000
> 51 2 3.3672958 1.00000000
> 52 2 3.6635616 1.00000000
> 53 2 2.8903718 1.00000000
> 54 2 1.9459101 1.00000000
> 55 2 2.0794415 1.00000000
> 56 2 2.3025851 1.00000000
> 57 2 2.4849066 1.00000000
> 58 2 2.0794415 1.00000000
> 59 2 2.3978953 1.00000000
> 60 2 2.4849066 1.00000000
> 61 2 4.2904594 1.00000000
> 62 2 3.9889840 0.57142857
> 63 2 3.6109179 0.52380952
> 64 2 3.5553481 0.33333333
> 65 2 3.1780538 0.33333333
> 66 2 3.1780538 0.33333333
> 67 2 2.7725887 0.33333333
> 68 2 3.1354942 0.19047619
> 69 2 1.7917595 0.09523810
> 70 2 1.9459101 0.19047619
> 71 2 1.6094379 0.00000000
> 72 3 2.3978953 1.00000000
> 73 3 2.4849066 1.00000000
> 74 3 1.6094379 1.00000000
> 75 3 1.3862944 1.00000000
> 76 3 1.7917595 1.00000000
> 77 3 1.0986123 1.00000000
> 78 3 2.0794415 1.00000000
> 79 3 1.3862944 1.00000000
> 80 3 1.9459101 1.00000000
> 81 3 3.1780538 1.00000000
> 82 3 2.1972246 1.00000000
> 83 3 2.4849066 1.00000000
> 84 3 2.6390573 1.00000000
> 85 3 3.6109179 1.00000000
> 86 3 2.3978953 1.00000000
> 87 3 2.1972246 1.00000000
> 88 3 1.6094379 1.00000000
> 89 3 3.0910425 1.00000000
> 90 3 3.6888795 1.00000000
> 91 3 3.3672958 1.00000000
> 92 3 3.4011974 1.00000000
> 93 3 2.4849066 1.00000000
> 94 3 3.4657359 1.00000000
> 95 3 4.0604430 1.00000000
> 96 3 3.6635616 1.00000000
> 97 3 3.6109179 1.00000000
> 98 3 3.8286414 1.00000000
> 99 3 3.6375862 1.00000000
> 100 3 3.7135721 1.00000000
> 101 3 3.8918203 0.80952381
> 102 3 3.7376696 0.85714286
> 103 3 3.0445224 0.66666667
> 104 3 3.2958369 0.33333333
> 105 3 2.7080502 0.00000000
> 106 3 1.9459101 0.00000000
> 107 3 2.4849066 0.04761905
> 108 3 1.9459101 0.00000000
> 109 3 0.6931472 0.00000000
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/lmList-and-lapply-lm-different-std-errors-tp3088903p3088903.html
> Sent from the R help mailing list archive at Nabble.com.