thr3ads.net - R help - [R] lmList and lapply(... lm) different std. errors [Dec 2010]

beatlebg

2010-Dec-15 12:24 UTC

[R] lmList and lapply(... lm) different std. errors

I am trying to fit a separate linear regression for each level of
'VARIABLE2'. I figured out that there are different ways to do this, using
the following code (data is given at the end of this message):
reg <- lapply(split(TRY, TRY$VARIABLE2), function(X){lm(X2 ~ X3, data=X)})
lapply(reg, summary)

Which produces the following: 

$`1` 

Call: 
lm(formula = X2 ~ X3, data = X) 

Residuals: 
     Min       1Q   Median       3Q      Max 
-1.24233 -0.30028  0.03706  0.46170  1.12408 

Coefficients: 
            Estimate Std. Error t value Pr(>|t|)     
(Intercept)   3.0705     0.2323  13.215 5.95e-15 *** 
X3            0.4744     0.2640   1.797   0.0813 .   
--- 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5752 on 34 degrees of freedom 
Multiple R-squared: 0.08672,    Adjusted R-squared: 0.05986 
F-statistic: 3.228 on 1 and 34 DF,  p-value: 0.08126 


$`2` 

Call: 
lm(formula = X2 ~ X3, data = X) 

Residuals: 
    Min      1Q  Median      3Q     Max 
-1.1358 -0.6403  0.2505  0.4055  1.2088 

Coefficients: 
            Estimate Std. Error t value Pr(>|t|)     
(Intercept)   2.5859     0.2968   8.713 4.53e-10 *** 
X3            0.4957     0.3435   1.443    0.158     
--- 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6765 on 33 degrees of freedom 
Multiple R-squared: 0.05937,    Adjusted R-squared: 0.03086 
F-statistic: 2.083 on 1 and 33 DF,  p-value: 0.1584 


$`3` 

Call: 
lm(formula = X2 ~ X3, data = X) 

Residuals: 
     Min       1Q   Median       3Q      Max 
-1.70021 -0.66049 -0.00138  0.81210  1.26162 

Coefficients: 
            Estimate Std. Error t value Pr(>|t|)     
(Intercept)   1.9473     0.3522   5.529 2.73e-06 *** 
X3            0.8515     0.3954   2.154   0.0378 *   
--- 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8979 on 37 degrees of freedom 
Multiple R-squared: 0.1114,     Adjusted R-squared: 0.08739 
F-statistic: 4.639 on 1 and 37 DF,  p-value: 0.03784 

It should also be possible to use the lmList function but, remarkably, I get
the same estimates but different Std. Errors. I used the following code:


modlst <- lmList(X2 ~ X3 | VARIABLE2, TRY) 
summary(modlst) 

Which produces 

Call: 
  Model: X2 ~ X3 | VARIABLE2 
   Data: TRY 

Coefficients: 
   (Intercept) 
  Estimate Std. Error   t value     Pr(>|t|) 
1 3.070507  0.2969014 10.341841 0.000000e+00 
2 2.585938  0.3224380  8.019952 1.665779e-12 
3 1.947292  0.2882936  6.754546 8.454271e-10 
   X3 
   Estimate Std. Error  t value    Pr(>|t|) 
1 0.4744112  0.3373931 1.406108 0.162672738 
2 0.4957349  0.3731949 1.328354 0.186968753 
3 0.8515270  0.3236325 2.631154 0.009803152 

Residual standard error: 0.7350239 on 104 degrees of freedom 

I do not understand the difference between these two methods or what
causes the difference in Std. Errors. Which method is preferable? I
checked the results with another software program, and those results
corresponded with the first method.

I really hope someone can explain where I made a mistake. Thank you. 



data.frame: TRY: 

   VARIABLE2        X2         X3 
1           1 2.3025851 1.00000000 
2           1 3.8286414 1.00000000 
3           1 4.3820266 1.00000000 
4           1 3.6375862 1.00000000 
5           1 3.7841896 1.00000000 
6           1 3.4965076 1.00000000 
7           1 2.8332133 1.00000000 
8           1 3.6375862 1.00000000 
9           1 4.0775374 1.00000000 
10          1 3.4339872 1.00000000 
11          1 3.5263605 1.00000000 
12          1 3.0445224 1.00000000 
13          1 2.8332133 1.00000000 
14          1 2.7725887 1.00000000 
15          1 3.0910425 1.00000000 
16          1 4.1108739 1.00000000 
17          1 3.2958369 1.00000000 
18          1 2.7080502 1.00000000 
19          1 2.9957323 1.00000000 
20          1 3.6375862 1.00000000 
21          1 3.8918203 1.00000000 
22          1 3.8712010 1.00000000 
23          1 3.4011974 1.00000000 
24          1 3.2958369 1.00000000 
25          1 4.1271344 1.00000000 
26          1 4.1588831 1.00000000 
27          1 4.1271344 0.90476190 
28          1 3.8712010 0.66666667 
29          1 4.5108595 0.66666667 
30          1 3.9120230 0.33333333 
31          1 3.6375862 0.23809524 
32          1 3.4339872 0.04761905 
33          1 2.8903718 0.00000000 
34          1 2.8903718 0.00000000 
35          1 2.8332133 0.00000000 
36          1 1.9459101 0.00000000 
37          2 2.0794415 1.00000000 
38          2 3.4657359 1.00000000 
39          2 3.9889840 1.00000000 
40          2 3.4339872 1.00000000 
41          2 3.4011974 1.00000000 
42          2 3.3322045 1.00000000 
43          2 2.8903718 1.00000000 
44          2 3.3672958 1.00000000 
45          2 3.3322045 1.00000000 
46          2 3.4339872 1.00000000 
47          2 3.4011974 1.00000000 
48          2 3.2958369 1.00000000 
49          2 2.8332133 1.00000000 
50          2 3.3322045 1.00000000 
51          2 3.3672958 1.00000000 
52          2 3.6635616 1.00000000 
53          2 2.8903718 1.00000000 
54          2 1.9459101 1.00000000 
55          2 2.0794415 1.00000000 
56          2 2.3025851 1.00000000 
57          2 2.4849066 1.00000000 
58          2 2.0794415 1.00000000 
59          2 2.3978953 1.00000000 
60          2 2.4849066 1.00000000 
61          2 4.2904594 1.00000000 
62          2 3.9889840 0.57142857 
63          2 3.6109179 0.52380952 
64          2 3.5553481 0.33333333 
65          2 3.1780538 0.33333333 
66          2 3.1780538 0.33333333 
67          2 2.7725887 0.33333333 
68          2 3.1354942 0.19047619 
69          2 1.7917595 0.09523810 
70          2 1.9459101 0.19047619 
71          2 1.6094379 0.00000000 
72          3 2.3978953 1.00000000 
73          3 2.4849066 1.00000000 
74          3 1.6094379 1.00000000 
75          3 1.3862944 1.00000000 
76          3 1.7917595 1.00000000 
77          3 1.0986123 1.00000000 
78          3 2.0794415 1.00000000 
79          3 1.3862944 1.00000000 
80          3 1.9459101 1.00000000 
81          3 3.1780538 1.00000000 
82          3 2.1972246 1.00000000 
83          3 2.4849066 1.00000000 
84          3 2.6390573 1.00000000 
85          3 3.6109179 1.00000000 
86          3 2.3978953 1.00000000 
87          3 2.1972246 1.00000000 
88          3 1.6094379 1.00000000 
89          3 3.0910425 1.00000000 
90          3 3.6888795 1.00000000 
91          3 3.3672958 1.00000000 
92          3 3.4011974 1.00000000 
93          3 2.4849066 1.00000000 
94          3 3.4657359 1.00000000 
95          3 4.0604430 1.00000000 
96          3 3.6635616 1.00000000 
97          3 3.6109179 1.00000000 
98          3 3.8286414 1.00000000 
99          3 3.6375862 1.00000000 
100         3 3.7135721 1.00000000 
101         3 3.8918203 0.80952381 
102         3 3.7376696 0.85714286 
103         3 3.0445224 0.66666667 
104         3 3.2958369 0.33333333 
105         3 2.7080502 0.00000000 
106         3 1.9459101 0.00000000 
107         3 2.4849066 0.04761905 
108         3 1.9459101 0.00000000 
109         3 0.6931472 0.00000000 

-- 
View this message in context:
http://r.789695.n4.nabble.com/lmList-and-lapply-lm-different-std-errors-tp3088903p3088903.html
Sent from the R help mailing list archive at Nabble.com.

Dennis Murphy

2010-Dec-15 18:23 UTC

[R] lmList and lapply(... lm) different std. errors

Hi:

On Wed, Dec 15, 2010 at 4:24 AM, beatlebg <rhelpforum@gmail.com> wrote:
>
> I am trying to fit a separate linear regression for each level of
> 'VARIABLE2'. I figured out that there are different ways to do this, using
> the following code (data is given at the end of this message):
> reg <- lapply(split(TRY, TRY$VARIABLE2), function(X){lm(X2 ~ X3, data=X)})
> lapply(reg, summary)
>
> Which produces the following:
>
> $`1`
>
> Call:
> lm(formula = X2 ~ X3, data = X)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -1.24233 -0.30028  0.03706  0.46170  1.12408
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)   3.0705     0.2323  13.215 5.95e-15 ***
> X3            0.4744     0.2640   1.797   0.0813 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.5752 on 34 degrees of freedom
> Multiple R-squared: 0.08672,    Adjusted R-squared: 0.05986
> F-statistic: 3.228 on 1 and 34 DF,  p-value: 0.08126
>                                         ^^^^^^^^^^^
>
> $`2`
>
> Call:
> lm(formula = X2 ~ X3, data = X)
>
> Residuals:
>    Min      1Q  Median      3Q     Max
> -1.1358 -0.6403  0.2505  0.4055  1.2088
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)   2.5859     0.2968   8.713 4.53e-10 ***
> X3            0.4957     0.3435   1.443    0.158
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.6765 on 33 degrees of freedom
> Multiple R-squared: 0.05937,    Adjusted R-squared: 0.03086
> F-statistic: 2.083 on 1 and 33 DF,  p-value: 0.1584
>                                         ^^^^^^^^
>
> $`3`
>
> Call:
> lm(formula = X2 ~ X3, data = X)
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -1.70021 -0.66049 -0.00138  0.81210  1.26162
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.9473     0.3522   5.529 2.73e-06 ***
> X3            0.8515     0.3954   2.154   0.0378 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.8979 on 37 degrees of freedom
> Multiple R-squared: 0.1114,     Adjusted R-squared: 0.08739
> F-statistic: 4.639 on 1 and 37 DF,  p-value: 0.03784
>                                           ^^^^^^^^
> It should also be possible to use the lmList function but, remarkably, I get
> the same estimates but different Std. Errors. I used the following code:
>
>
> modlst <- lmList(X2 ~ X3 | VARIABLE2, TRY)
> summary(modlst)
>
> Which produces
>
> Call:
>  Model: X2 ~ X3 | VARIABLE2
>   Data: TRY
>
> Coefficients:
>   (Intercept)
>  Estimate Std. Error   t value     Pr(>|t|)
> 1 3.070507  0.2969014 10.341841 0.000000e+00
> 2 2.585938  0.3224380  8.019952 1.665779e-12
> 3 1.947292  0.2882936  6.754546 8.454271e-10
>   X3
>   Estimate Std. Error  t value    Pr(>|t|)
> 1 0.4744112  0.3373931 1.406108 0.162672738
> 2 0.4957349  0.3731949 1.328354 0.186968753
> 3 0.8515270  0.3236325 2.631154 0.009803152
>
> Residual standard error: 0.7350239 on 104 degrees of freedom
>
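^^^^^^^^^^^^^^^^^^^^^^^^^^
34 + 33 + 37 = 104: the pooled degrees of freedom are the sum of the
residual degrees of freedom of the three separate fits.

The residual variance in lmList() is based on a pooling of all the data. It
considers the groups to be part of the same data frame, so by default it
estimates one common residual standard error (0.7350 here, on 104 degrees of
freedom) and uses it for every group's standard errors. Each separate lm()
fit instead uses that group's own residual standard error (0.5752, 0.6765 and
0.8979). That is why the estimates agree but the standard errors differ. Read
its help page carefully to understand what it is meant to do.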
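
The pooling can be checked by hand. The following is a minimal sketch in base R that uses only the rounded figures printed in the two summaries quoted in this thread; the variable names are made up for illustration. It assumes the pooled residual variance is the degrees-of-freedom weighted average of the per-group residual variances, and that each pooled standard error is the separate-fit standard error rescaled by the ratio of pooled to per-group residual standard error.

```r
# Figures copied (rounded) from the separate lm() summaries in this thread
sigma_sep <- c(0.5752, 0.6765, 0.8979)  # residual standard error per group
df_sep    <- c(34, 33, 37)              # residual degrees of freedom per group

# Pooled residual standard error: df-weighted average of the variances
pooled_sigma <- sqrt(sum(df_sep * sigma_sep^2) / sum(df_sep))

# Separate-fit standard errors (rows: coefficient, columns: groups 1 to 3)
se_sep <- rbind(intercept = c(0.2323, 0.2968, 0.3522),
                X3        = c(0.2640, 0.3435, 0.3954))

# Rescale each group's column by pooled sigma / that group's own sigma
se_pooled <- sweep(se_sep, 2, pooled_sigma / sigma_sep, `*`)

print(sum(df_sep))             # total residual degrees of freedom
print(round(pooled_sigma, 3))  # compare with 0.7350239 from summary(modlst)
print(round(se_pooled, 3))     # compare with the lmList Std. Error columns
```

Because the inputs are rounded to four digits, the results should agree with the lmList() output only to about three decimal places.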

> I do not understand the difference between these two methods or what
> causes the difference in Std. Errors. Which method is preferable? I
> checked the results with another software program, and those results
> corresponded with the first method.
>
Which is preferable depends on your goals. If you intend for each subgroup
of data to be independent, then your listwise method is appropriate; if the
groups are meant to be part of the same data set (e.g., if you want to
perform comparisons that involve the different subgroups), then the lmList()
approach would seem more appropriate, at least with respect to the purpose
for which lmList() is intended. How you perceive the connections between the
grouped data frames matters.
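
For anyone who wants to compare the two behaviours side by side, here is a hedged sketch. It assumes the lmList() in question is the one from the nlme package, whose documented `pool` argument defaults to TRUE, and it uses nlme's built-in Orthodont data rather than the TRY data frame from this thread. With pool = FALSE, the summary is expected to report per-group standard errors in line with the separate lm() fits.

```r
library(nlme)  # recommended package shipped with R; provides lmList() and Orthodont

dat <- as.data.frame(Orthodont)  # columns: distance, age, Subject, Sex

# Separate fits: one lm() per group, each with its own residual variance
fits_sep <- lapply(split(dat, dat$Sex),
                   function(d) lm(distance ~ age, data = d))
se_sep <- sapply(fits_sep, function(f) coef(summary(f))[, "Std. Error"])

# Default pool = TRUE: one residual variance shared by all groups
fit_pooled <- lmList(distance ~ age | Sex, data = dat)

# pool = FALSE: each group keeps its own residual variance
fit_unpooled <- lmList(distance ~ age | Sex, data = dat, pool = FALSE)

print(se_sep)                 # standard errors from the separate fits
print(summary(fit_pooled))    # Std. Error columns based on the pooled estimate
print(summary(fit_unpooled))  # Std. Error columns to compare with se_sep
```

The coefficient estimates are the same in all three cases; only the variance estimate behind the standard errors, t values and p-values changes.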

HTH,
Dennis
