thr3ads.net - R help - [R] two-factor linear models with missing cells [Aug 2009]

Murray Jorgensen

2009-Aug-02 22:36 UTC

[R] two-factor linear models with missing cells

I am wondering how to interpret the parameter estimates that lm()
reports in this sort of situation:

y = round(rnorm(n=24,mean=5,sd=2),2)
A = gl(3,2,24,labels=c("one","two","three"))
B = gl(4,6,24,labels=c("i","ii","iii","iv"))
# Make both observations for A=1, B=4 missing
y[19] = NA
y[20] = NA
data.frame(y,A,B)
nonadd = lm(y ~ A * B)

> summary(nonadd)

Call:
lm(formula = y ~ A * B)

Residuals:
       Min         1Q     Median         3Q        Max
-3.555e+00 -7.675e-01 -6.939e-17  7.675e-01  3.555e+00

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.755      1.667   2.252   0.0457 *
Atwo           1.655      2.358   0.702   0.4974
Athree         3.330      2.358   1.412   0.1856
Bii            1.435      2.358   0.609   0.5552
Biii           2.055      2.358   0.871   0.4021
Biv           -1.635      2.358  -0.693   0.5025
Atwo:Bii      -1.145      3.335  -0.343   0.7378
Athree:Bii    -4.535      3.335  -1.360   0.2011
Atwo:Biii     -3.230      3.335  -0.969   0.3536
Athree:Biii   -2.105      3.335  -0.631   0.5408
Atwo:Biv       1.655      3.335   0.496   0.6295
Athree:Biv        NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.358 on 11 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.2797,  Adjusted R-squared: -0.3752
F-statistic: 0.4271 on 10 and 11 DF,  p-value: 0.9044

> fitted(nonadd)
    1     2     3     4     5     6     7     8     9    10    11    12
3.755 3.755 5.410 5.410 7.085 7.085 5.190 5.190 5.700 5.700 3.985 3.985
   13    14    15    16    17    18    21    22    23    24
5.810 5.810 4.235 4.235 7.035 7.035 5.430 5.430 5.450 5.450

> t(model.matrix(nonadd)%*%coef(nonadd))
      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 21 22 23 24
[1,] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

I guess that the parameter estimates reported are linear combinations of
the cell means, but which linear combinations and how does lm() decide
what parameters to report?

Cheers, Murray

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile 021 0200 8350

David Winsemius

2009-Aug-02 22:59 UTC

Does this help at all?

<after your code...>

 > contrasts(A)
       two three
one     0     0
two     1     0
three   0     1
 > contrasts(B)
     ii iii iv
i    0   0  0
ii   1   0  0
iii  0   1  0
iv   0   0  1

 > contrasts(A:B)
          one:ii one:iii one:iv two:i two:ii two:iii two:iv
one:i          0       0      0     0      0       0      0
one:ii         1       0      0     0      0       0      0
one:iii        0       1      0     0      0       0      0
one:iv         0       0      1     0      0       0      0
two:i          0       0      0     1      0       0      0
two:ii         0       0      0     0      1       0      0
two:iii        0       0      0     0      0       1      0
two:iv         0       0      0     0      0       0      1
three:i        0       0      0     0      0       0      0
three:ii       0       0      0     0      0       0      0
three:iii      0       0      0     0      0       0      0
three:iv       0       0      0     0      0       0      0
          three:i three:ii three:iii three:iv
one:i           0        0         0        0
one:ii          0        0         0        0
one:iii         0        0         0        0
one:iv          0        0         0        0
two:i           0        0         0        0
two:ii          0        0         0        0
two:iii         0        0         0        0
two:iv          0        0         0        0
three:i         1        0         0        0
three:ii        0        1         0        0
three:iii       0        0         1        0
three:iv        0        0         0        1
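
The contrast matrices show how each factor is coded, but not why exactly one interaction column ends up undefined. The sketch below inspects the model matrix that lm() actually fits. It regenerates the data with set.seed(1), which is an assumption added here for reproducibility and is not in the original post, so the estimates will differ from those quoted in the thread. The structure of the design, and therefore which column is dropped, does not depend on the seed.

```r
# Recreate the setup from the original post.
# set.seed(1) is an assumption: the thread does not state a seed.
set.seed(1)
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA                 # empties the (A = one, B = iv) cell
nonadd <- lm(y ~ A * B)

# Model matrix for the 22 rows used in the fit (all 12 columns are kept here)
X <- model.matrix(nonadd)

# Every remaining B = iv row has A = two or A = three, so the Biv column
# is the sum of the two iv interaction columns.
dependent <- all(X[, "Biv"] == X[, "Atwo:Biv"] + X[, "Athree:Biv"])
dependent

# 12 columns but only 11 linearly independent ones
ncol(X)
nonadd$rank

# The column that lm() sets aside: the later one in the dependency
dropped <- names(which(is.na(coef(nonadd))))
dropped

# alias() expresses the dropped column in terms of the retained ones
alias(nonadd)$Complete
```

Any one of the three columns involved could in principle be removed to restore full rank. lm() works through the columns in order and sets aside a column once it is found to be a linear combination of earlier ones, which is why the last of the three, Athree:Biv, is the one reported as NA.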
-- 
David
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Peter Dalgaard

2009-Aug-03 05:47 UTC

What's the problem? The parameters are defined as usual for the two-way 
layout:

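The intercept is the fitted value in the top left corner (A = one, B = i).
The A coefficients are the fitted values in the first column minus the
intercept.
The B coefficients vice versa (first row minus the intercept), except
for Biv: the (one, iv) cell is empty, so with Athree:Biv dropped, Biv is
the fitted value for (three, iv) minus the fitted value for (three, i).
The interaction coefficients are the fitted values minus the sum of the
intercept and the corresponding A and B coefficients.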
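
These definitions can be checked against the table of cell means. The sketch below is only an illustration: set.seed(1) is an assumption (the thread gives no seed), so the numbers differ from the posted output, but the identities hold for any data with this layout. It also makes the one exception visible. Because the (one, iv) cell is empty and Athree:Biv is dropped, Biv is identified from row three rather than row one. In the posted output that is 5.450 - 7.085 = -1.635, the reported Biv estimate.

```r
# Recreate the setup; the seed is an assumption, not part of the thread.
set.seed(1)
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA
nonadd <- lm(y ~ A * B)

# Observed cell means, A in rows and B in columns; the empty cell is NA
ok <- !is.na(y)
m <- tapply(y[ok], list(A[ok], B[ok]), mean)
m

# Selected coefficients written as combinations of cell means
check <- c(
  "(Intercept)" = m["one", "i"],
  "Atwo"        = m["two", "i"]    - m["one", "i"],
  "Athree"      = m["three", "i"]  - m["one", "i"],
  "Bii"         = m["one", "ii"]   - m["one", "i"],
  "Biii"        = m["one", "iii"]  - m["one", "i"],
  # (one, iv) is empty, so Biv comes from row three instead of row one
  "Biv"         = m["three", "iv"] - m["three", "i"],
  "Atwo:Bii"    = m["two", "ii"] - m["two", "i"] - m["one", "ii"] + m["one", "i"],
  # the iv interaction for row two is measured relative to row three
  "Atwo:Biv"    = m["two", "iv"] - m["two", "i"] - m["three", "iv"] + m["three", "i"]
)

b <- coef(nonadd)
cbind(from_cell_means = check, lm_estimate = b[names(check)])
all.equal(check, b[names(check)])
```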

One interaction coefficient is set missing because you have no data, but 
except for that, the fitted values equal the cell means.
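
This also explains the row of NA values from model.matrix(nonadd) %*% coef(nonadd) in the original post: the model matrix still contains the Athree:Biv column, and in R 0 * NA is NA, so the single missing coefficient contaminates every row of the product. A minimal sketch of one way around it, dropping the undefined column before multiplying, follows. The seed is again an assumption made for reproducibility.

```r
# Recreate the setup; set.seed(1) is an assumption, not from the thread.
set.seed(1)
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA
nonadd <- lm(y ~ A * B)

X <- model.matrix(nonadd)
b <- coef(nonadd)

# The NA coefficient propagates to every element of the product
all(is.na(X %*% b))

# Use only the columns whose coefficients were estimated
keep <- !is.na(b)
fit_manual <- drop(X[, keep] %*% b[keep])
all.equal(unname(fit_manual), unname(fitted(nonadd)))

# The fitted values are the observed cell means
ok <- !is.na(y)
m <- tapply(y[ok], list(A[ok], B[ok]), mean)
cell_means <- m[cbind(as.integer(A[ok]), as.integer(B[ok]))]
all.equal(unname(fitted(nonadd)), cell_means)
```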

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
