thr3ads.net - R help - [R] two cols in a data frame are the same factor [Mar 2008]

Andres Legarra

2008-Mar-18 09:11 UTC

[R] two cols in a data frame are the same factor

Dear all,
I have a data set (QTL detection) where I have two cols of factors in
the data frame that correspond logically (in my model) to the same
factor. In fact these are haplotype classes.
Another real-life example would be family gas consumption as a
function of car company (e.g. Ford, GM, and Honda), assuming 2 cars
per family.

An artificial example follows:
set.seed(1234)
L3 <- LETTERS[1:3]
(d <- data.frame(y=rnorm(10), fac=sample(L3, 10, replace=TRUE),
                 fac1=sample(L3, 10, replace=TRUE)))

 lm(y ~ fac+fac1,data=d)

and I get:

Coefficients:
(Intercept)         facB         facC        fac1B        fac1C
     0.3612      -0.9359      -0.2004      -2.1376      -0.5438

However, to respect my model, I need to constrain effects in fac and
fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
logically just 4 unknowns (average,A,B,C).
With continuous covariates one might do y ~ I(cov1+cov2), but this is
not the case.

Is there any trick to do that?
Thanks,

Andres Legarra
INRA-SAGA
Toulouse, France
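
One way to express the constraint facB=fac1B and facC=fac1C, offered here as a minimal sketch rather than as the approach anyone in the thread settled on, is to replace the two factor columns with per-level counts (0, 1 or 2 copies per record) and regress on those. The column names nA, nB and nC are hypothetical. Because the three counts always sum to 2, they are collinear with the intercept, so one level is dropped and only three parameters are estimable, not four. The printed coefficients depend on the R version's sampling algorithm, so no output is shown.

```r
# Artificial data as in the question (values vary with the R version's sampler).
set.seed(1234)
L3 <- LETTERS[1:3]
d <- data.frame(y    = rnorm(10),
                fac  = sample(L3, 10, replace = TRUE),
                fac1 = sample(L3, 10, replace = TRUE))

# For each level, count how often it occurs in a record across both columns.
counts <- sapply(L3, function(l) (d$fac == l) + (d$fac1 == l))
colnames(counts) <- paste0("n", L3)   # nA, nB, nC: hypothetical names
d <- cbind(d, counts)

# nA + nB + nC == 2 in every row, so level A is dropped as the reference.
fit <- lm(y ~ nB + nC, data = d)
coef(fit)
```

Each slope is then the shared effect of one copy of that level relative to A, which is the discrete analogue of y ~ I(cov1+cov2).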

Michael Dewey

2008-Mar-19 17:57 UTC

At 09:11 18/03/2008, Andres Legarra wrote:
>Dear all,
>I have a data set (QTL detection) where I have two cols of factors in
>the data frame that correspond logically (in my model) to the same
>factor. In fact these are haplotype classes.
>Another real-life example would be family gas consumption as a
>function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars by
>family).
Unless I completely misunderstand this it looks like you have the 
dataset in wide format when you really wanted it in long format (to 
use the terminology of ?reshape). Then you would fit a model allowing 
for the clustering by family.

>An artificial example follows:
>set.seed(1234)
>L3 <- LETTERS[1:3]
>(d <- data.frame( y=rnorm(10), fac=sample(L3, 10,
>repl=TRUE),fac1=sample(L3,10,repl=T)))
>
>  lm(y ~ fac+fac1,data=d)
>
>and I get:
>
>Coefficients:
>(Intercept)         facB         facC        fac1B        fac1C
>      0.3612      -0.9359      -0.2004      -2.1376      -0.5438
>
>However, to respect my model, I need to constrain effects in fac and
>fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
>logically just 4 unknowns (average,A,B,C).
>With continuous covariates one might do y ~ I(cov1+cov2), but this is
>not the case.
>
>Is there any trick to do that?
>Thanks,
>
>Andres Legarra
>INRA-SAGA
>Toulouse, France
Michael Dewey
http://www.aghmed.fsnet.co.uk
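
The wide-to-long conversion mentioned in this reply could look roughly like the sketch below, which uses base R's reshape(). The data values and the names family, copy and level are made up for illustration. It only shows the reshaping step; whether a long layout fits the model at all is a separate question.

```r
# Hypothetical wide data: one row per family, two factor columns.
wide <- data.frame(family = 1:3,
                   y      = c(10.2, 8.7, 11.5),
                   fac    = c("A", "B", "A"),
                   fac1   = c("B", "A", "A"))

# Stack fac and fac1 into a single column 'level'; 'copy' records which
# of the two original columns each row came from.
long <- reshape(wide, direction = "long",
                varying = c("fac", "fac1"),
                v.names = "level",
                timevar = "copy", times = 1:2,
                idvar   = "family")
long
```

Note that y is repeated for both rows of a family, so the long table has six rows for three responses.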

Andres Legarra

2008-Mar-20 08:25 UTC

Hi,
I am afraid you misunderstood it. I do not have repeated records, but
for every record I have two, possibly different, simultaneously
present, instantiations of an explanatory variable.

My data is as follows :

yield haplo1 haplo2
100  A B
151  B A
212  A A

So I have one effect (haplo), but two copies of each affect "yield".
If I use lm() I get:

> a=data.frame(yield=c(100,151,212),haplo1=c("A","B","A"),haplo2=c("B","A","A"))

Call:
lm(formula = yield ~ -1 + haplo1 + haplo2, data = a)

Coefficients:
haplo1A  haplo1B  haplo2B
    212      151     -112


But I get different coefficients for the two "A"s (in fact one was set
to 0) and the two "B"s. That is, the model has four unknowns but in
my example I have just two!

A least-squares solution is simple to do by hand:

> X=matrix(c(1,1,1,1,2,0),ncol=2) #the incidence matrix
> X
     [,1] [,2]
[1,]    1    1
[2,]    1    2
[3,]    1    0
> solve(crossprod(X,X),crossprod(X,a$yield))
         [,1]
[1,] 184.8333
[2,] -30.5000

where [1,] is the solution for A and [2,] is the solution for B

This is not difficult to do by hand, but this is a simple case and I
lose all the machinery in lm()

Thank you
Andres
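
The hand computation can be handed back to lm() by passing the incidence matrix of copy counts as the regressor. This is a sketch under one assumption: that the intended incidence matrix counts how many of a record's two haplotypes equal each level. Built from the three records shown, that matrix has rows (1,1), (1,1), (2,0), which is not the X typed in the post, so the estimates below differ from the 184.83 and -30.5 reported there.

```r
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = c("A", "B", "A"),
                haplo2 = c("B", "A", "A"))
lev <- c("A", "B")

# X[i, j] = number of copies of level j carried by record i (0, 1 or 2).
X <- sapply(lev, function(l) (a$haplo1 == l) + (a$haplo2 == l))

# Least squares by hand, as in the post ...
b_hand <- solve(crossprod(X), crossprod(X, a$yield))

# ... and the same fit through lm(), without an intercept.
fit <- lm(yield ~ 0 + X, data = a)
coef(fit)   # XA = 106, XB = 19.5
```

With these values an A/A record is fitted at 212 and an A/B record at 125.5, the mean of the two heterozygous yields, and summary(), anova() and predict() are available on fit as usual.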

On Wed, Mar 19, 2008 at 6:57 PM, Michael Dewey <info at aghmed.fsnet.co.uk> wrote:
> At 09:11 18/03/2008, Andres Legarra wrote:
>  >Dear all,
>  >I have a data set (QTL detection) where I have two cols of factors in
>  >the data frame that correspond logically (in my model) to the same
>  >factor. In fact these are haplotype classes.
>  >Another real-life example would be family gas consumption as a
>  >function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars by
>  >family).
>
>  Unless I completely misunderstand this it looks like you have the
>  dataset in wide format when you really wanted it in long format (to
>  use the terminology of ?reshape). Then you would fit a model allowing
>  for the clustering by family.
>
>
>
>
>  >An artificial example follows:
>  >set.seed(1234)
>  >L3 <- LETTERS[1:3]
>  >(d <- data.frame( y=rnorm(10), fac=sample(L3, 10,
>  >repl=TRUE),fac1=sample(L3,10,repl=T)))
>  >
>  >  lm(y ~ fac+fac1,data=d)
>  >
>  >and I get:
>  >
>  >Coefficients:
>  >(Intercept)         facB         facC        fac1B        fac1C
>  >      0.3612      -0.9359      -0.2004      -2.1376      -0.5438
>  >
>  >However, to respect my model, I need to constrain effects in fac and
>  >fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
>  >logically just 4 unknowns (average,A,B,C).
>  >With continuous covariates one might do y ~ I(cov1+cov2), but this is
>  >not the case.
>  >
>  >Is there any trick to do that?
>  >Thanks,
>  >
>  >Andres Legarra
>  >INRA-SAGA
>  >Toulouse, France
>
>  Michael Dewey
>  http://www.aghmed.fsnet.co.uk
>
>
