Dear all,
I have a data set (QTL detection) where I have two cols of factors in
the data frame that correspond logically (in my model) to the same
factor. In fact these are haplotype classes.
Another real-life example would be family gas consumption as a
function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars by
family).
An artificial example follows:
set.seed(1234)
L3 <- LETTERS[1:3]
(d <- data.frame( y=rnorm(10), fac=sample(L3, 10,
replace=TRUE),fac1=sample(L3,10,replace=TRUE)))
lm(y ~ fac+fac1,data=d)
and I get:
Coefficients:
(Intercept)         facB         facC        fac1B        fac1C
     0.3612      -0.9359      -0.2004      -2.1376      -0.5438
However, to respect my model, I need to constrain effects in fac and
fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
logically just 4 unknowns (average,A,B,C).
With continuous covariates one might do y ~ I(cov1+cov2), but this is
not the case.
Is there any trick to do that?
Thanks,
Andres Legarra
INRA-SAGA
Toulouse, France
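
One way to read the constraint facB=fac1B and facC=fac1C is that each class has a single effect which is counted once per column in which it appears. The sketch below, which is only an illustration and not part of the original message, builds one count column per class and passes those counts to lm(). The names cnt, nB and nC are made up for this example. Because the counts in every row add up to 2, they are collinear with the intercept, so only three of the four logical unknowns can be estimated and class A is used as the reference here.

```r
# Sketch only: 'cnt', 'nB' and 'nC' are names made up for this example.
set.seed(1234)
L3 <- LETTERS[1:3]
d <- data.frame(y = rnorm(10),
                fac  = sample(L3, 10, replace = TRUE),
                fac1 = sample(L3, 10, replace = TRUE))

# For each class, count how many of the two columns carry it (0, 1 or 2).
cnt <- sapply(L3, function(l) (d$fac == l) + (d$fac1 == l))

# Every row of cnt sums to 2, so with an intercept one class must serve
# as the reference; here A is dropped.
d$nB <- cnt[, "B"]
d$nC <- cnt[, "C"]
fit <- lm(y ~ nB + nC, data = d)
coef(fit)   # (Intercept), nB, nC: one shared effect per class
```

With continuous covariates this is the same idea as y ~ I(cov1+cov2): the two indicator columns for a class are added before fitting, so they share one coefficient.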
At 09:11 18/03/2008, Andres Legarra wrote:
>Dear all,
>I have a data set (QTL detection) where I have two cols of factors in
>the data frame that correspond logically (in my model) to the same
>factor. In fact these are haplotype classes.
>Another real-life example would be family gas consumption as a
>function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars by
>family).

Unless I completely misunderstand this it looks like you have the
dataset in wide format when you really wanted it in long format (to
use the terminology of ?reshape). Then you would fit a model allowing
for the clustering by family.

>An artificial example follows:
>set.seed(1234)
>L3 <- LETTERS[1:3]
>(d <- data.frame( y=rnorm(10), fac=sample(L3, 10,
>repl=TRUE),fac1=sample(L3,10,repl=T)))
>
> lm(y ~ fac+fac1,data=d)
>
>and I get:
>
>Coefficients:
>(Intercept)         facB         facC        fac1B        fac1C
>     0.3612      -0.9359      -0.2004      -2.1376      -0.5438
>
>However, to respect my model, I need to constrain effects in fac and
>fac1 to be the same, i.e. facB=fac1B and facC=fac1C. There are
>logically just 4 unknowns (average,A,B,C).
>With continuous covariates one might do y ~ I(cov1+cov2), but this is
>not the case.
>
>Is there any trick to do that?
>Thanks,
>
>Andres Legarra
>INRA-SAGA
>Toulouse, France

Michael Dewey
http://www.aghmed.fsnet.co.uk
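
For readers unfamiliar with the wide/long terminology of ?reshape, the following sketch shows what stacking the two factor columns would look like for the artificial example. It is only an illustration of the suggestion above, under the assumption that each row is one cluster (a family, in the car example). The column names copy and record are made up here. No clustered model is fitted, since the message does not name one.

```r
# Sketch only: 'long', 'copy' and 'record' are names made up for this example.
set.seed(1234)
L3 <- LETTERS[1:3]
d <- data.frame(y = rnorm(10),
                fac  = sample(L3, 10, replace = TRUE),
                fac1 = sample(L3, 10, replace = TRUE))

long <- reshape(d, direction = "long",
                varying = list(c("fac", "fac1")),  # the two columns to stack
                v.names = "fac",                   # name of the stacked column
                timevar = "copy", times = 1:2,     # which column a row came from
                idvar = "record")                  # created as 1..nrow(d)
nrow(long)   # two rows per original record
head(long)
```

Note that y is repeated on both rows of a record, which is why a model on the long data would have to account for the clustering.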
Hi,
I am afraid you misunderstood it. I do not have repeated records; rather, for every record I have two, possibly different, simultaneously present instantiations of an explanatory variable.
My data is as follows:

yield  haplo1  haplo2
100    A       B
151    B       A
212    A       A

So I have one effect (haplo), but two copies of it affect "yield". If I use lm() I get:

> a=data.frame(yield=c(100,151,212),haplo1=c("A","B","A"),haplo2=c("B","A","A"))
> lm(yield ~ -1 + haplo1 + haplo2, data = a)

Call:
lm(formula = yield ~ -1 + haplo1 + haplo2, data = a)

Coefficients:
haplo1A  haplo1B  haplo2B
    212      151     -112

But I get different coefficients for the two "A"s (in fact one was set to 0) and the two "B"s. That is, the model has four unknowns but in my example I have just two!
A least-squares solution is simple to do by hand:

> X=matrix(c(1,1,1,1,2,0),ncol=2,byrow=TRUE) #the incidence matrix: copies of A and B in each record
> X
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    2    0
> solve(crossprod(X,X),crossprod(X,a$yield))
      [,1]
[1,] 106.0
[2,]  19.5

where [1,] is the solution for A and [2,] is the solution for B.
This is not difficult to do by hand, but it is for a simple case and I miss all the machinery in lm().
Thank you,
Andres

On Wed, Mar 19, 2008 at 6:57 PM, Michael Dewey <info at aghmed.fsnet.co.uk> wrote:
> At 09:11 18/03/2008, Andres Legarra wrote:
> >Dear all,
> >I have a data set (QTL detection) where I have two cols of factors in
> >the data frame that correspond logically (in my model) to the same
> >factor. In fact these are haplotype classes.
> >Another real-life example would be family gas consumption as a
> >function of car company (e.g. Ford, GM, and Honda) (assuming 2 cars by
> >family).
>
> Unless I completely misunderstand this it looks like you have the
> dataset in wide format when you really wanted it in long format (to
> use the terminology of ?reshape). Then you would fit a model allowing
> for the clustering by family.
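
For the three-record haplotype example in the last message, the by-hand least-squares idea can be handed to lm() by giving it the incidence columns directly, which keeps the usual lm() machinery available. This is a minimal sketch and not a reply from the thread. The names classes, cnt, nA and nB are made up here, and each row of the incidence matrix is taken to hold the number of copies of A and of B in that record.

```r
# Sketch only: 'classes', 'cnt', 'nA' and 'nB' are names made up here.
a <- data.frame(yield  = c(100, 151, 212),
                haplo1 = c("A", "B", "A"),
                haplo2 = c("B", "A", "A"))

classes <- c("A", "B")
# Copies of each haplotype class carried by each record (0, 1 or 2).
cnt <- sapply(classes, function(h) (a$haplo1 == h) + (a$haplo2 == h))
a$nA <- cnt[, "A"]
a$nB <- cnt[, "B"]

# No intercept: nA + nB is always 2, so an intercept would be collinear.
fit <- lm(yield ~ 0 + nA + nB, data = a)
coef(fit)     # one coefficient per class
summary(fit)  # standard errors, residuals, etc. from the usual lm() machinery
```

The same count columns work for more than two classes. If an overall mean is wanted as well, one class has to be dropped as the reference because the counts always add up to 2.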