On Thu, May 19, 2011 at 7:21 PM, karena <dr.jzhou at gmail.com>
wrote:> Hi, I have a question about how to do covariate adjustment.
>
> I have two sets of 'gene expression' data. They are from two
different
> tissue types, 'liver' and 'brain', respectively.
> The purpose of my analysis is to compare the pattern of the whole genome
> 'gene expression' between the two tissue types.
> I have 'age' and 'sex' as covariates. Since 'age'
and 'sex' definitely have
> influence on gene expression, I need to first filter out the proporation of
> 'gene expression' attributable to 'age' and 'sex',
then compare the
> 'remaining gene expression' value between 'liver' and
'brain'.
> How to do the covariate adjustment to remove the effects of these two
> covariates?
> Should I do a 'step-wise' regression or something?
> Which function in R should I use?
>
> I am new to this field, and really appreciate your help!
Go to your local library and get an introductory book on linear
regression (or linear models) that also covers basics of multi-variate
regression.
When you learn a little bit about regression, read at least the first
few chapters of Julian Faraway's book on regression in R,
cran.r-project.org/doc/contrib/Faraway-PRA.pdf
When you're done, use roughly the following command, assuming age is
is a numeric variable and expression contains the expression data with
genes in columns.
sex.num = as.numeric(as.factor(sex))
tissue.num = as.numeric(as.factor(tissue))
fit = lm(expression~sex.num + age + tissue.num)
The variable fit will be a list with one element per gene. In each
element, look at the component $coefficients[4,4], which is the
p-value for the hypothesis that the coefficient of tissue is zero.
Since you likely have thousands of genes, you will have to do a
multiple testing correction, which you could do for example using the
qvalue package in which the function qvalue calculates the false
discovery rate.
Peter