Hi,

I have a microarray dataset from Agilent chips. The data are log ratios between test samples and a universal reference RNA. Because the mean of a log ratio is very close to 0, the coefficient of variation (CV) doesn't really apply to this kind of data. What measure of dispersion do people use so that I can compare across genes on the chip and find stably expressed genes? Would something similar to CV be easy to interpret?

Thanks

John
On Tue, Feb 21, 2012 at 1:44 PM, array chip <arrayprofile at yahoo.com> wrote:
> Hi, I have a microarray dataset from Agilent chips. The data were really log ratio between test samples and a universal reference RNA. Because of the nature of log ratios, coefficient of variation (CV) doesn't really apply to this kind of data due to the fact that mean of log ratio is very close to 0. What kind of measurements would people use to measure the dispersion so that I can compare across genes on the chip to find stably expressed genes? something similar to CV would be easily interpreted?
>

You may want to ask this question on the Bioconductor list, since it isn't really an R question. Do you also have some sort of an expression p-value? If you only have the expression values themselves, you could simply look at the variance and hope that non-expressed genes have values determined chiefly by noise, which varies quite a bit, so that they show a higher variance than genes with stable expression above the typical noise level.

HTH,
Peter
On Feb 21, 2012, at 22:44 , array chip wrote:
> Hi, I have a microarray dataset from Agilent chips. The data were really log ratio between test samples and a universal reference RNA. Because of the nature of log ratios, coefficient of variation (CV) doesn't really apply to this kind of data due to the fact that mean of log ratio is very close to 0. What kind of measurements would people use to measure the dispersion so that I can compare across genes on the chip to find stably expressed genes? something similar to CV would be easily interpreted?

What's wrong with the SD of log(X)? That's pretty much equivalent to the CV, at least for CVs less than 50%:

> x <- rlnorm(1000,5,.5)
> sd(x)/mean(x)
[1] 0.5252718
> sd(log(x))
[1] 0.5037995

Looking for a relative measure of precision _after_ taking logs strikes me as very odd. If you scale your original observations by a constant factor, the log of that factor is _added_ to the log-transformed data, without affecting their variation at all.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
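The near-equivalence shown in the simulation above can also be checked without random numbers. The sketch below assumes the raw ratios are lognormal, in which case the CV is exactly sqrt(exp(sigma^2) - 1), where sigma is the SD on the natural-log scale. The helper name lognormal_cv is made up for this illustration and does not come from any package.

```r
# Exact CV of a lognormal variable whose natural log has standard deviation sigma.
# lognormal_cv is a hypothetical helper name, used only for this illustration.
lognormal_cv <- function(sigma) sqrt(exp(sigma^2) - 1)

# Compare the SD on the log scale with the implied CV on the raw scale
sigma <- c(0.05, 0.1, 0.25, 0.5, 1)
comparison <- data.frame(
  sd_log = sigma,
  cv     = lognormal_cv(sigma),
  ratio  = lognormal_cv(sigma) / sigma
)
print(comparison)
```

For small sigma the ratio stays close to 1, and it grows as sigma increases, which is why the approximation is usually quoted only for moderate CVs.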
If a variable y has (approximately) constant CV, then log(y) has (approximately) constant variance. So use the standard deviation of the log-transformed data, which here are the log ratios themselves.

---- begin included message ---
Hi, I have a microarray dataset from Agilent chips. The data were really log ratio between test samples and a universal reference RNA. Because of the nature of log ratios, coefficient of variation (CV) doesn't really apply to this kind of data due to the fact that mean of log ratio is very close to 0. What kind of measurements would people use to measure the dispersion so that I can compare across genes on the chip to find stably expressed genes? something similar to CV would be easily interpreted?
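As a rough sketch of how the suggestions in this thread could be applied, the code below computes a per-gene SD of the log ratios and ranks genes by it. Everything about the data is an assumption made for illustration: the matrix logratio is simulated, its dimensions are arbitrary, and the values are taken to be log2 ratios (if the chip software reports log10 ratios, use log(10) in place of log(2)). The CV-like column relies on the lognormal assumption and is not something stated in the thread.

```r
set.seed(1)

# Hypothetical data: 1000 genes (rows) x 12 arrays (columns) of log2 ratios,
# each gene simulated with its own true SD between 0.05 and 1.
n_genes  <- 1000
n_arrays <- 12
gene_sd_true <- runif(n_genes, 0.05, 1)
logratio <- matrix(
  rnorm(n_genes * n_arrays, mean = 0, sd = rep(gene_sd_true, times = n_arrays)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

# Per-gene dispersion: SD of the log ratios across arrays
gene_sd <- apply(logratio, 1, sd, na.rm = TRUE)

# Optional CV-like number: convert the SD to the natural-log scale, then apply
# the lognormal relation CV = sqrt(exp(sigma^2) - 1)
approx_cv <- sqrt(exp((gene_sd * log(2))^2) - 1)

# The 20 genes with the smallest SD are the candidates for "stably expressed"
stable <- head(names(sort(gene_sd)), 20)
print(data.frame(gene = stable, sd = gene_sd[stable], approx_cv = approx_cv[stable]))
```

Ranking by SD and ranking by the CV-like value give the same order, since one is a monotone function of the other; the conversion only helps interpretation. A low SD alone does not rule out genes that are barely expressed, so an intensity or detection filter may still be needed.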