Friederike -
The program is doing exactly what you asked it to do, but please
notice that
varprobe<-c(var(schizo.matrix[x,]))
creates a vector containing all the elements of the variance-covariance
matrix of schizo.matrix[x,], which is a 20000x150 matrix, giving a
variance-covariance matrix of 150x150 or 22500 elements. You then rank all
of the elements of this variance-covariance matrix, and ask for the
ones that are ranked higher than 10000, leaving 22500-10000=12500
elements. So now we should probably look at what you wanted to do:
x <- 1:20000
y <- 2:151
schizo.matrix<-data.matrix(schizo[,y])
varprobe<-apply(schizo.matrix[x,],1,var) # calculate the variance of each
row
# reorder the rows of schizo.matrix to correspond to varprobe and
# select the top 10000
schizo.sub = schizo.matrix[order(varprobe,decreasing=TRUE),][1:10000,]
There are other ways of extracting the top 10000, but I think that this
shows how to calculate what you really want.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Fri, 3 Nov 2006, Friederike Barthel wrote:
> Dear List
>
> I am currently running a microarray analysis on the dataset schizo and
would
> like to filter out all genes with a low variance. However, when running the
> code detailed below, I end up with 12,500 genes in my final set rather than
> the 10,000 I was looking for. Can anyone pinpoint where I am going wrong?
>
> ***********reading in data**********
>
> schizo<-read.table("octassign_data.txt",header=T,
sep="\t")
>
> dim(schizo)
>
> head(schizo)
>
> attach(schizo)
>
> ***********creating matrix and calculating variance across
probesets********
>
> x<-c(1:20000)
>
> y<-c(2:151)
>
> schizo.matrix<-data.matrix(schizo[,y])
>
> varprobe<-c(var(schizo.matrix[x,]))
>
> hist(varprobe)
>
> **************filter out low variance*************
>
> top10000 <- which(rank(varprobe)>10000)
>
> schizo.sub<-schizo[top10000,]
>> dim(schizo.sub)
> [1] 12500 151
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>