Keizer_71
2008-Feb-19 23:34 UTC
[R] Calculating the distance samples using distance metics method
# ---- reading in data ----
data <- read.table("microarray.txt", header = TRUE, sep = "\t")
head(data)
dim(data)
attach(data)

# ---- creating matrix and calculating variance across probesets ----
x <- 1:20000
y <- 2:141
data.matrix <- data.matrix(data[, y])
variableprobe <- apply(data.matrix[x, ], 1, var)
hist(variableprobe)

# ---- filter out low variance ----
data.sub = data.matrix[order(variableprobe, decreasing = TRUE), ][1:10000, ]
dim(data.sub)
# [1] 10000   140

What is the best way to calculate the distances between the samples using the Euclidean or Manhattan distance metrics? Any suggestions?
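For reference, a minimal sketch of one common way to get sample-to-sample distances in base R. It assumes, as in the post, that probes are in rows and samples are in columns; the small simulated matrix is only a stand-in for the real `data.sub`, and the column names are made up for illustration. Because `dist()` computes distances between rows, the matrix is transposed first.

```r
# Simulated stand-in for data.sub: 200 probes (rows) x 6 samples (columns).
# The real matrix in the post is 10000 x 140; names here are hypothetical.
set.seed(1)
data.sub <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6,
                   dimnames = list(NULL, paste0("sample", 1:6)))

# dist() measures distances between ROWS, so transpose to compare samples.
d.euc <- dist(t(data.sub), method = "euclidean")
d.man <- dist(t(data.sub), method = "manhattan")

# Convert to a full symmetric matrix with one row and column per sample.
m.euc <- as.matrix(d.euc)
m.man <- as.matrix(d.man)
dim(m.euc)
```

With 140 samples the result is only a 140 x 140 matrix, so memory should not be a concern for distances between samples; distances between the 10000 probes would be a much larger object.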
Bill.Venables at csiro.au
2008-Feb-19 23:58 UTC
[R] Calculating the distance samples using distance metics method
Distance matrices are not usually an end in themselves but a means to some other end. Rather than asking what the best way is to calculate such a huge distance matrix, maybe the question you should ask yourself is what you are going to do with it if you ever did manage to calculate it. Maybe you can bypass the distance matrix calculation and get to the end point by some other means.

For example, if the eventual goal is clustering, then perhaps something like clara() in the 'cluster' package will do the job more effectively. It is designed to handle situations of this kind.

Bill Venables
CSIRO Laboratories, Cleveland, AUSTRALIA
http://www.cmis.csiro.au/bill.venables/
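As a rough illustration of the clara() suggestion, here is a minimal sketch using the 'cluster' package on simulated data. The matrix, the choice of k = 2, and the number of subsamples are assumptions made only for this example; none of them come from the thread. clara() clusters the rows of its input and works on subsamples, so it never builds the full row-by-row distance matrix.

```r
library(cluster)

# Simulated data: two well-separated groups of 500 rows each, 4 columns.
# This is a stand-in, not the microarray data from the thread.
set.seed(1)
x <- rbind(matrix(rnorm(500 * 4, mean = 0),  ncol = 4),
           matrix(rnorm(500 * 4, mean = 10), ncol = 4))

# clara() draws subsamples, runs PAM on each, and assigns every row
# to the nearest medoid of the best subsample solution.
fit <- clara(x, k = 2, metric = "euclidean", samples = 5)

# Cluster membership for every row, and the medoid coordinates.
table(fit$clustering)
fit$medoids
```

clara() also accepts metric = "manhattan". To cluster samples rather than probes, the transposed matrix would be passed instead, although with only 140 samples a full distance matrix is cheap to compute anyway.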