Hello everyone. Like others on this list, I'm new to R, and really not much of a programmer, so please excuse any obtuse questions! I'm trying to repeat a function across all possible combinations of vectors in a data frame. I'd hugely appreciate any advice!

Here's what I'm doing:

I have some data: 40 samples, ~460 000 different readings between 0 and 1 for each sample. I would like to make R spit out a matrix of distances between the samples. So far, I have made a function to calculate the distance between any two samples:

    DistanceCalc<-function(x,y){#x and y are both vectors - the entire reading set for sample x and
    #sample y, respectively
    distance<-sqrt(sum((x-y)^2))
    distanceCorrected<-distance/sqrt(length(x))#to force the maximum possible value to =1
    print(distanceCorrected)
    }

The next thing I want to do is to make this function run to compare all possible combinations of my samples (1vs1, 1vs2, 1vs3...2vs1, 2vs2 etc). In Python, the only other programming language I have ever used, I would just use a "for" loop. I have asked the internet how to do this, but the overwhelming response seems to be "you don't want to do it like that - use the 'apply' functions". I've tried to use the apply functions, but I tend to find that I can only give my DistanceCalc function a single vector (I can tell it where to find x, but not where to find y, or vice versa). I've also found the 'by' and the 'outer' functions, but I'm likewise failing at making those work, e.g.

    > distancetable<-outer(DataWithoutBlanks,DataWithoutBlanks,FUN=DistanceCalc)
    Error in x - y : non-numeric argument to binary operator

I think this may be because my data has headers and the function is trying to calculate the difference between the names of my samples, but I don't know how to correct this.

Would really appreciate your help!

Jen

--
View this message in context: http://r.789695.n4.nabble.com/repeating-a-function-across-a-data-frame-tp4638643.html
Sent from the R help mailing list archive at Nabble.com.

Hi,

did you find the function dist? It seems that it can do directly what you want.

Regards
Petr

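A small sketch of how dist() relates to the DistanceCalc function from the question, using made-up data. It assumes the samples are stored as columns (one vector per sample, as in a data frame); the names `readings`, `dCorrected` and `manual` are illustrative and do not come from the thread. Because dist() compares rows, the data is transposed first, and dividing by the square root of the number of readings reproduces the correction used in DistanceCalc.

```r
# Toy stand-in for the real data: 6 readings (rows) for 4 samples (columns)
set.seed(42)
readings <- matrix(runif(6 * 4), nrow = 6, ncol = 4,
                   dimnames = list(NULL, paste0("s", 1:4)))

# dist() computes Euclidean distances between ROWS, so transpose
# to put one sample per row
d <- dist(t(readings))

# Full square matrix, rescaled as in DistanceCalc
dCorrected <- as.matrix(d) / sqrt(nrow(readings))

# Same quantity computed by hand for samples s1 and s2
manual <- sqrt(sum((readings[, "s1"] - readings[, "s2"])^2)) / sqrt(nrow(readings))
```

If the data really is a data frame with a non-numeric column (for example sample labels), that column would need to be dropped before converting with as.matrix().
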
As Petr suggests, the dist() function will do much of the work for you. For example ...

    # example matrix of data
    nsamples <- 40
    nreadings <- 460000
    dat <- matrix(runif(nsamples*nreadings), nrow=nsamples)

    # Euclidean distance between the ROWS of dat
    distance <- dist(dat)

Jean

JenniferH <jenachobbs@gmail.com> wrote on 08/01/2012 04:42:48 AM:

> I would like to make R spit out a matrix of distances
> between the samples.

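For a distance that dist() does not offer, the outer() attempt from the question can probably be made to work along the following lines. The error there is likely not caused by the headers: outer() expands its arguments and calls FUN once on the expanded values, so a data frame reaches `x - y` as a list rather than as two numeric vectors. One way round this, sketched here under assumptions, is to call outer() on the column indices and wrap the function with Vectorize(). The data frame `dat` and the helper `pairDist` are hypothetical names for illustration, and DistanceCalc is rewritten to return its value rather than print it.

```r
# Hypothetical stand-in for the poster's data: 5 samples (columns), 1000 readings each
set.seed(1)
dat <- as.data.frame(matrix(runif(5 * 1000), ncol = 5))
names(dat) <- paste0("sample", 1:5)

# The distance from the question, returning the corrected value
DistanceCalc <- function(x, y) {
  sqrt(sum((x - y)^2)) / sqrt(length(x))
}

# Take column indices instead of vectors; Vectorize() makes the function
# accept the index vectors that outer() passes in
pairDist <- Vectorize(function(i, j) DistanceCalc(dat[[i]], dat[[j]]))

idx <- seq_along(dat)
distancetable <- outer(idx, idx, pairDist)
dimnames(distancetable) <- list(names(dat), names(dat))
```

With 40 samples this calls DistanceCalc 1600 times, which should be manageable, though dist() is likely to be faster for plain Euclidean distance.
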
Hi Petr and Jean,

thanks very much, problem solved! Really appreciate your help.

Jennifer