On Sep 24, 2015, at 1:54 PM, Lorenzo Isella wrote:

> On Thu, Sep 24, 2015 at 01:30:02PM -0700, David Winsemius wrote:
>>
>> On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:
>>
>>> Hi,
>>> And thanks for your reply.
>>> Essentially, your script gets the job done.
>>> For instance, if I run
>>>
>>> mm <- cbind(5/(1:5), -2*sqrt(1:5))
>>> dst <- dist(mm)
>>> dst2 <- as.matrix(dst)
>>> diag(dst2) <- NA
>>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x) > .9)))
>>>
>>> then it correctly detects the first two rows, where all the values are
>>> larger than 0.9.
>>> In other words, it detects the points that are at least 0.9 units away
>>> from *all* the other points.
>>> My other question (I did not realize this until I got your answer) is
>>> the following: I have the distance matrix of a set of N points.
>>> You gave me an algorithm to find all the points that are at least 0.9
>>> units away from every other point.
>>> However, in some cases a weaker condition is OK for me: find
>>> a subset of k points (with k tunable) whose distance *from each other*
>>> is greater than 0.9 units (even if their distance from some other
>>> points may be smaller than 0.9).
>>
>> If I understand ..... Make a matrix of unique combinations, then apply over its columns to find the combinations that satisfy the distance criterion:
>>
>> mtxcomb <- combn(1:20, 5)
>> goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
>> mtxcomb[, goodcls]
>>
>> In my sample it was around 9% of the total 5 item combinations.
>>
>> snipped a lot of output:
>> .....
>>      [,1440] [,1441]
>> [1,]      12      13
>> [2,]      13      16
>> [3,]      16      17
>> [4,]      19      19
>> [5,]      20      20
>> > dim(mtxcomb)
>> [1]     5 15504
>>
>
> Hi,
> Thanks for your reply.
> I think I am getting there, but when I run your commands, I get this
> error message
>
> Error in cbind(x[idx], y[idx]) : object 'x' not found
>
> Any idea why? Should I combine those 3 lines with something else?

No idea. I was running the setup that you asked for in your original message, which you have now omitted from the mail chain.

> Cheers
>
> Lorenzo

David Winsemius
Alameda, CA, USA

You defined x and y in your original email as:

> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Thursday, September 24, 2015 6:30 PM
To: Lorenzo Isella
Cc: David L Carlson; r-help at r-project.org
Subject: Re: [R] Sampling the Distance Matrix

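The same filter can also be written against the distance matrix itself, so that it does not depend on x and y still being present in the workspace. What follows is only a sketch under stated assumptions: the seed value, the name `dmat`, and the helper `all_far` are illustrative and do not appear in the thread.

```r
# Setup from the original message; the seed is an assumption added so the
# run is reproducible.
set.seed(1234)
x <- rnorm(20)
y <- rnorm(20)

# Full symmetric distance matrix (20 x 20), computed once.
dmat <- as.matrix(dist(cbind(x, y)))

# Hypothetical helper: TRUE if every pair of points in idx is more than
# thr apart, read directly from the precomputed distance matrix.
all_far <- function(idx, dmat, thr = 0.9) {
  sub <- dmat[idx, idx]
  all(sub[upper.tri(sub)] > thr)
}

# One 5-point combination per column, as in the earlier reply.
mtxcomb <- combn(nrow(dmat), 5)
goodcls <- apply(mtxcomb, 2, all_far, dmat = dmat)
good <- mtxcomb[, goodcls, drop = FALSE]
```

Looking distances up in `dmat` avoids calling `dist()` once per combination, but the number of combinations to test is unchanged, so this only helps for small problems.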
Absolutely right! Thanks to both Davids for their help.
Cheers

Lorenzo

On Fri, Sep 25, 2015 at 01:54:54PM +0000, David L Carlson wrote:
>You defined x and y in your original email as:
>
>> x<-rnorm(20)
>> y<-rnorm(20)
>>
>> mm<-as.matrix(cbind(x,y))
>>
>> dst<-(dist(mm))

Apologies for not letting this thread rest in peace.
The small script

#########################################################
set.seed(1234)
x <- rnorm(20)
y <- rnorm(20)

# all 5-point combinations of the 20 indices, one combination per column
mtxcomb <- combn(1:20, 5)
# keep the combinations whose pairwise distances all exceed 0.9
goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
mycomb <- mtxcomb[, goodcls]
#########################################################

is perfect for detecting groups of 5 points whose distances to each other are
always above 0.9.
However, in my practical case I have about 500 points and I am looking
for a subset of several tens of points whose pairwise distances are all
above a given threshold.
Unfortunately, the approach above does not scale, because the number of
combinations to check grows combinatorially with the number of points, so
I wonder if anybody is aware of an alternative approach.
Many thanks

Lorenzo
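
One possible direction, which nobody in the thread proposed, is a greedy heuristic: visit the points in random order and keep each one that is farther than the threshold from every point kept so far. The sketch below assumes simulated data in place of the real 500 points, and the function name `greedy_far_subset` and its arguments are hypothetical. It returns a valid subset but not necessarily the largest one, and it may return fewer than k points.

```r
# Greedy heuristic (an assumption, not from the thread): scan the points in
# random order and keep a point only if it is more than thr away from all
# points already kept. Stops early once k points have been collected.
greedy_far_subset <- function(dmat, thr, k = Inf) {
  kept <- integer(0)
  for (i in sample(nrow(dmat))) {
    # all() of an empty comparison is TRUE, so the first point is always kept
    if (all(dmat[i, kept] > thr)) kept <- c(kept, i)
    if (length(kept) >= k) break
  }
  kept
}

# Simulated stand-in for the real data: 500 points in the plane.
set.seed(1234)
n <- 500
pts <- cbind(rnorm(n), rnorm(n))
dmat <- as.matrix(dist(pts))

# Indices of up to 30 points that are pairwise more than 0.9 apart.
sel <- greedy_far_subset(dmat, thr = 0.9, k = 30)
```

Each run costs at most one pass over the distance matrix, so it can be repeated with different seeds and the largest subset kept. Whether the resulting subsets are large enough depends on the data and the threshold.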