I think the OP wanted rows where all values were greater than .9. If so, this works:

> set.seed(42)
> dst <- dist(cbind(rnorm(20), rnorm(20)))
> dst2 <- as.matrix(dst)
> diag(dst2) <- NA
> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
> idx
13 18 19
13 18 19
> dst2[idx, idx]
         13       18       19
13       NA 2.272407 3.606054
18 2.272407       NA 1.578150
19 3.606054 1.578150       NA

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
Sent: Wednesday, September 23, 2015 3:23 PM
To: Lorenzo Isella
Cc: r-help at r-project.org
Subject: Re: [R] Sampling the Distance Matrix

> mm <- cbind(1/(1:5), sqrt(1:5))
> d <- dist(mm)
> d
          1         2         3         4
2 0.6492864
3 0.9901226 0.3588848
4 1.2500000 0.6369033 0.2806086
5 1.4723668 0.8748970 0.5213550 0.2413050
> which(as.matrix(d)>0.9, arr.ind=TRUE)
  row col
3   3   1
4   4   1
5   5   1
1   1   3
1   1   4
1   1   5

I.e., the distances between mm's rows 3 & 1, 4 & 1, and 5 & 1 are more than 0.9.

The as.matrix(d) is needed because dist returns only the lower triangle of
the distance matrix, as an object of class "dist", and as.matrix.dist
converts that into a full matrix.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Sep 23, 2015 at 12:15 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
> Dear All,
> Suppose you have a distance matrix stored as a dist object, for
> instance
>
> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))
>
> Now, my problem is the following: I would like to get the rows of mm
> corresponding to points whose distance is always larger than, let's say,
> 0.9.
> In other words, if I were to compute the distance matrix on those
> selected rows of mm, apart from the diagonal, I would get all entries
> larger than 0.9.
> Any idea about how I can efficiently code that?
> Regards
>
> Lorenzo

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
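
For readers who want to reuse the row filter above, it can be wrapped in a small function that looks at each point's nearest neighbour. This is only a sketch: the name far_from_all and its threshold argument are made up here for illustration and do not appear in the thread.

```r
# Hypothetical helper (not from the thread): indices of the points whose
# distance to every other point is greater than `threshold`.
far_from_all <- function(d, threshold = 0.9) {
  m <- as.matrix(d)   # expand the "dist" object into a full square matrix
  diag(m) <- Inf      # ignore the zero self-distances on the diagonal
  # a point qualifies when even its nearest neighbour is beyond the threshold
  which(apply(m, 1, min) > threshold)
}

mm <- cbind(1/(1:5), sqrt(1:5))   # Bill Dunlap's example points
far_from_all(dist(mm))            # empty: every point has a neighbour within 0.9
far_from_all(dist(mm), 0.2)       # all five points qualify at this looser threshold
```

Setting the diagonal to Inf instead of NA avoids the na.omit() call, since Inf can never be the row minimum.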
Hi,
And thanks for your reply.
Essentially, your script gets the job done.
For instance, if I run

mm <- cbind(5/(1:5), -2*sqrt(1:5))
dst <- dist(mm)
dst2 <- as.matrix(dst)
diag(dst2) <- NA
idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))

then it correctly detects the first two rows, where all the values are
larger than 0.9.
In other words, it detects the points that are more than 0.9 units away
from *all* the other points.

My other question (I did not realize this until I got your answer) is
the following: I have the distance matrix of a set of N points.
You gave me an algorithm to find all the points that are more than 0.9
units away from every other point.
However, in some cases a weaker condition is enough for me: find
a subset of k points (with k tunable) whose distances *from each other*
are greater than 0.9 units (even if their distances from some other
points may be smaller than 0.9).
Any idea about how to tackle that? Is it simply a matter of detecting
the row and column numbers of all the entries of the distance matrix
larger than 0.9?
Many thanks

Lorenzo

On Wed, Sep 23, 2015 at 09:23:04PM +0000, David L Carlson wrote:
>I think the OP wanted rows where all values were greater than .9.
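
On the last question: listing the entries larger than 0.9 gives qualifying pairs, but a set of qualifying pairs is not automatically a mutually separated subset. Points 1 & 3 and 1 & 4 can each be far apart while 3 and 4 are close. A minimal sketch of the difference, using the example points from this message; the helper name mutually_far is hypothetical and not part of the thread.

```r
# Hypothetical helper (not from the thread): TRUE when every pair of the
# points in `idx` is more than `threshold` apart.
mutually_far <- function(m, idx, threshold = 0.9) {
  sub <- m[idx, idx, drop = FALSE]        # distances among the chosen points only
  all(sub[upper.tri(sub)] > threshold)    # each pair is checked once
}

mm <- cbind(5/(1:5), -2*sqrt(1:5))
m <- as.matrix(dist(mm))

# Each qualifying pair listed once (upper triangle only)
pairs <- which(m > 0.9 & upper.tri(m), arr.ind = TRUE)

mutually_far(m, c(1, 2, 3))   # TRUE
mutually_far(m, c(1, 3, 4))   # FALSE: points 3 and 4 are only about 0.68 apart
```

So the pair list is a useful starting point, but any candidate subset still has to be checked on its own sub-matrix.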
On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:

> My other question (I did not realize this until I got your answer) is
> the following: I have the distance matrix of a set of N points.
> You gave me an algorithm to find all the points that are at least 0.9
> units away from any other points.
> However, in some cases, for me it is OK even a weaker condition: find
> a subset of k points (with k tunable) whose distance *from each other*
> is greater than 0.9 units (even if their distance from some other
> points may be smaller than 0.9).

If I understand .....

Make a matrix of unique combinations (one combination per column), then
apply over its columns to flag the combinations that satisfy the distance
criterion:

mtxcomb <- combn(1:20, 5)
goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
mtxcomb[, goodcls]

In my sample it was around 9% of the total 5-item combinations.
Snipped a lot of output:

.....
     [,1440] [,1441]
[1,]      12      13
[2,]      13      16
[3,]      16      17
[4,]      19      19
[5,]      20      20

> dim(mtxcomb)
[1]     5 15504

--
David

> Any idea about how to tackle that? Is it simply a matter of detecting
> the row and column numbers of all the entries of the distance matrix
> larger than 0.9?
> Many thanks
>
> Lorenzo
>
> On Wed, Sep 23, 2015 at 09:23:04PM +0000, David L Carlson wrote:
>> I think the OP wanted rows where all values were greater than .9.

David Winsemius
Alameda, CA, USA
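
The combination search above calls dist() once per combination. A possible variation, sketched here under assumptions, computes the distance matrix once and indexes into it. For larger N, where enumerating all k-subsets becomes impractical, a simple greedy pass is a different technique that can stand in for the exhaustive search. The seed, the variable names and the helper greedy_far are illustrative choices, not part of the thread, and the greedy pass is not guaranteed to return k points or the largest possible set.

```r
set.seed(42)                       # seed chosen only to make the sketch reproducible
x <- rnorm(20); y <- rnorm(20)
m <- as.matrix(dist(cbind(x, y)))  # compute all pairwise distances once
k <- 5

# Exhaustive search: test every k-subset against the precomputed matrix
combs <- combn(nrow(m), k)         # one combination per column
ok <- apply(combs, 2, function(idx) {
  sub <- m[idx, idx]
  all(sub[upper.tri(sub)] > 0.9)   # every pair within the subset is far apart
})
good <- combs[, ok, drop = FALSE]  # columns are the qualifying subsets

# Greedy alternative (hypothetical helper): walk through the points in order
# and keep a point only if it is far from everything kept so far.
greedy_far <- function(m, threshold = 0.9) {
  keep <- 1L
  for (i in seq_len(nrow(m))[-1]) {
    if (all(m[i, keep] > threshold)) keep <- c(keep, i)
  }
  keep
}
sel <- greedy_far(m)               # one mutually separated set, found in a single pass
```

The exhaustive version returns every qualifying subset of size k, while the greedy version returns a single subset whose size depends on the order of the points.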