thr3ads.net - R help - [R] Sampling the Distance Matrix [Sep 2015]


David Winsemius

2015-Sep-24 23:29 UTC

[R] Sampling the Distance Matrix

On Sep 24, 2015, at 1:54 PM, Lorenzo Isella wrote:
> On Thu, Sep 24, 2015 at 01:30:02PM -0700, David Winsemius wrote:
>> 
>> On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:
>> 
>>> Hi,
>>> And thanks for your reply.
>>> Essentially, your script gets the job done.
>>> For instance, if I run
>>> 
>>> mm <- cbind(5/(1:5), -2*sqrt(1:5))
>>> dst <- dist(mm)
>>> dst2 <- as.matrix(dst)
>>> diag(dst2) <- NA
>>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
>>> 
>>> then it correctly detects the first two rows, where all the values are
>>> larger than 0.9.
>>> In other words, it detects the points that are more than 0.9 units away
>>> from *all* the other points.
>>> My other question (I did not realize this until I got your answer) is
>>> the following: I have the distance matrix of a set of N points.
>>> You gave me an algorithm to find all the points that are more than 0.9
>>> units away from all other points.
>>> However, in some cases a weaker condition is enough for me: find
>>> a subset of k points (with k tunable) whose distances *from each other*
>>> are all greater than 0.9 units (even if their distance from some other
>>> points may be smaller than 0.9).
>> 
>> If I understand ..... Make a matrix of unique combinations (one per column), then apply
>> over its columns to get the combinations that satisfy the distance criterion:
>> 
>> mtxcomb <- combn(1:20, 5)
>> goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
>> mtxcomb [ , goodcls]
>> 
>> In my sample, about 9% of all 5-item combinations qualified.
>> 
>> snipped a lot of output:
>> .....
>>   [,1440] [,1441]
>> [1,]      12      13
>> [2,]      13      16
>> [3,]      16      17
>> [4,]      19      19
>> [5,]      20      20
>>> dim( mtxcomb)
>> [1]     5 15504
>> 
> 
> Hi,
> Thanks for your reply.
> I think I am getting there, but when I run your commands, I get this
> error message
> 
> Error in cbind(x[idx], y[idx]) : object 'x' not found
> 
> Any idea why? Should I combine those 3 lines with something else?
No idea. I was running the setup that you asked for in your original message
which you have now omitted from the mail chain.


> Cheers
> 
> Lorenzo
David Winsemius
Alameda, CA, USA
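The first check in this thread (points that are more than 0.9 from every other point) can also be written as a nearest-neighbour test. The sketch below reuses Lorenzo's five-point `mm` example; putting `Inf` on the diagonal and taking each row's minimum is my substitution for the `NA` / `na.omit()` version quoted above, not code from the thread.

```r
# Lorenzo's five-point example: one row per point, two coordinate columns
mm <- cbind(5/(1:5), -2*sqrt(1:5))
dst2 <- as.matrix(dist(mm))

# Self-distances are 0, so push them out of the way before taking row minima
diag(dst2) <- Inf

# A point qualifies if even its nearest neighbour is more than the threshold away
threshold <- 0.9
idx <- which(apply(dst2, 1, min) > threshold)
print(idx)
```

It should report rows 1 and 2, matching the result quoted in the thread.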

David L Carlson

2015-Sep-25 13:54 UTC

[R] Sampling the Distance Matrix

You defined x and y in your original email as:
> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))
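For anyone reproducing the exchange: Winsemius's `combn()` filter needs the `x` and `y` from the setup above. A minimal self-contained sketch follows. `set.seed(1234)` is borrowed from Lorenzo's later script, and indexing the precomputed matrix `d` (a name I introduce, as is `goodcls_dist`) instead of calling `dist()` once per subset is my variation; the last lines check it against the original formulation.

```r
set.seed(1234)  # seed added for reproducibility; the setup above had none
x <- rnorm(20)
y <- rnorm(20)
mm <- as.matrix(cbind(x, y))
dst <- dist(mm)
d <- as.matrix(dst)  # full 20 x 20 distance matrix, computed once

# Every 5-point subset of the 20 points, one subset per column
mtxcomb <- combn(1:20, 5)

# Keep a subset if all of its pairwise distances exceed 0.9;
# as.dist() keeps only the lower triangle, so the zero diagonal is ignored
goodcls <- apply(mtxcomb, 2, function(idx) all(as.dist(d[idx, idx]) > 0.9))
mycomb <- mtxcomb[, goodcls, drop = FALSE]

# Same selection as recomputing dist() on the coordinates of each subset
goodcls_dist <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
stopifnot(identical(goodcls, goodcls_dist))
```

`drop = FALSE` keeps `mycomb` a 5-row matrix even when only one subset (or none) qualifies.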
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, September 24, 2015 6:30 PM
To: Lorenzo Isella
Cc: David L Carlson; r-help at r-project.org
Subject: Re: [R] Sampling the Distance Matrix



Lorenzo Isella

2015-Sep-25 19:15 UTC

[R] Sampling the Distance Matrix

Absolutely right!
Thanks to both Davids for your help.
Cheers

Lorenzo

On Fri, Sep 25, 2015 at 01:54:54PM +0000, David L Carlson wrote:
>You defined x and y in your original email as:
>
>> x<-rnorm(20)
>> y<-rnorm(20)
>>
>> mm<-as.matrix(cbind(x,y))
>>
>> dst<-(dist(mm))
>
>-------------------------------------
>David L Carlson
>Department of Anthropology
>Texas A&M University
>College Station, TX 77840-4352
>

Lorenzo Isella

2015-Sep-25 19:54 UTC

[R] Sampling the Distance Matrix

Apologies for not letting this thread rest in peace.
The small script

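#########################################################
set.seed(1234)

x <- rnorm(20)
y <- rnorm(20)

# every 5-point subset of the 20 points, one subset per column
mtxcomb <- combn(1:20, 5)

# keep the subsets whose pairwise distances all exceed 0.9
goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))

mycomb <- mtxcomb[, goodcls]
#########################################################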


is perfect for detecting groups of 5 points whose pairwise distances
are all above 0.9.
However, in my practical case I have about 500 points and I am looking
for a subset of several tens of points whose pairwise distances are all
above a given threshold.
Unfortunately, the approach above does not scale: it enumerates all
choose(N, k) subsets, and choose(500, 5) alone is about 2.6e11, so I
wonder if anybody is aware of an alternative approach.
Many thanks

Lorenzo
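One possible alternative for the 500-point case, offered as a hedged sketch rather than a known answer: a greedy pass that keeps a point only if it is more than the threshold away from every point already kept. `greedy_spread_subset()` is a hypothetical helper, not from any package, and the 500 simulated normal points stand in for the real data. It returns a *maximal* well-separated subset (no further point can be added), not necessarily the *largest* one; finding the largest is the maximum independent set problem on the graph linking points closer than the threshold, which is NP-hard in general. Restarting from several random scan orders and keeping the best result is a cheap improvement.

```r
# Hypothetical helper (not from a package): scan points in `scan_order` and keep
# a point only if it is more than `threshold` away from every point kept so far.
greedy_spread_subset <- function(d, threshold, scan_order = seq_len(nrow(d))) {
  keep <- integer(0)
  for (i in scan_order) {
    if (length(keep) == 0 || all(d[i, keep] > threshold)) {
      keep <- c(keep, i)
    }
  }
  keep
}

# Simulated data on the scale of the real problem (assumed: about 500 points)
set.seed(1234)
n <- 500
x <- rnorm(n)
y <- rnorm(n)
d <- as.matrix(dist(cbind(x, y)))

# Several random restarts; keep the largest subset found
best <- integer(0)
for (trial in 1:20) {
  cand <- greedy_spread_subset(d, 0.9, scan_order = sample(n))
  if (length(cand) > length(best)) best <- cand
}
length(best)

# Sanity check: every pair in the result is more than 0.9 apart
stopifnot(all(as.dist(d[best, best]) > 0.9))
```

Each pass needs at most N times the subset size distance lookups, instead of enumerating subsets. If the result has more than the k points you need, any k of them still satisfy the condition; if it has fewer, more restarts may help, but there is no guarantee that a subset of size k exists.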
