thr3ads.net - R help - [R] Finding overlaps in vector [Dec 2007]


Johannes Graumann

2007-Dec-21 09:56 UTC

[R] Finding overlaps in vector


Dear all,

I'm trying to find clusters of values in a vector that lie closer together
than a given distance. An example:

vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)

When using '0.5' as the proximity requirement, the following groups
would result:
0,0.45
3,3.25,3.33,3.75,4.1
6,6.45
7,7.1

Jim Holtman proposed a very elegant solution in
http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have
modified and used ever since he sent it to me. The beauty of this approach
is that it works not only for constant proximity requirements as above,
but also for overlap windows defined in terms of ppm around each value.
Now I have an additional need and have found no way (short of iteratively
stepping through all the groups returned) to meet it with Jim's approach:
how can I determine that 6,6.45 and 7,7.1 are separate clusters?
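
For readers unfamiliar with the ppm case, here is a minimal sketch of what such windows might look like. It is not Jim's code: the helper name ppm_clusters, the symmetric window value * (1 -/+ ppm * 1e-6), and the sample masses are all assumptions made for illustration, and the values are assumed to be positive.

```r
# Hypothetical helper (not from the thread): chain together values whose
# +/- ppm windows overlap. Assumes strictly positive values.
ppm_clusters <- function(values, ppm) {
  if (length(values) == 0) return(list())
  v     <- sort(values)
  lower <- v * (1 - ppm * 1e-6)   # assumed window definition
  upper <- v * (1 + ppm * 1e-6)
  # a new cluster starts when a window begins beyond every window seen so far
  reach       <- cummax(upper)
  new_cluster <- c(TRUE, lower[-1] > reach[-length(v)])
  groups <- split(v, cumsum(new_cluster))
  # drop singletons, as in the constant-width example above
  unname(groups[lengths(groups) > 1])
}

# made-up masses, 10 ppm windows
masses  <- c(500, 500.004, 500.02, 1000, 1000.015)
res_ppm <- ppm_clusters(masses, ppm = 10)
print(res_ppm)
```

Because the window width scales with the value, the same absolute gap can join two large values while separating two small ones.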

Thanks for any hints, Joh

jim holtman

2007-Dec-21 15:32 UTC


[R] Finding overlaps in vector

Here is a modification of the algorithm to use a specified value for
the overlap:

> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
> # the following adds 0.5 as the overlap detection -- can be changed
> x <- rbind(cbind(value=vector, oper=1, id=seq_along(vector)),
+            cbind(value=vector+0.5, oper=-1, id=seq_along(vector)))
> x <- x[order(x[,'value'], -x[, 'oper']),]
> # determine which ones overlap
> x <- cbind(x, over=cumsum(x[, 'oper']))
> # now partition into groups
> # determine where the breaks are (0 values in 'over')
> x <- cbind(x, breaks=cumsum(x[, 'over'] == 0))
> # delete entries with 'over' == 0
> x <- x[x[, 'over'] != 0,]
> # split into groups
> x.groups <- split(x[, 'id'], x[, 'breaks'])
> # only keep groups with at least 3 rows (i.e. two or more distinct values)
> x.subsets <- x.groups[sapply(x.groups, length) >= 3]
> # print out the subsets
> invisible(lapply(x.subsets, function(a) print(vector[unique(a)])))
[1] 0.00 0.45
[1] 3.00 3.25 3.33 3.75 4.10
[1] 6.00 6.45
[1] 7.0 7.1
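
The same open/close sweep can also hand back one cluster id per element, which makes it easy to see that 6,6.45 and 7,7.1 fall into different clusters. What follows is only a sketch under assumptions: the function name sweep_labels and its width argument are invented here, and the window is taken to be [value, value + width] as in the transcript above.

```r
# Hypothetical helper (not from the thread): label every element with the
# id of the cluster it falls into, using an open/close event sweep.
sweep_labels <- function(values, width = 0.5) {
  n <- length(values)
  # one "open" event at each value and one "close" event at value + width
  events <- data.frame(
    pos  = c(values, values + width),
    oper = rep(c(1L, -1L), each = n),
    id   = rep(seq_len(n), times = 2)
  )
  # sort by position; on ties, opens come before closes so touching windows join
  events <- events[order(events$pos, -events$oper), ]
  open_count <- cumsum(events$oper)
  # each return to zero ends a cluster, so the count of zeros seen so far
  # tells an opening event which cluster it belongs to
  breaks  <- cumsum(open_count == 0L)
  is_open <- events$oper == 1L
  cluster_id <- integer(n)
  cluster_id[events$id[is_open]] <- breaks[is_open] + 1L
  cluster_id
}

v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
cluster_id <- sweep_labels(v, width = 0.5)
clusters   <- split(v, cluster_id)
clusters   <- clusters[lengths(clusters) > 1]  # keep clusters with 2+ values
print(clusters)
```

The ids are aligned with the input vector, so they can be used directly as a grouping factor, for example with split() or tapply().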



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Gabor Grothendieck

2007-Dec-21 16:09 UTC


[R] Finding overlaps in vector

This may not be as direct as Jim's in terms of specifying granularity, but
it uses conventional hierarchical clustering to create the clusters and also
draws a nice dendrogram for you. I have split the dendrogram at a height of
0.5 to define the clusters, but you can change that to whatever granularity
you like:
> v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
>
> # cluster and plot
> hc <- hclust(dist(v), method = "single")
> plot(hc, lab = v)
> cl <- rect.hclust(hc, h = .5, border = "red")
>
> # each component of list cl is one cluster.  Print them out.
> for(idx in cl) print(unname(v[idx]))
[1] 8
[1] 7.0 7.1
[1] 6.00 6.45
[1] 5
[1] 3.00 3.25 3.33 3.75 4.10
[1] 2
[1] 1
[1] 0.00 0.45
> # a different representation of the clusters
> vv <- v
> names(vv) <- ct <- cutree(hc, h = .5)
> vv
   1    1    2    3    4    4    4    4    4    5    6    6    7    7    8
0.00 0.45 1.00 2.00 3.00 3.25 3.33 3.75 4.10 5.00 6.00 6.45 7.00 7.10 8.00
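
If the dendrogram is not needed, the cutree() labels alone should be enough to list only the clusters with more than one member. The snippet below is a sketch along those lines; the variable names membership and multi are illustrative, and lengths() needs R 3.2.0 or later.

```r
v <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)

# single-linkage clustering, cut at height 0.5, without any plotting
hc         <- hclust(dist(v), method = "single")
membership <- cutree(hc, h = 0.5)

# group the values by cluster label and drop the singletons
by_cluster <- split(v, membership)
multi      <- unname(by_cluster[lengths(by_cluster) > 1])
print(multi)
```

For sorted one-dimensional data, cutting a single-linkage tree at height h amounts to chaining neighbours whose gap is at most h, which is why the result agrees with the groups requested in the original post.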



Charles C. Berry

2007-Dec-21 18:41 UTC


[R] Finding overlaps in vector

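Try this (v is the vector from the original post, sorted in ascending order):

> v <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
> tmp <- rle( diff(v)<.5 )
> ends <- 1+cumsum(tmp$lengths)[tmp$values]
> # SIMPLIFY=FALSE keeps a list even when all clusters have the same length
> mapply(function(x,y) v[ seq(to=x,length=y) ], ends,
+        1+tmp$lengths[tmp$values], SIMPLIFY=FALSE)
[[1]]
[1] 0.00 0.45

[[2]]
[1] 3.00 3.25 3.33 3.75 4.10

[[3]]
[1] 6.00 6.45

[[4]]
[1] 7.0 7.1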
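
The rle() approach relies on the input being sorted. A variation that sorts first and marks a new run wherever the gap to the previous value reaches the tolerance could look like the sketch below. The function name gap_clusters and its tol argument are made up for illustration; this is a different technique (cumsum over gap flags) rather than a rewrite of the code above.

```r
# Hypothetical helper (not from the thread): cluster by gaps between
# neighbouring values after sorting.
gap_clusters <- function(values, tol = 0.5) {
  if (length(values) == 0) return(list())
  s <- sort(values)
  # a gap of tol or more starts a new run; cumsum turns the flags into run ids
  run_id <- cumsum(c(TRUE, diff(s) >= tol))
  groups <- split(s, run_id)
  # keep only runs with at least two members
  unname(groups[lengths(groups) > 1])
}

# works on unsorted input as well
shuffled  <- rev(c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8))
res_gaps  <- gap_clusters(shuffled, tol = 0.5)
print(res_gaps)
```

Since 7 - 6.45 exceeds 0.5, a new run id starts at 7, so 6,6.45 and 7,7.1 come back as separate list elements.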


HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
