thr3ads.net - R devel - [Rd] findInterval [Sep 2024]


Gabor Grothendieck

2024-Sep-16 15:21 UTC

[Rd] findInterval

Suppose we have `dat` shown below and we want to find the `y` value
corresponding to the last value in `x` equal to the corresponding component
of `seek`, and we wish to return an output the same length as `seek` using
`findInterval` to perform the search.  This returns the correct result:

  dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
    y = c(37, 12, 19, 30, 6, 15),
    seek = 1:6)

  zero2na <- function(x) replace(x, x == 0, NA)
  dat |>
    transform(dat, result = y[ zero2na(findInterval(seek, x)) ] ) |>
    _$result
   ## [1] NA 12 19 15 15 15

Since `findInterval` returns an index it is natural that the next step be
to use the index and it is also common that we want a result that is the
same length as the input.

The extra step here is to convert the 0 which `findInterval`
hard codes as missing to NA.
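
To make the extra step concrete, here is a small sketch using the same `x`,
`y` and `seek` values as above. The intermediate name `idx` is introduced
only for illustration.

```r
x    <- c(2, 2, 3, 4, 4, 4)
y    <- c(37, 12, 19, 30, 6, 15)
seek <- 1:6

idx <- findInterval(seek, x)   # 0 where seek is below x[1]
idx
## [1] 0 2 3 6 6 6

# Indexing with 0 silently drops that element, so the result is too short
length(y[idx])
## [1] 5

# Replacing 0 by NA keeps the result aligned with seek
y[replace(idx, idx == 0, NA)]
## [1] NA 12 19 15 15 15
```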

If, like `match`, the `findInterval` function had a `nomatch=` argument, we
could have written this as follows, which is shorter, more understandable,
and avoids the need for zero2na:

  # if nomatch= were implemented
  dat |>
    transform(result = y[ findInterval(seek, x, nomatch = NA) ] ) |>
    _$result
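
Until such an argument exists, the same effect can be approximated with a
small wrapper. The sketch below is only an illustration: the name
`find_interval_nomatch` and its `nomatch` parameter are hypothetical and not
part of base R.

```r
# Hypothetical helper, not part of base R: emulates a `nomatch` argument
find_interval_nomatch <- function(x, vec, nomatch = NA_integer_, ...) {
  idx <- findInterval(x, vec, ...)
  idx[which(idx == 0L)] <- nomatch   # which() skips any NA comparisons
  idx
}

dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
                  y = c(37, 12, 19, 30, 6, 15),
                  seek = 1:6)

with(dat, y[ find_interval_nomatch(seek, x) ])
## [1] NA 12 19 15 15 15
```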

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Martin Maechler

2024-Sep-17 16:14 UTC

[Rd] findInterval

>>>>> Gabor Grothendieck 
>>>>>     on Mon, 16 Sep 2024 11:21:55 -0400 writes:
    > Suppose we have `dat` shown below and we want to find the `y` value
    > corresponding to the last value in `x` equal to the corresponding component
    > of `seek` and we wish to return an output the same length as `seek` using
    > `findInterval` to perform  the search.  This returns the correct result:

    > dat <- data.frame(x = c(2, 2, 3, 4, 4, 4),
    >                   y = c(37, 12, 19, 30, 6, 15),
    >                   seek = 1:6)

    > zero2na <- function(x) replace(x, x == 0, NA)

    > dat |>
    > transform(dat, result = y[ zero2na(findInterval(seek, x)) ] ) |>
    > _$result
    > ## [1] NA 12 19 15 15 15

I'd write that as

    with(dat, y[ zero2na(findInterval(seek, x)) ] )

so I can read it without jumping through hoops and standing on my head ...

    > Since `findInterval` returns an index it is natural that the next step be
    > to use the index and it is also common that we want a result that is the
    > same length as the input.

I think your example where x and y are of the same length is not typical.

Note that the design of  findInterval(x, vec, ..)  is indeed to always return
an index; there isn't any "nomatch", but rather a
- "left of the leftmost", i.e., an x[i] < vec[1] (as 'vec' must be
  sorted increasingly), or
- "right of the rightmost", i.e., an x[i] > vec[length(vec)]

and these should give *different* results (and not both the same).

I don't think 'nomatch' would improve the relatively clean findInterval()
behavior.
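
A minimal sketch of that distinction, with made-up values for `x` and `vec`
(the names `below` and `above` are introduced here only for illustration):
values left of the leftmost breakpoint get index 0, while values right of
the rightmost get `length(vec)`, so a single "nomatch" value would merge two
cases that the index currently keeps apart.

```r
vec <- c(5, 10, 15)        # must be sorted increasingly
x   <- c(2, 7, 15, 18)

idx <- findInterval(x, vec)
idx
## [1] 0 1 3 3

below <- idx == 0L                 # x[i] < vec[1]
above <- x > vec[length(vec)]      # x[i] > vec[length(vec)]
data.frame(x, idx, below, above)
```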

There are three logical switches ... which allow 2^3 variants,
of which I now guess only 6 differ:

Here's some R code showing the possibilities:


(argsTF <- names(formals(findInterval))[-(1:2)])
# "rightmost.closed"  "all.inside" "left.open"
FT <- c(FALSE, TRUE)
allFT <- as.matrix(expand.grid(rightmost.closed = FT,
                               all.inside       = FT,
                               left.open        = FT))
allFT
(cn <- substr(colnames(allFT), 1,1)) #  "r" "a" "l"

x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)

fiAll <- apply(allFT, 1, function(r.a.f)
    do.call(findInterval, c(list(x, v), as.list(r.a.f))))

cbind(x, fiAll) # has all info

## must find cool 'column names' for fiAll: construct from r.., a.., l.. = F / T
(cn1 <- apply(`dim<-`(c(".","|")[allFT+1L], dim(allFT)), 1, paste0, collapse=""))
##  "..." "|.." ".|." "||." "..|" "|.|" ".||" "|||"
colnames(fiAll) <- cn1
cbind(x, fiAll) ## --> col. 3 == 4  and  7 == 8
##==> show only unique columns:
cbind(x, t(unique(t(fiAll))))
 ##  x ... |.. .|. ..| |.| .||
 ##  2   0   0   1   0   0   1
 ##  3   0   0   1   0   0   1
 ##  4   0   0   1   0   0   1
 ##  5   1   1   1   0   1   1
 ##  6   1   1   1   1   1   1
 ##  7   1   1   1   1   1   1
 ##  8   1   1   1   1   1   1
 ##  9   1   1   1   1   1   1
 ## 10   2   2   2   1   1   1
 ## 11   2   2   2   2   2   2
 ## 12   2   2   2   2   2   2
 ## 13   2   2   2   2   2   2
 ## 14   2   2   2   2   2   2
 ## 15   3   2   2   2   2   2
 ## 16   3   3   2   3   3   2
 ## 17   3   3   2   3   3   2
 ## 18   3   3   2   3   3   2
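
The two duplicated columns can also be checked directly. This is only a
sketch for the particular `x` and `v` used above; the names `same_closed`
and `same_open` are introduced for illustration. It checks that, for this
data, once `all.inside = TRUE` is set, toggling `rightmost.closed` no
longer changes the result.

```r
x <- 2:18
v <- c(5, 10, 15)

# columns ".|." and "||." (left.open = FALSE)
same_closed <- identical(
  findInterval(x, v, rightmost.closed = FALSE, all.inside = TRUE),
  findInterval(x, v, rightmost.closed = TRUE,  all.inside = TRUE))

# columns ".||" and "|||" (left.open = TRUE)
same_open <- identical(
  findInterval(x, v, rightmost.closed = FALSE, all.inside = TRUE,
               left.open = TRUE),
  findInterval(x, v, rightmost.closed = TRUE,  all.inside = TRUE,
               left.open = TRUE))

c(same_closed, same_open)
## [1] TRUE TRUE
```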
  

Martin
