Dear R-devel,
May I suggest that a new feature be added to a couple of unique() methods?
Sometimes it's useful to have the indices of the original data that the
unique elements come from, so that the original data can be recreated from
the unique()ed data. I suggest that an `index' argument be added for
unique. Below is a suggested patch against
R/src/library/base/R/duplicated.R:
*** R-devel/src/library/base/R/duplicated.R Tue Jul 20 12:45:52 2004
--- duplicated.R Thu Aug 19 14:50:38 2004
***************
*** 34,50 ****
## NB unique.default is used by factor to avoid unique.matrix,
## so it needs to handle some other cases.
! unique.default <- function(x, incomparables = FALSE, ...)
{
if(!is.logical(incomparables) || incomparables)
.NotYetUsed("incomparables != FALSE")
z <- .Internal(unique(x))
if(is.factor(x))
! factor(z, levels = seq(len=nlevels(x)), labels = levels(x),
! ordered = is.ordered(x))
else if(inherits(x, "POSIXct") || inherits(x, "Date"))
! structure(z, class=class(x))
! else z
}
unique.data.frame <- function(x, incomparables = FALSE, ...)
--- 34,51 ----
## NB unique.default is used by factor to avoid unique.matrix,
## so it needs to handle some other cases.
! unique.default <- function(x, incomparables = FALSE, index=FALSE, ...)
{
if(!is.logical(incomparables) || incomparables)
.NotYetUsed("incomparables != FALSE")
z <- .Internal(unique(x))
if(is.factor(x))
! z <- factor(z, levels = seq(len=nlevels(x)), labels = levels(x),
! ordered = is.ordered(x))
else if(inherits(x, "POSIXct") || inherits(x, "Date"))
! z <- structure(z, class=class(x))
! if (index) attr(z, "index") <- match(x, z)
! z
}
unique.data.frame <- function(x, incomparables = FALSE, ...)
***************
*** 55,61 ****
}
unique.matrix <- unique.array <-
! function(x, incomparables = FALSE , MARGIN = 1, ...)
{
if(!is.logical(incomparables) || incomparables)
.NotYetUsed("incomparables != FALSE")
--- 56,62 ----
}
unique.matrix <- unique.array <-
! function(x, incomparables = FALSE , MARGIN = 1, index=FALSE, ...)
{
if(!is.logical(incomparables) || incomparables)
.NotYetUsed("incomparables != FALSE")
***************
*** 66,70 ****
args <- rep(alist(a=), ndim)
names(args) <- NULL
args[[MARGIN]] <- !duplicated(as.vector(temp))
! do.call("[", c(list(x=x), args, list(drop=FALSE)))
}
--- 67,76 ----
args <- rep(alist(a=), ndim)
names(args) <- NULL
args[[MARGIN]] <- !duplicated(as.vector(temp))
! res <- do.call("[", c(list(x=x), args, list(drop=FALSE)))
! if (index) {
! resTemp <- apply(res, MARGIN, function(x) paste(x, collapse
"\r"))
! attr(res, "index") <- match(temp, resTemp)
! }
! res
}
An example usage:
> x <- sample(5, 10, rep=T)
> x
[1] 4 2 5 3 2 3 4 2 2 3> z <- unique(x, index=TRUE)
> z[attr(z, "index")] == x
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE> x <- factor(x)
> z <- unique(x, index=TRUE)
> z
[1] 4 2 5 3
Levels: 2 3 4 5> z[attr(z, "index")] == x
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[I have not tried adding the same thing for the unique.data.frame method,
but that shouldn't be too hard...]
Best,
Andy
Andy Liaw, PhD
Biometrics Research PO Box 2000, RY33-300
Merck Research Labs Rahway, NJ 07065
mailto:andy_liaw@merck.com 732-594-0820