Bronwyn Rayfield
2012-Aug-28 19:58 UTC
[R] return first index for each unique value in a vector
I would like to efficiently find the first index of each unique value in a
very large vector.
For example, if I have a vector
A<-c(9,2,9,5)
I would like to return not only the unique values (2,5,9) but also their
first indices (2,4,1).
I tried using a for loop with which(A==unique(A)[i])[1] to find the first
index of each unique value but it is very slow.
What I am trying to do is easily and quickly done with the "unique"
function in MATLAB (see
http://www.mathworks.com/help/techdoc/ref/unique.html).
Thank you for your help,
Bronwyn
R. Michael Weylandt
2012-Aug-28 22:32 UTC
[R] return first index for each unique value in a vector
On Tue, Aug 28, 2012 at 2:58 PM, Bronwyn Rayfield <bronwynrayfield at gmail.com> wrote:
> I tried using a for loop with which(A==unique(A)[i])[1] to find the first
> index of each unique value but it is very slow.
You'll get marginally more speed from which.max() but I'm sure there's a
better way. I'll write if I can think of it.
Michael
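For readers following along, the which.max() suggestion can be sketched roughly as below. which.max() applied to a logical vector returns the position of the first TRUE, so it can stand in for which(...)[1] inside the loop. The helper name first_index_loop is made up for this illustration and is not from the thread, and the sketch assumes the vector has no NA values. It still scans the whole vector once per unique value, so it is only a modest improvement.

```r
# Hypothetical helper (name not from the thread); assumes x contains no NA.
first_index_loop <- function(x) {
  u <- unique(x)
  # which.max() on a logical vector gives the index of the first TRUE
  idx <- vapply(u, function(v) which.max(x == v), integer(1))
  data.frame(uniqueValue = u, firstIndex = idx)
}

A <- c(9, 2, 9, 5)
first_index_loop(A)
```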
Noia Raindrops
2012-Aug-28 22:49 UTC
[R] return first index for each unique value in a vector
Hi,
Try this:
order(A)[!duplicated(sort(A))]
--
Noia Raindrops
noia.raindrops at gmail.com
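The one-liner above can be unpacked step by step, which may make it clearer why it returns the indices in sorted-value order. This is only an illustrative sketch; the intermediate names (o, s, first_in_run) are not from the thread. It relies on order() leaving ties in their original order, so the first element of each run of equal values is the earliest occurrence. Using A[o] in place of sort(A) is a small substitution that keeps the two vectors the same length if A contains NA, since sort() drops NA by default while order() keeps them.

```r
A <- c(9, 2, 9, 5)

o <- order(A)                   # positions of A in ascending order of value
s <- A[o]                       # the values in sorted order
first_in_run <- !duplicated(s)  # TRUE at the first element of each run of equal values

first_idx <- o[first_in_run]    # first index of each unique value, sorted by value
vals <- s[first_in_run]         # the unique values, sorted

first_idx  # 2 4 1
vals       # 2 5 9
```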
Hi,
I was thinking about duplicated(), but Bert already posted that solution. The
solution below is not very efficient.
A<-c(9,2,9,5)
unik<-as.numeric(names(table(A)))
match(unik,A)
#[1] 2 4 1
#Bert's solution wins here.
system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
unik <- !duplicated(A)  ## logical vector of unique values
seq_along(A)[unik]  ## indices
A[unik]})
#   user  system elapsed
#  0.040   0.016   0.056
#My solution
system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
#unik<-as.numeric(names(table(A)))
match(as.numeric(names(table(A))),A)})
#   user  system elapsed
#  0.344   0.036   0.383
#Robert's solution
system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
as.numeric(rownames(unique(data.frame(A)[1])))})
#   user  system elapsed
#  0.056   0.012   0.069
A.K.
Hi,
Replacing seq_along() with which() slightly improved CPU time.
system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
which(!duplicated(A))
A[which(!duplicated(A))]
})
#   user  system elapsed
#  0.040   0.012   0.052
A.K.
William Dunlap
2012-Aug-29 03:22 UTC
[R] return first index for each unique value in a vector
Here are two methods:
> A<-c(9,2,9,5)
> f1 <- function(x) { d <- !duplicated(x) ; data.frame(uniqueValue=x[d], firstIndex=which(d)) }
> f2 <- function(x) { u <- unique(x) ; data.frame(uniqueValue=u, firstIndex=match(u, x))}
> f1(A)
  uniqueValue firstIndex
1           9          1
2           2          2
3           5          4
> identical(f1(A), f2(A))
[1] TRUE
> A6 <- sample(1e6, size=5e5, replace=TRUE)
> system.time(z1 <- f1(A6))
   user  system elapsed
   0.25    0.02    0.27
> system.time(z2 <- f2(A6))
   user  system elapsed
   0.09    0.02    0.11
> identical(z1, z2)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
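Both functions above return the unique values in order of first appearance (9, 2, 5), whereas the original question shows them sorted (2, 5, 9) with indices (2, 4, 1). A minimal sketch of one way to get the sorted form is to sort the unique values before calling match(). The wrapper name unique_first and the list element names are assumptions made for this illustration, not part of the thread.

```r
# Hypothetical wrapper (name not from the thread): sorted unique values
# together with the index of each value's first occurrence in x.
unique_first <- function(x) {
  u <- sort(unique(x))
  # match() returns the position of the first match of each u in x
  list(values = u, first = match(u, x))
}

res <- unique_first(c(9, 2, 9, 5))
res$values  # 2 5 9
res$first   # 2 4 1
```

Since match() only reports the first match, no explicit de-duplication of the indices is needed.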