Bronwyn Rayfield
2012-Aug-28 19:58 UTC
[R] return first index for each unique value in a vector
I would like to efficiently find the first index of each unique value in a very large vector.

For example, if I have a vector

A<-c(9,2,9,5)

I would like to return not only the unique values (2,5,9) but also their first indices (2,4,1).

I tried using a for loop with which(A==unique(A)[i])[1] to find the first index of each unique value but it is very slow.

What I am trying to do is easily and quickly done with the "unique" function in MATLAB (see http://www.mathworks.com/help/techdoc/ref/unique.html).

Thank you for your help,
Bronwyn
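For reference, the loop described in the question might look roughly like the sketch below. The names u and first_idx are illustrative, and sorting the unique values is an assumption made here so that the result matches the (2,5,9) / (2,4,1) ordering given in the question.

```r
A <- c(9, 2, 9, 5)

# Assumption: sort the unique values to match the ordering in the question.
u <- sort(unique(A))

first_idx <- integer(length(u))
for (i in seq_along(u)) {
  # A == u[i] scans the whole vector once per unique value,
  # which is why this gets slow when A is large with many distinct values.
  first_idx[i] <- which(A == u[i])[1]
}

u          # 2 5 9
first_idx  # 2 4 1
```

The cost grows with length(A) times the number of distinct values, which is consistent with the slowness reported.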
R. Michael Weylandt
2012-Aug-28 22:32 UTC
[R] return first index for each unique value in a vector
On Tue, Aug 28, 2012 at 2:58 PM, Bronwyn Rayfield <bronwynrayfield at gmail.com> wrote:
> I would like to efficiently find the first index of each unique value in a
> very large vector.
>
> For example, if I have a vector
>
> A<-c(9,2,9,5)
>
> I would like to return not only the unique values (2,5,9) but also their
> first indices (2,4,1).
>
> I tried using a for loop with which(A==unique(A)[i])[1] to find the first
> index of each unique value but it is very slow.

You'll get marginally more speed from which.max() but I'm sure there's a better way. I'll write if I can think of it.

Michael
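One possible reading of the which.max() suggestion is sketched below; the exact form Michael had in mind is not given in the thread, so the vapply() wrapper and the variable names are assumptions. On a logical vector, which.max() returns the position of the first TRUE, so it avoids building the full index vector that which() produces.

```r
A <- c(9, 2, 9, 5)
u <- unique(A)   # 9 2 5, in order of first appearance

# which.max() on a logical vector gives the position of the first TRUE.
# Every v comes from A, so at least one TRUE is always present.
first_idx <- vapply(u, function(v) which.max(A == v), integer(1))

first_idx  # 1 2 4
```

This still compares the whole vector once per unique value, so it is only a modest improvement over the original loop.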
Noia Raindrops
2012-Aug-28 22:49 UTC
[R] return first index for each unique value in a vector
Hi,

Try this:

order(A)[!duplicated(sort(A))]

--
Noia Raindrops
noia.raindrops at gmail.com
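The one-liner above can be unpacked step by step, as in the following sketch (the intermediate names ord, sorted and keep are illustrative). It relies on order() keeping tied values in their original order, so the first entry of each run of equal values in the sorted vector corresponds to the earliest index.

```r
A <- c(9, 2, 9, 5)

ord    <- order(A)            # 2 4 1 3: positions of A in ascending order
sorted <- A[ord]              # 2 5 9 9: same as sort(A)
keep   <- !duplicated(sorted) # TRUE TRUE TRUE FALSE: first of each run

first_idx <- ord[keep]        # 2 4 1
values    <- sorted[keep]     # 2 5 9
```

Unlike the loop, this returns the values in sorted order, which matches the output requested in the question, at the cost of sorting the whole vector.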
Hi,

I was thinking about duplicated(), but Bert already posted the solution. The solution below is not very efficient.

A<-c(9,2,9,5)
unik<-as.numeric(names(table(A)))
match(unik,A)
#[1] 2 4 1

#Bert's solution wins here.
system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
unik <- !duplicated(A)  ## logical vector marking first occurrences
seq_along(A)[unik]  ## indices
A[unik]})
#   user  system elapsed
#  0.040   0.016   0.056

#My solution
system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
#unik<-as.numeric(names(table(A)))
match(as.numeric(names(table(A))),A)})
#   user  system elapsed
#  0.344   0.036   0.383

#Robert's solution
system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
as.numeric(rownames(unique(data.frame(A)[1])))})
#   user  system elapsed
#  0.056   0.012   0.069

A.K.
Hi,

Replacing seq_along() with which() slightly improved CPU time.

system.time({
set.seed(1)
A<-sample(1:5,1e6,replace=TRUE)
which(!duplicated(A))
A[which(!duplicated(A))]
})
#   user  system elapsed
#  0.040   0.012   0.052

A.K.
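The timings above include the cost of set.seed() and sample(), and the last version calls duplicated() twice. A hedged variant that builds the data outside the timed block and computes the index once might look like this; the names first_idx, vals and timing are illustrative, and no timing figures are claimed for it.

```r
set.seed(1)
A <- sample(1:5, 1e6, replace = TRUE)   # built once, outside the timed code

timing <- system.time({
  first_idx <- which(!duplicated(A))    # duplicated() is evaluated only once
  vals <- A[first_idx]                  # unique values in order of first appearance
})
print(timing)
```

Timing only the index extraction makes the differences between the approaches easier to see, since data generation costs the same in each case.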
William Dunlap
2012-Aug-29 03:22 UTC
[R] return first index for each unique value in a vector
Here are two methods:

> A<-c(9,2,9,5)
> f1 <- function(x) { d <- !duplicated(x) ; data.frame(uniqueValue=x[d], firstIndex=which(d)) }
> f2 <- function(x) { u <- unique(x) ; data.frame(uniqueValue=u, firstIndex=match(u, x))}
> f1(A)
  uniqueValue firstIndex
1           9          1
2           2          2
3           5          4
> identical(f1(A), f2(A))
[1] TRUE
> A6 <- sample(1e6, size=5e5, replace=TRUE)
> system.time(z1 <- f1(A6))
   user  system elapsed
   0.25    0.02    0.27
> system.time(z2 <- f2(A6))
   user  system elapsed
   0.09    0.02    0.11
> identical(z1, z2)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
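Both f1 and f2 return the values in order of first appearance (9, 2, 5), whereas the question asked for sorted values (2, 5, 9) with indices (2, 4, 1). A small variation on f2 that sorts the unique values first is sketched below; the function name first_index_sorted is hypothetical and not from the thread.

```r
# Variation on f2: sort the unique values before looking up first positions.
# Note that sort() drops NA values by default.
first_index_sorted <- function(x) {
  u <- sort(unique(x))
  data.frame(uniqueValue = u, firstIndex = match(u, x))
}

res <- first_index_sorted(c(9, 2, 9, 5))
res
#   uniqueValue firstIndex
# 1           2          2
# 2           5          4
# 3           9          1
```

Sorting only the unique values keeps the extra cost small when the number of distinct values is much lower than the length of the vector.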