Dear all,

I want to replace an (unsorted) id variable in a large dataset by a running number without changing the order of the cases.

E.g.,

y <- c(4,4,4,2,45,12,12)

should be replaced by something like

x <- c(1,1,1,2,3,4,4)

Sorry for this simple question & thank you very much for your help!
The rle (run length encoding) function is ideal for problems like this:

> y <- c(4,4,4,2,45,12,12)
> rr = rle(y)
> rep(seq(along=rr$values),rr$lengths)
[1] 1 1 1 2 3 4 4

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu

On Sat, 27 Mar 2010, sun wrote:

> I want to replace an (unsorted) id variable in a large dataset by a running
> number without changing the order of the cases.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
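To make the rep() call above easier to follow, here is a minimal sketch of what rle() returns for the example vector. It uses seq_along(), the current spelling of seq(along=); the variable name x is only illustrative.

```r
y <- c(4, 4, 4, 2, 45, 12, 12)

rr <- rle(y)
rr$values   # one entry per run of equal neighbours: 4 2 45 12
rr$lengths  # length of each run: 3 1 1 2

# Number the runs 1, 2, 3, ... and repeat each number by its run length
x <- rep(seq_along(rr$values), rr$lengths)
x           # 1 1 1 2 3 4 4
```

Because rle() looks only at neighbouring elements, the numbering follows runs of identical values, not distinct ids.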
Try this:

as.numeric(factor(y, levels = unique(y)))

On Sat, Mar 27, 2010 at 12:42 PM, sun <sun.sonny71 at googlemail.com> wrote:

> I want to replace an (unsorted) id variable in a large dataset by a running
> number without changing the order of the cases.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Try this:

y <- c(4,4,4,2,45,12,12)
match(y, unique(y))

I hope it helps.

Best,
Dimitris

On 3/27/2010 4:42 PM, sun wrote:

> I want to replace an (unsorted) id variable in a large dataset by a running
> number without changing the order of the cases.

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
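A point worth checking against your own data: match(y, unique(y)) numbers ids by first appearance, while the rle() approach numbers runs of equal neighbours. The two agree on the example in the question but differ as soon as an id comes back after another one. A small sketch, where y2 is hypothetical data made up for illustration:

```r
y2 <- c(4, 4, 2, 4)  # hypothetical: id 4 reappears after id 2

# Numbering by first appearance of each distinct id
by_first_seen <- match(y2, unique(y2))

# Numbering by consecutive runs
rr <- rle(y2)
by_run <- rep(seq_along(rr$values), rr$lengths)

by_first_seen  # 1 1 2 1
by_run         # 1 1 2 3
```

Which of the two is wanted depends on whether a reappearing id should get its old number back or start a new one.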
Here is a comparison of the speed of the solutions so far, plus one based on cumsum. It seems that cumsum was the fastest, at about 26x faster than the slowest solution (factor). The speed may not be so important here and readability might be key, in which case a good compromise might be match, which was still pretty fast (1.6x the time of cumsum) and is very simple.

> library(rbenchmark)
> set.seed(1)
> y <- sample(10000, 10000, replace = TRUE)
> benchmark(
+   match = match(y, unique(y)),
+   cumsum = cumsum(c(FALSE, y[-1] != y[-length(y)])) + 1,
+   rle = with(rle(y), rep(seq_along(values), lengths)),
+   factor = as.numeric(factor(y, levels = unique(y)))
+ )
    test replications elapsed  relative user.self sys.self user.child sys.child
2 cumsum          100    0.21  1.000000      0.22        0         NA        NA
4 factor          100    5.50 26.190476      5.32        0         NA        NA
1  match          100    0.34  1.619048      0.33        0         NA        NA
3    rle          100    0.81  3.857143      0.81        0         NA        NA
>

On Sat, Mar 27, 2010 at 11:42 AM, sun <sun.sonny71 at googlemail.com> wrote:

> I want to replace an (unsorted) id variable in a large dataset by a running
> number without changing the order of the cases.
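The cumsum one-liner is the least obvious of the four, so here is a sketch that splits it into steps on the example from the question. The intermediate name changed is only illustrative.

```r
y <- c(4, 4, 4, 2, 45, 12, 12)

# Compare each element (from the 2nd on) with its predecessor:
# TRUE marks the start of a new run
changed <- y[-1] != y[-length(y)]
changed  # FALSE FALSE TRUE TRUE TRUE FALSE

# The first element never starts a "change", so prepend FALSE;
# the running count of changes, plus 1, is the run number
x <- cumsum(c(FALSE, changed)) + 1
x        # 1 1 1 2 3 4 4
```

Like rle, this numbers consecutive runs, so on randomly ordered data such as the sample() vector in the benchmark, cumsum and rle will generally return different numbers from match and factor. The timings then compare cost only, not identical results.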