Hi R Users

I am hoping someone might be able to give some pointers on alternative code to the for loop described below.

I have a dataset which is ordered by subject ID and date. I would like to create a new variable that numbers the entries for each person (e.g. 1, 2, 3, ...).

As an example, suppose we have subjects A, B and C, all with multiple entries (I have excluded the date variable for simplicity). The for loop below achieves the desired result; however, my dataset is big (1 million+ observations) and the for loop is slow. Is there a more efficient way of getting to the desired result?

Many thanks in advance

Toni


A <- data.frame(ID=c('A','A','A','A','B','B','B', 'C','C','C','C','C'))

   ID
1   A
2   A
3   A
4   A
5   B
6   B
7   B
8   C
9   C
10  C
11  C
12  C

A$Session_ID <- 0
previous_ID <- ''
current_index <- 1
for ( i in seq(1,nrow(A)) )
{
  # restart the count whenever the subject ID changes
  if (A$ID[i] != previous_ID)
    {current_index <- 1}
  A$Session_ID[i] <- current_index
  previous_ID <- A$ID[i]
  current_index <- current_index + 1
}

   ID Session_ID
1   A          1
2   A          2
3   A          3
4   A          4
5   B          1
6   B          2
7   B          3
8   C          1
9   C          2
10  C          3
11  C          4
12  C          5
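Much of the cost of a loop like the one above usually comes from `A$Session_ID[i] <- current_index`: assigning into a data frame one element at a time tends to copy data on every iteration. A minimal sketch that keeps the same loop logic but works on plain vectors and writes the column once at the end. The names `ids` and `session` are illustrative, and like the original loop it assumes rows are already sorted by ID.

```r
# Toni's example data (rows already sorted by ID, as in the original post)
A <- data.frame(ID = c('A','A','A','A','B','B','B',
                       'C','C','C','C','C'))

ids <- as.character(A$ID)   # compare plain strings rather than factor values
n <- length(ids)
session <- integer(n)       # preallocate the result vector

for (i in seq_len(n)) {
  # restart at 1 on the first row and whenever the ID changes;
  # || short-circuits, so ids[0] is never evaluated when i == 1
  if (i == 1L || ids[i] != ids[i - 1L]) {
    session[i] <- 1L
  } else {
    session[i] <- session[i - 1L] + 1L
  }
}

A$Session_ID <- session     # a single assignment into the data frame
A
```

This is still an interpreted loop, so the vectorised replies below should scale better, but it shows where the original version spends its time.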
try this:

> x <- read.table('clipboard')
> x
   V1 V2
1   1  A
2   2  A
3   3  A
4   4  A
5   5  B
6   6  B
7   7  B
8   8  C
9   9  C
10 10  C
11 11  C
12 12  C
> x$ID <- ave(x$V1, x$V2, FUN = function(a)seq(length(a)))
> x
   V1 V2 ID
1   1  A  1
2   2  A  2
3   3  A  3
4   4  A  4
5   5  B  1
6   6  B  2
7   7  B  3
8   8  C  1
9   9  C  2
10 10  C  3
11 11  C  4
12 12  C  5
>

On Wed, Sep 21, 2011 at 11:02 PM, Toni Pitcher <toni.pitcher at otago.ac.nz> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
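The same `ave()` idea can be applied directly to Toni's data frame `A` without going through the clipboard. A minimal sketch, swapping `seq_along` in for `function(a) seq(length(a))` (they give the same numbering for non-empty groups). As far as I understand `ave()`, it writes each group's results back into a copy of its first argument, so that argument should be numeric; passing `A$ID` itself would give character or NA results instead of counts.

```r
A <- data.frame(ID = c('A','A','A','A','B','B','B',
                       'C','C','C','C','C'))

# seq_along(A$ID) supplies an integer vector of the right length;
# within each level of A$ID, FUN replaces its values with 1, 2, ..., k.
A$Session_ID <- ave(seq_along(A$ID), A$ID, FUN = seq_along)
A
```

Because `ave()` groups by value, rows of the same subject do not need to be adjacent; the numbering follows row order within each ID, so the data should be sorted by ID and date first (as Toni's already are).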
A$Session_id <- ave(rep(1,length(A$ID)),A$ID,FUN=cumsum)

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
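To try the suggestions at something like the 1 million row scale from the question, here is a rough sketch on simulated data. `n_rows`, `n_subj` and the `S000001`-style IDs are hypothetical stand-ins for the real dataset. It compares Jeff's `cumsum` version with a different base R technique not proposed in this thread: since the rows are sorted by ID, `rle()` gives the length of each run of identical IDs and `sequence()` expands each length k into 1:k. Timings will vary by machine, so none are claimed here.

```r
set.seed(42)
# Hypothetical stand-in data: n_rows entries for up to n_subj subjects,
# sorted by ID as Toni's dataset is.
n_rows <- 1e6
n_subj <- 1e5
ID <- sort(sample(sprintf("S%06d", seq_len(n_subj)), n_rows, replace = TRUE))

# Jeff's approach: a running total of ones within each ID.
system.time(s1 <- ave(rep(1, length(ID)), ID, FUN = cumsum))

# Run-length approach for sorted data. rle() does not accept factors,
# hence as.character() in case the ID column is a factor.
system.time(s2 <- sequence(rle(as.character(ID))$lengths))

# Both numberings should agree row by row.
stopifnot(length(s1) == n_rows, all(s1 == s2))
```

The `rle()`/`sequence()` version depends entirely on the sort order: if one subject's rows were split into two separate runs, its count would restart at 1, whereas `ave()` would keep counting across them.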