Hi R Users
I am hoping someone might be able to give some pointers on alternative code to
the for loop described below.
I have a dataset which is ordered by subject ID and date. What I would like to
do is create a new variable that numbers the entries for each person (e.g.
1, 2, 3, ...).
As an example, if we have subjects A, B and C, all with multiple entries (I have
excluded the date variable for simplicity), the for loop below achieves the
desired result. However, my dataset is big (1 million+ observations) and the for
loop is slow. Is there a more efficient way of getting to the desired result?
Many thanks in advance
Toni
A <- data.frame(ID=c('A','A','A','A','B','B','B',
                     'C','C','C','C','C'))

   ID
1   A
2   A
3   A
4   A
5   B
6   B
7   B
8   C
9   C
10  C
11  C
12  C

A$Session_ID <- 0
previous_ID <- ''
current_index <- 1
for ( i in seq(1,nrow(A)) )
{
  if (A$ID[i] != previous_ID)
    {current_index <- 1}
  A$Session_ID[i] <- current_index
  previous_ID <- A$ID[i]
  current_index <- current_index + 1
}

   ID Session_ID
1   A          1
2   A          2
3   A          3
4   A          4
5   B          1
6   B          2
7   B          3
8   C          1
9   C          2
10  C          3
11  C          4
12  C          5

try this:

> x <- read.table('clipboard')
> x
   V1 V2
1   1  A
2   2  A
3   3  A
4   4  A
5   5  B
6   6  B
7   7  B
8   8  C
9   9  C
10 10  C
11 11  C
12 12  C
> x$ID <- ave(x$V1, x$V2, FUN = function(a)seq(length(a)))
> x
   V1 V2 ID
1   1  A  1
2   2  A  2
3   3  A  3
4   4  A  4
5   5  B  1
6   6  B  2
7   7  B  3
8   8  C  1
9   9  C  2
10 10  C  3
11 11  C  4
12 12  C  5
>

On Wed, Sep 21, 2011 at 11:02 PM, Toni Pitcher <toni.pitcher at otago.ac.nz> wrote:
> [...] my dataset is big (1 million + observations) and the for loop is
> slow. Is there a more efficient way of getting to the desired result?

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

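For reference, the same ave() idea can be applied directly to the data frame A from the original post, without going through the clipboard. This is only a minimal sketch and not part of the reply above. Using seq_along() in place of seq(length(a)) is a choice made for this sketch: it returns an empty result for an empty group, where seq(length(a)) would return c(1, 0).

```r
# Example data from the original post (rows already ordered by ID).
A <- data.frame(ID = c('A','A','A','A','B','B','B',
                       'C','C','C','C','C'))

# ave() splits its first argument by the grouping variable, applies FUN
# to each piece, and puts the results back in the original row order.
# The values of seq_along(A$ID) are ignored by FUN; only the length of
# each group matters.
A$Session_ID <- ave(seq_along(A$ID), A$ID, FUN = seq_along)

print(A)
```

Because ave() groups by value rather than by position, this version should also work if the rows for one subject are not adjacent.
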
A$Session_ID <- ave(rep(1, length(A$ID)), A$ID, FUN = cumsum)
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
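
Not from the replies above: since the data are described as already ordered by subject ID, a run-length approach might also be worth trying. The sketch below uses rle() and sequence(), both base R, and assumes that all rows for one subject are contiguous; if the same ID reappears later in the data, the counter restarts at 1, which is also what the original for loop does. It avoids splitting the data by group, so it may be quicker on a million rows, but it has not been timed here.

```r
# Example data from the original post (rows already ordered by ID).
A <- data.frame(ID = c('A','A','A','A','B','B','B',
                       'C','C','C','C','C'))

# rle() finds runs of consecutive identical values. as.character() is
# needed because rle() does not accept factors, and ID is a factor when
# stringsAsFactors = TRUE (the default in older versions of R).
runs <- rle(as.character(A$ID))

# sequence(c(4, 3, 5)) yields 1:4, 1:3, 1:5 concatenated, i.e. a counter
# that restarts at the beginning of each run.
A$Session_ID <- sequence(runs$lengths)

print(A)
```
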
_____________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.