This small example will be applied to a problem with 1.4e6 lines of data. First, here is the dataset and a few lines of R script, followed by an explanation of what I'd like to get:

dat <- read.table(textConnection("ISEG IRCH val
1 1 265
1 2 260
1 3 234
54 39 467
54 40 468
54 41 460
54 42 489
1 1 265
1 2 276
1 3 217
54 39 456
54 40 507
54 41 483
54 42 457
1 1 265
1 2 287
1 3 224
54 39 473
54 40 502
54 41 497
54 42 447
1 1 230
1 2 251
1 3 199
54 39 439
54 40 474
54 41 477
54 42 413
1 1 230
1 2 262
1 3 217
54 39 455
54 40 493
54 41 489
54 42 431
1 1 1002
1 2 1222
1 3 1198
54 39 1876
54 40 1565
54 41 1455
54 42 1427
1 1 1002
1 2 1246
1 3 1153
54 39 1813
54 40 1490
54 41 1518
54 42 1486
1 1 1002
1 2 1229
1 3 1142
54 39 1797
54 40 1517
54 41 1527
54 42 1514"), header=TRUE)

dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
tmp <- diff(dat[dat$seq==1,]$val)!=0
dat$idx <- 0
dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
dat$ts <- cumsum(dat$idx)

At this point, I'd like to add one more column called "iter" that counts up by 1 based on "seq", but within each "ts".
So, the result would look like this (undoubtedly this is a simple problem with something like ddply, but I've been unable to construct the R for it):

dat
 ISEG IRCH  val seq idx ts iter
    1    1  265   1   1  1    1
    1    2  260   0   0  1    1
    1    3  234   0   0  1    1
   54   39  467   0   0  1    1
   54   40  468   0   0  1    1
   54   41  460   0   0  1    1
   54   42  489   0   0  1    1
    1    1  265   1   0  1    2
    1    2  276   0   0  1    2
    1    3  217   0   0  1    2
   54   39  456   0   0  1    2
   54   40  507   0   0  1    2
   54   41  483   0   0  1    2
   54   42  457   0   0  1    2
    1    1  265   1   0  1    3
    1    2  287   0   0  1    3
    1    3  224   0   0  1    3
   54   39  473   0   0  1    3
   54   40  502   0   0  1    3
   54   41  497   0   0  1    3
   54   42  447   0   0  1    3
    1    1  230   1   1  2    1
    1    2  251   0   0  2    1
    1    3  199   0   0  2    1
   54   39  439   0   0  2    1
   54   40  474   0   0  2    1
   54   41  477   0   0  2    1
   54   42  413   0   0  2    1
    1    1  230   1   0  2    2
    1    2  262   0   0  2    2
    1    3  217   0   0  2    2
   54   39  455   0   0  2    2
   54   40  493   0   0  2    2
   54   41  489   0   0  2    2
   54   42  431   0   0  2    2
    1    1 1002   1   1  3    1
    1    2 1222   0   0  3    1
    1    3 1198   0   0  3    1
   54   39 1876   0   0  3    1
   54   40 1565   0   0  3    1
   54   41 1455   0   0  3    1
   54   42 1427   0   0  3    1
    1    1 1002   1   0  3    2
    1    2 1246   0   0  3    2
    1    3 1153   0   0  3    2
   54   39 1813   0   0  3    2
   54   40 1490   0   0  3    2
   54   41 1518   0   0  3    2
   54   42 1486   0   0  3    2
    1    1 1002   1   0  3    3
    1    2 1229   0   0  3    3
    1    3 1142   0   0  3    3
   54   39 1797   0   0  3    3
   54   40 1517   0   0  3    3
   54   41 1527   0   0  3    3
   54   42 1514   0   0  3    3
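The chained assignment dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1 above does work, but it is hard to read. The sketch below is one way to write the same seq/idx/ts steps with plain row indices. The data frame is rebuilt inline from the posted values so the block runs on its own. The helper names start and new_ts are illustrative and do not come from the thread.

```r
# Rebuild the posted example: 8 blocks of 7 rows.
# ISEG/IRCH repeat the same 7-row pattern; val is copied from the post.
dat <- data.frame(
  ISEG = rep(c(1, 1, 1, 54, 54, 54, 54), times = 8),
  IRCH = rep(c(1, 2, 3, 39, 40, 41, 42), times = 8),
  val  = c(265, 260, 234, 467, 468, 460, 489,
           265, 276, 217, 456, 507, 483, 457,
           265, 287, 224, 473, 502, 497, 447,
           230, 251, 199, 439, 474, 477, 413,
           230, 262, 217, 455, 493, 489, 431,
           1002, 1222, 1198, 1876, 1565, 1455, 1427,
           1002, 1246, 1153, 1813, 1490, 1518, 1486,
           1002, 1229, 1142, 1797, 1517, 1527, 1514)
)

# seq: 1 on rows where a block starts (ISEG == 1 and IRCH == 1), else 0
dat$seq <- as.integer(dat$ISEG == 1 & dat$IRCH == 1)

# Row numbers of the block-start rows (hypothetical helper name)
start <- which(dat$seq == 1)

# A new time step opens at the first block, and at any block whose
# start-row val differs from the previous block's start-row val
new_ts <- c(TRUE, diff(dat$val[start]) != 0)

# idx: 1 only on the block-start rows that open a new time step
dat$idx <- 0L
dat$idx[start[new_ts]] <- 1L

# ts: running count of time steps
dat$ts <- cumsum(dat$idx)
```

This gives idx == 1 on rows 1, 22 and 36, and ts values of 1, 2 and 3, matching the table above.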
On Apr 3, 2015, at 5:17 AM, Morway, Eric wrote:

> At this point, I'd like to add one more column called "iter" that counts up
> by 1 based on "seq", but within each "ts".
> So, the result would look like this (undoubtedly this is a simple problem
> with something like ddply, but I've been unable to construct the R for it):

> dat$iter2 <- ave(dat$seq, dat$ts, FUN=cumsum)
> dat
   ISEG IRCH val seq idx ts iter iter2
1     1    1 265   1   1  1  1_1     1
2     1    2 260   0   0  1  1_1     1
3     1    3 234   0   0  1  1_1     1
4    54   39 467   0   0  1  1_1     1
5    54   40 468   0   0  1  1_1     1
6    54   41 460   0   0  1  1_1     1
7    54   42 489   0   0  1  1_1     1
8     1    1 265   1   0  1  1_2     2
9     1    2 276   0   0  1  1_2     2
10    1    3 217   0   0  1  1_2     2
11   54   39 456   0   0  1  1_2     2
12   54   40 507   0   0  1  1_2     2
13   54   41 483   0   0  1  1_2     2
14   54   42 457   0   0  1  1_2     2
15    1    1 265   1   0  1  1_3     3
16    1    2 287   0   0  1  1_3     3
17    1    3 224   0   0  1  1_3     3
18   54   39 473   0   0  1  1_3     3
19   54   40 502   0   0  1  1_3     3
20   54   41 497   0   0  1  1_3     3
21   54   42 447   0   0  1  1_3     3
22    1    1 230   1   1  2  2_4     1
23    1    2 251   0   0  2  2_4     1
snipped-----

-- 
David
David Winsemius
Alameda, CA, USA
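Here is why ave(dat$seq, dat$ts, FUN=cumsum) produces the wanted column. When ave() is given a grouping argument, it splits x by the groups, applies FUN to each piece, and writes the results back into the original row positions. This means cumsum restarts its count in every ts group. The sketch below shows this on a made-up 8-row vector; the names flag and ts_id are hypothetical. It then repeats the same split/apply/recombine by hand with split() and unsplit() for comparison.

```r
# Hypothetical toy data: a 0/1 block-start flag and a time-step label
flag  <- c(1, 0, 1, 0, 1, 0, 1, 0)
ts_id <- c(1, 1, 1, 1, 2, 2, 2, 2)

# ave() splits flag by ts_id, runs cumsum() within each group,
# and puts the results back in the original order
iter_ave <- ave(flag, ts_id, FUN = cumsum)
print(iter_ave)                          # [1] 1 1 2 2 1 1 2 2

# The same split-apply-recombine written out by hand
iter_manual <- unsplit(lapply(split(flag, ts_id), cumsum), ts_id)
print(identical(iter_ave, iter_manual))  # [1] TRUE
```

The default FUN is mean, which explains the function's name. Any function that returns a vector as long as its input works here.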
ave() is your friend (unfortunately named as it may be):

> ave(dat$seq, dat$ts, FUN=cumsum)
 [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1
[39] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3

> On 03 Apr 2015, at 14:17, Morway, Eric <emorway at usgs.gov> wrote:
>
> At this point, I'd like to add one more column called "iter" that counts up
> by 1 based on "seq", but within each "ts".
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
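As a check, the ave() answer can be compared with the iter column requested in the question. The sketch below does not recompute seq and ts. Instead, it types them in directly from the desired-output table.

It also shows a grouping-free alternative: a running cumsum() minus its value at each group's first row. This is a swapped-in technique, not one proposed in the thread. It assumes three things:
- the rows are in file order;
- each ts group is contiguous;
- every group starts on a seq == 1 row.

These hold for the question's code, because ts is a cumulative sum and idx is only ever set on seq == 1 rows.

```r
# seq and ts as printed in the question's desired output:
# 8 blocks of 7 rows, each block starting with seq == 1;
# ts is 1 for blocks 1-3, 2 for blocks 4-5, 3 for blocks 6-8
seq_flag <- rep(c(1, 0, 0, 0, 0, 0, 0), times = 8)
ts_id    <- rep(c(1, 2, 3), times = c(21, 14, 21))

# The iter column the question asks for, block by block
wanted <- rep(c(1, 2, 3, 1, 2, 1, 2, 3), each = 7)

# Grouped running count (the ave() answer from the thread)
iter_ave <- ave(seq_flag, ts_id, FUN = cumsum)

# Grouping-free alternative (assumption: ordered, contiguous groups that
# each begin on a seq == 1 row): subtract the running total reached at
# the first row of each row's ts group, then add 1
cs        <- cumsum(seq_flag)
first_row <- match(ts_id, ts_id)   # first row index of each row's ts group
iter_cs   <- cs - cs[first_row] + 1

stopifnot(all(iter_ave == wanted), all(iter_cs == wanted))
```

The cumsum() version avoids splitting into groups, which may help at the 1.4e6-row scale mentioned in the question, but that is untested. If its ordering assumptions fail, it silently returns wrong counts, whereas ave() does not depend on where the groups start.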