I have a data set with several thousand observations across time, grouped by subject (example format below):

ID    TIME   OBS
001   2200   23
001   2400   11
001   3200   10
001   4500   22
003   3900   45
003   5605   32
005   1800   56
005   1900   34
005   2300   23
...

I would like to identify the first time for each subject, and then subtract this value from each subsequent time. However, the number of observations per subject varies widely (from 1 to 20), and the intervals between times vary widely. Is there a package that can help do this, or a loop that can be set up to evaluate ID, then calculate the values? The outcome I would like is presented below.

ID    TIME   OBS
001   0      23
001   200    11
001   1000   10
001   2300   22
003   0      45
003   1705   32
005   0      56
005   100    34
005   500    23
...

Any help appreciated.
Here is one way:

> con <- textConnection("
+ ID    TIME   OBS
+ 001   2200   23
+ 001   2400   11
+ 001   3200   10
+ 001   4500   22
+ 003   3900   45
+ 003   5605   32
+ 005   1800   56
+ 005   1900   34
+ 005   2300   23")
> dat <- read.table(con, header = TRUE,
+                   colClasses = c("factor", "numeric", "numeric"))
> closeAllConnections()
>
> tmp <- lapply(split(dat, dat$ID),
+               function(x) within(x, TIME <- TIME - min(TIME)))
> split(dat, dat$ID) <- tmp
> dat
   ID TIME OBS
1 001    0  23
2 001  200  11
3 001 1000  10
4 001 2300  22
5 003    0  45
6 003 1705  32
7 005    0  56
8 005  100  34
9 005  500  23
>

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Matthew Strother [rstrothe at gmail.com]
Sent: 16 January 2011 07:26
To: r-help at r-project.org
Subject: [R] data prep question

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thanks so much - that did it. I am new to this - so the help is greatly appreciated.

Matthew
On Sat, Jan 15, 2011 at 4:26 PM, Matthew Strother <rstrothe at gmail.com> wrote:
> I would like to identify the first time for each subject, and then subtract this value from each subsequent time.

Since the data frame appears to be already sorted by time within ID, we can do this:

> transform(DF, TIME = ave(TIME, ID, FUN = function(x) x - x[1]))
  ID TIME OBS
1  1    0  23
2  1  200  11
3  1 1000  10
4  1 2300  22
5  3    0  45
6  3 1705  32
7  5    0  56
8  5  100  34
9  5  500  23

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
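The first-element approach in this reply depends on the rows already being in time order within each ID. If that cannot be guaranteed, one option is to sort first. The following is only a sketch under that assumption: the data frame `DF` is a made-up, deliberately shuffled copy of the example values from the question, not the poster's real data.

```r
# Hypothetical input: the example values from the question, rows shuffled
DF <- data.frame(
  ID   = c("005", "001", "003", "001", "005", "001", "003", "005", "001"),
  TIME = c(2300, 3200, 5605, 2200, 1800, 4500, 3900, 1900, 2400),
  OBS  = c(23, 10, 32, 23, 56, 22, 45, 34, 11)
)

# Sort by subject, then by time, so the first row of each subject is its earliest
DF <- DF[order(DF$ID, DF$TIME), ]

# Within each ID, subtract the first (now earliest) time from every time
DF$TIME <- ave(DF$TIME, DF$ID, FUN = function(x) x - x[1])

print(DF)
```

Sorting changes the row order of the data frame; if the original order matters, subtracting the group-wise minimum instead of the first element avoids the sort.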
On Sun, Jan 16, 2011 at 5:48 AM, <Bill.Venables at csiro.au> wrote:
> Here is one way:
>
>> tmp <- lapply(split(dat, dat$ID),
> +               function(x) within(x, TIME <- TIME - min(TIME)))
>> split(dat, dat$ID) <- tmp

Or, in one line with ddply:

library(plyr)
ddply(dat, "ID", transform, TIME = TIME - min(TIME))

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
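For readers who prefer not to load a package, the same per-subject subtraction can be sketched in base R with `ave()`. The data below repeat the example from the question; subject "007" and its values are invented here purely to illustrate the case of a subject with a single observation, which the question mentions.

```r
# Example data from the question, plus a made-up single-row subject "007"
dat <- data.frame(
  ID   = c("001", "001", "001", "001", "003", "003", "005", "005", "005", "007"),
  TIME = c(2200, 2400, 3200, 4500, 3900, 5605, 1800, 1900, 2300, 4100),
  OBS  = c(23, 11, 10, 22, 45, 32, 56, 34, 23, 17),
  stringsAsFactors = TRUE
)

# ave() returns each subject's minimum TIME, repeated to line up with the rows,
# so the subtraction works for any number of observations per subject
dat$TIME <- with(dat, TIME - ave(TIME, ID, FUN = min))

print(dat)
```

A subject with only one row simply gets a TIME of 0, and because the minimum is used rather than the first element, the result does not depend on row order.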