Hi Everyone,

I would like to do sequential subtractions within a group so that I know the
time between separate observations for a group of individuals.

My data:

    data <- structure(list(group = c("IND1", "IND1", "IND2",
    "IND2", "IND2", "IND3", "IND4", "IND5",
    "IND6", "IND6"), date_obs = structure(c(6468,
    7063, 9981, 14186, 14372, 5129, 9767, 11168, 10243, 10647),
    class = "Date")), .Names = c("group",
    "date_obs"), row.names = c(NA, 10L), class = "data.frame")

So I start with:

       group   date_obs
    1   IND1 1987-09-17
    2   IND1 1989-05-04
    3   IND2 1997-04-30
    4   IND2 2008-11-03
    5   IND2 2009-05-08
    6   IND3 1984-01-17
    7   IND4 1996-09-28
    8   IND5 2000-07-30
    9   IND6 1998-01-17
    10  IND6 1999-02-25

What I would like:

       group   date_obs time
    1   IND1 1987-09-17   NA
    2   IND1 1989-05-04  595
    3   IND2 1997-04-30   NA
    4   IND2 2008-11-03 4205
    5   IND2 2009-05-08  186
    6   IND3 1984-01-17   NA
    7   IND4 1996-09-28   NA
    8   IND5 2000-07-30   NA
    9   IND6 1998-01-17   NA
    10  IND6 1999-02-25  404

So if there is only one entry per individual, a 0 or NA would be acceptable,
and if there is more than one entry per individual, the sequential difference
would be calculated.

I started with some code, but I cannot edit it appropriately:

    x <- do.call(rbind, lapply(split(data, data$group),
            function(dat) {
                dat <- dat[order(dat$date_obs), ]
                d <- diff(dat$date_obs)
                dat <- rbind(dat, d)
            }))

I get this error: "Error in as.Date.numeric(value) : 'origin' must be
supplied", so I'm not sure whether it does what I need it to do. In addition,
the vector lengths won't match up, as the first date in the sequence won't be
subtracted from itself.

I'm not sure if anyone knows an easier way to achieve this.

Thanks for the help,
Natalie

-----
Natalie Van Zuydam

PhD Student
University of Dundee
nvanzuydam at dundee.ac.uk
--
View this message in context: http://r.789695.n4.nabble.com/within-group-sequential-subtraction-tp3346033p3346033.html
Sent from the R help mailing list archive at Nabble.com.

Dear Natalie,

I am sure there are other ways, but one way you can do this is by applying
diff() to each group using tapply() or by(). Because those return lists, if
you want to add the result back into your data frame, you can wrap the whole
call in unlist(). Here is an example:

    dat <- structure(list(group = c("IND1", "IND1", "IND2",
    "IND2", "IND2", "IND3", "IND4", "IND5",
    "IND6", "IND6"), date_obs = structure(c(6468,
    7063, 9981, 14186, 14372, 5129, 9767, 11168, 10243, 10647),
    class = "Date")), .Names = c("group",
    "date_obs"), row.names = c(NA, 10L), class = "data.frame")

    ## calculate differences using diff() by each group
    ## note the prepended NA
    dat$time <- unlist(tapply(dat$date_obs, dat$group,
      function(x) {diff(c(NA, x))}))

    dat ## updated data frame

HTH,

Josh

On Thu, Mar 10, 2011 at 6:56 AM, natalie.vanzuydam <nvanzuydam at gmail.com> wrote:
> I would like to do sequential subtractions within a group so that I know the
> time between separate observations for a group of individuals.
> [...]

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
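
One point about the tapply()/unlist() approach that may be worth spelling out: unlist() concatenates the per-group results in the order of the group levels, so the values line up with the rows only when the data frame is already sorted by group (as the posted data is). The following is a minimal sketch, not part of the original reply, that sorts explicitly before assigning. Building the data frame with data.frame() and as.Date(), and the intermediate name `gaps`, are choices made here for illustration.

```r
# Same data as in the thread, built with data.frame() for readability
dat <- data.frame(
  group = c("IND1", "IND1", "IND2", "IND2", "IND2",
            "IND3", "IND4", "IND5", "IND6", "IND6"),
  date_obs = as.Date(c(6468, 7063, 9981, 14186, 14372,
                       5129, 9767, 11168, 10243, 10647),
                     origin = "1970-01-01"),
  stringsAsFactors = FALSE
)

# Sort by group, then by date within group, so that the concatenated
# tapply() output lines up row for row with the data frame
dat <- dat[order(dat$group, dat$date_obs), ]

# One vector of differences per group; the prepended NA keeps each
# result the same length as its group
gaps <- tapply(dat$date_obs, dat$group, function(x) diff(c(NA, x)))

# Flatten in group order and attach as a new column
dat$time <- unlist(gaps, use.names = FALSE)

dat
```

With the posted data the sort changes nothing, but it guards against silently misaligned values if rows for one individual are ever scattered through the data frame.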

Try this:

> data$diff <- ave(as.numeric(data$date_obs), data$group,
+                  FUN = function(x) c(NA, diff(x)))
> data
   group   date_obs diff
1   IND1 1987-09-17   NA
2   IND1 1989-05-04  595
3   IND2 1997-04-30   NA
4   IND2 2008-11-03 4205
5   IND2 2009-05-08  186
6   IND3 1984-01-17   NA
7   IND4 1996-09-28   NA
8   IND5 2000-07-30   NA
9   IND6 1998-01-17   NA
10  IND6 1999-02-25  404
>

On Thu, Mar 10, 2011 at 9:56 AM, natalie.vanzuydam <nvanzuydam at gmail.com> wrote:
> I would like to do sequential subtractions within a group so that I know the
> time between separate observations for a group of individuals.
> [...]

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
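
As a small illustration of why ave() is convenient here: it writes each group's result back into the original row positions, so the rows do not need to be grouped together beforehand. The sketch below is not from the thread. The data frame `obs` is a hypothetical rearrangement of the IND1 and IND2 rows with the two individuals interleaved, and it assumes the dates are already in chronological order within each individual.

```r
# Hypothetical example: rows of two individuals interleaved, but in
# chronological order within each individual
obs <- data.frame(
  group = c("IND1", "IND2", "IND1", "IND2", "IND2"),
  date_obs = as.Date(c("1987-09-17", "1997-04-30", "1989-05-04",
                       "2008-11-03", "2009-05-08")),
  stringsAsFactors = FALSE
)

# ave() splits the numeric dates by group, applies FUN to each piece,
# and returns the results in the original row order
obs$diff <- ave(as.numeric(obs$date_obs), obs$group,
                FUN = function(x) c(NA, diff(x)))

obs
```

If the dates might be out of order within an individual, sorting first with order(obs$group, obs$date_obs) avoids negative differences.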