Hi Everyone,
I would like to do sequential subtractions within each group so that I know the
time between successive observations for each individual.
My data:
data <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
    "IND3", "IND4", "IND5", "IND6", "IND6"),
    date_obs = structure(c(6468, 7063, 9981, 14186, 14372, 5129, 9767,
    11168, 10243, 10647), class = "Date")),
    .Names = c("group", "date_obs"), row.names = c(NA, 10L),
    class = "data.frame")
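The numbers wrapped in structure(..., class = "Date") are day counts since 1970-01-01, which is how R stores a Date internally. A quick check on the first row, as a sketch; note that turning a bare number back into a Date needed the origin spelled out, at least in the R versions of this thread, which is the same 'origin' issue that shows up in the error further down.

```r
# Day count taken from the first row of the example data (IND1)
d <- as.Date(6468, origin = "1970-01-01")

format(d)   # "1987-09-17"
unclass(d)  # the underlying day count, 6468
```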
So I start with:
group date_obs
1 IND1 1987-09-17
2 IND1 1989-05-04
3 IND2 1997-04-30
4 IND2 2008-11-03
5 IND2 2009-05-08
6 IND3 1984-01-17
7 IND4 1996-09-28
8 IND5 2000-07-30
9 IND6 1998-01-17
10 IND6 1999-02-25
What I would like:
group date_obs time
1 IND1 1987-09-17 NA
2 IND1 1989-05-04 595
3 IND2 1997-04-30 NA
4 IND2 2008-11-03 4205
5 IND2 2009-05-08 186
6 IND3 1984-01-17 NA
7 IND4 1996-09-28 NA
8 IND5 2000-07-30 NA
9 IND6 1998-01-17 NA
10 IND6 1999-02-25 404
If an individual has only one entry, 0 or NA would be acceptable; if there
is more than one entry, the sequential differences should be calculated.
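The rule above can be sketched for a single individual (IND2), assuming its dates are already in chronological order: diff() returns one value fewer than its input, so an NA is prepended to stand in for the first observation.

```r
# IND2's three observation dates from the example data
x <- as.Date(c("1997-04-30", "2008-11-03", "2009-05-08"))

# diff() on Dates gives a difftime of length(x) - 1; convert it to plain
# day counts and prepend NA so there is exactly one value per observation
gaps <- c(NA, as.numeric(diff(x), units = "days"))
gaps  # NA 4205 186
```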
I started with some code, but I cannot get it to work:
x <- do.call(rbind, lapply(split(data, data$group),
    function(dat) {
        dat <- dat[order(dat$date_obs), ]
        d <- diff(dat$date_obs)
        dat <- rbind(dat, d)
    }))
I get this error: "Error in as.Date.numeric(value) : 'origin' must be
supplied", so I'm not sure whether it does what I need. In addition, the
vector lengths won't match up, because diff() returns one value fewer than
there are dates: the first date in each sequence has nothing to be
subtracted from.
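The error appears to come from rbind(dat, d): a bare vector passed to rbind() together with a data frame is treated as an extra row, so R tries to put the day counts in d into the date_obs column and calls as.Date() on plain numbers without an origin. Below is a sketch of the same split/lapply approach, assuming the goal is a new column rather than new rows: the differences get an NA prepended so each group keeps one value per row.

```r
data <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
                                 "IND3", "IND4", "IND5", "IND6", "IND6"),
                       date_obs = structure(c(6468, 7063, 9981, 14186, 14372,
                                              5129, 9767, 11168, 10243, 10647),
                                            class = "Date")),
                  .Names = c("group", "date_obs"),
                  row.names = c(NA, 10L), class = "data.frame")

x <- do.call(rbind, lapply(split(data, data$group), function(dat) {
  dat <- dat[order(dat$date_obs), ]
  # diff() gives n - 1 gaps; prepend NA so the new column has n values.
  # Store the gaps as a column instead of rbind()-ing them as a row.
  dat$time <- c(NA, as.numeric(diff(dat$date_obs), units = "days"))
  dat
}))
rownames(x) <- NULL  # drop the "IND1.1"-style names that rbind() creates
x
```

Individuals with a single row (IND3, IND4, IND5) simply get NA, since diff() of one date is empty.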
Does anyone know an easier way to achieve this?
Thanks for the help,
Natalie
-----
Natalie Van Zuydam
PhD Student
University of Dundee
nvanzuydam at dundee.ac.uk
--
View this message in context:
http://r.789695.n4.nabble.com/within-group-sequential-subtraction-tp3346033p3346033.html
Sent from the R help mailing list archive at Nabble.com.
Dear Natalie,
I am sure there are other ways, but one way you can do this is by
applying diff() to each group using tapply() or by(). Because those
return lists, if you want to add it back into your data frame, you can
wrap the whole call in unlist(). Here is an example:
dat <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
    "IND3", "IND4", "IND5", "IND6", "IND6"),
    date_obs = structure(c(6468, 7063, 9981, 14186, 14372, 5129, 9767,
    11168, 10243, 10647), class = "Date")),
    .Names = c("group", "date_obs"), row.names = c(NA, 10L),
    class = "data.frame")
## calculate differences using diff() by each group
## note the prepended NA
dat$time <- unlist(tapply(dat$date_obs, dat$group,
function(x) {diff(c(NA, x))}))
dat ## updated data frame
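One caveat with unlist(tapply(...)): tapply() returns the per-group results in sorted order of the group labels, and unlist() strings them together in that order, so the result only lines up with the rows of dat when the rows are already sorted by group (and by date within group), as they happen to be in this example. A sketch that sorts first, under that assumption:

```r
dat <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
                                "IND3", "IND4", "IND5", "IND6", "IND6"),
                      date_obs = structure(c(6468, 7063, 9981, 14186, 14372,
                                             5129, 9767, 11168, 10243, 10647),
                                           class = "Date")),
                 .Names = c("group", "date_obs"),
                 row.names = c(NA, 10L), class = "data.frame")

# Sort by individual, then by date, so the row order matches the order
# in which unlist(tapply(...)) returns the per-group results
dat <- dat[order(dat$group, dat$date_obs), ]

# diff(c(NA, x)) gives NA followed by the gaps in days for each group;
# unname() drops the "IND11", "IND12", ... names that unlist() adds
dat$time <- unname(unlist(tapply(dat$date_obs, dat$group,
                                 function(x) diff(c(NA, x)))))
dat
```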
HTH,
Josh
On Thu, Mar 10, 2011 at 6:56 AM, natalie.vanzuydam <nvanzuydam at gmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Try this:

> data$diff <- ave(as.numeric(data$date_obs), data$group, FUN=function(x)c(NA, diff(x)))
> data
   group   date_obs diff
1   IND1 1987-09-17   NA
2   IND1 1989-05-04  595
3   IND2 1997-04-30   NA
4   IND2 2008-11-03 4205
5   IND2 2009-05-08  186
6   IND3 1984-01-17   NA
7   IND4 1996-09-28   NA
8   IND5 2000-07-30   NA
9   IND6 1998-01-17   NA
10  IND6 1999-02-25  404

On Thu, Mar 10, 2011 at 9:56 AM, natalie.vanzuydam <nvanzuydam at gmail.com> wrote:

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
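Unlike the unlist(tapply()) route, ave() returns a vector the same length as its input, with each group's results written back into that group's own row positions, so the rows do not need to be grouped together. They do still need to be in date order within each individual, since diff() just subtracts neighbouring values. A sketch, assuming the rows are sorted by date only, so that individuals are interleaved:

```r
data <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
                                 "IND3", "IND4", "IND5", "IND6", "IND6"),
                       date_obs = structure(c(6468, 7063, 9981, 14186, 14372,
                                              5129, 9767, 11168, 10243, 10647),
                                            class = "Date")),
                  .Names = c("group", "date_obs"),
                  row.names = c(NA, 10L), class = "data.frame")

# Sort by date only: rows from different individuals are now interleaved
by_date <- data[order(data$date_obs), ]

# ave() splits by group, applies FUN to each piece, and writes each
# group's result back into that group's original positions
by_date$diff <- ave(as.numeric(by_date$date_obs), by_date$group,
                    FUN = function(x) c(NA, diff(x)))
by_date
```

Here IND2's rows end up at positions 5, 9 and 10, and ave() still gives them NA, 4205 and 186.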