Marie-Pierre Sylvestre
2010-Mar-24 15:27 UTC
[R] Converting a data set from 'long' format to 'interval' format
Hi,

I have a data set in which the variable 'dose' is time-varying. Currently, the data set is in a long format, with 1 row for each time unit of follow-up for each individual "Id". It looks like this:

orig.data <- cbind(Id = c(rep(1,4), rep(2,5)), time = c(1:4, 1:5),
                   dose = c(1,1,1,0,1,0,1,1,0))

> orig.data
      Id time dose
 [1,]  1    1    1
 [2,]  1    2    1
 [3,]  1    3    1
 [4,]  1    4    0
 [5,]  2    1    1
 [6,]  2    2    0
 [7,]  2    3    1
 [8,]  2    4    1
 [9,]  2    5    0

What I would like to do is to convert the data set into an interval format. By that I mean a data set in which each row has a 'Start' and a 'Stop' value that indicate the time units over which the 'dose' is constant. For example, my orig.data example would now be:

int.data <- cbind(Id = c(rep(1,2), rep(2,4)), Start = c(1,4,1,2,3,5),
                  Stop = c(3,4,1,2,4,5), dose = c(1,0,1,0,1,0))

> int.data
     Id Start Stop dose
[1,]  1     1    3    1
[2,]  1     4    4    0
[3,]  2     1    1    1
[4,]  2     2    2    0
[5,]  2     3    4    1
[6,]  2     5    5    0

Basically, this implies collapsing consecutive rows that have the same "Id" and "dose" and creating "Start" and "Stop" to index the time.

While I can write a clumsy routine with multiple loops to do it, it will be inefficient and will not work for large data sets.

I wonder if people know of a function that would reshape my data set from 'long' to 'interval'?

Best,

MP
Henrique Dallazuanna
2010-Mar-24 20:47 UTC
[R] Converting a data set from 'long' format to 'interval' format
Try this:

foo <- function(x) {
  data.frame(Id = unique(x$Id), rbind(range(x$time)), dose = unique(x$dose))
}

t(sapply(split(as.data.frame(orig.data),
               with(rle(orig.data[,'dose']), rep(seq_along(lengths), lengths))),
         foo))

On Wed, Mar 24, 2010 at 12:27 PM, Marie-Pierre Sylvestre <mp.sylvestre at gmail.com> wrote:
> I wonder if people know of a function that would reshape my data set from
> 'long' to 'interval'?

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
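
One thing to watch with the rle() approach above: the runs are computed over the whole 'dose' column, so it relies on 'dose' changing at every "Id" boundary, as it happens to do in the example. If one individual's last dose equals the next individual's first dose, a run could span two individuals. A minimal sketch of a loop-free variant that also breaks intervals when "Id" changes is given below. The helper name long_to_interval is hypothetical, and the sketch assumes at least one row, rows sorted by Id and then by time, and no gaps in time within an individual.

```r
# Example data from the original post (with the missing '=' restored).
orig.data <- cbind(Id   = c(rep(1, 4), rep(2, 5)),
                   time = c(1:4, 1:5),
                   dose = c(1, 1, 1, 0, 1, 0, 1, 1, 0))

# Hypothetical helper: collapse long-format rows into Start/Stop intervals.
# Assumes >= 1 row, sorted by Id then time, with no gaps in time.
long_to_interval <- function(x) {
  x <- as.data.frame(x)
  n <- nrow(x)
  # A new interval starts on the first row and wherever Id or dose
  # differs from the previous row.
  is_start <- c(TRUE, x$Id[-1] != x$Id[-n] | x$dose[-1] != x$dose[-n])
  first <- which(is_start)          # first row of each interval
  last  <- c(first[-1] - 1, n)      # last row of each interval
  data.frame(Id    = x$Id[first],
             Start = x$time[first],
             Stop  = x$time[last],
             dose  = x$dose[first])
}

int.data <- long_to_interval(orig.data)
int.data
```

Because the comparison with the previous row is vectorized, this avoids both explicit loops and a split() into many small data frames, which may matter for the large data sets mentioned in the question.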