Marie-Pierre Sylvestre
2010-Mar-24 15:27 UTC
[R] Converting a data set from 'long' format to 'interval' format
Hi,
I have a data set in which the variable 'dose' is time-varying. Currently,
the data set is in a long format, with 1 row for each time unit of follow-up
for each individual "Id". It looks like this:
orig.data <- cbind(Id = c(rep(1,4), rep(2,5)), time = c(1:4, 1:5),
                   dose = c(1,1,1,0,1,0,1,1,0))
orig.data
     Id time dose
[1,]  1    1    1
[2,]  1    2    1
[3,]  1    3    1
[4,]  1    4    0
[5,]  2    1    1
[6,]  2    2    0
[7,]  2    3    1
[8,]  2    4    1
[9,]  2    5    0
What I would like to do is to convert the data set into an interval format.
By that I mean a data set in which each row has a 'Start' and a 'Stop' value
that indicate the time units over which the 'dose' is constant. For example,
my orig.data example would now be:
int.data <- cbind(Id = c(rep(1,2), rep(2,4)), Start = c(1,4,1,2,3,5),
                  Stop = c(3,4,1,2,4,5), dose = c(1,0,1,0,1,0))
int.data
     Id Start Stop dose
[1,]  1     1    3    1
[2,]  1     4    4    0
[3,]  2     1    1    1
[4,]  2     2    2    0
[5,]  2     3    4    1
[6,]  2     5    5    0
Basically, this implies collapsing consecutive rows that have the same "Id"
and "dose" and creating "Start" and "Stop" to index the time.
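To pin down the target format, here is a rough sketch of the reverse direction: expanding each interval back to one row per time unit should recover orig.data. It assumes 'Start' and 'Stop' are both inclusive; the names len, idx and back are only for illustration.

```r
# Data from the example above.
orig.data <- cbind(Id = c(rep(1, 4), rep(2, 5)), time = c(1:4, 1:5),
                   dose = c(1, 1, 1, 0, 1, 0, 1, 1, 0))
int.data <- cbind(Id = c(rep(1, 2), rep(2, 4)), Start = c(1, 4, 1, 2, 3, 5),
                  Stop = c(3, 4, 1, 2, 4, 5), dose = c(1, 0, 1, 0, 1, 0))

# Number of time units covered by each interval (Start and Stop inclusive).
len <- int.data[, "Stop"] - int.data[, "Start"] + 1

# Repeat each interval row once per time unit, then rebuild 'time'.
idx <- rep(seq_len(nrow(int.data)), len)
back <- cbind(Id   = int.data[idx, "Id"],
              time = int.data[idx, "Start"] + sequence(len) - 1,
              dose = int.data[idx, "dose"])

# Compare the expanded data with the original long format.
all(back == orig.data)
```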
While I can write a clumsy routine with multiple loops to do it, it will be
inefficient and will not scale to large data sets.
I wonder if people know of a function that would reshape my data set from
'long' to 'interval'?
Best,
MP
Henrique Dallazuanna
2010-Mar-24 20:47 UTC
[R] Converting a data set from 'long' format to 'interval' format
Try this:
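# Collapse one run of constant dose into a single row: Id, first/last time, dose.
foo <- function(x) {
  data.frame(Id = unique(x$Id), rbind(range(x$time)), dose = unique(x$dose))
}
# rle() finds runs of consecutive equal 'dose' values; split the rows by run.
# Note: runs are detected on 'dose' alone, so this relies on the dose changing
# at every Id boundary (true for the example data).
t(sapply(split(as.data.frame(orig.data),
               with(rle(orig.data[, 'dose']), rep(seq_along(lengths), lengths))),
         foo))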
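A possible variation on the above, offered as a sketch rather than a general solution: since rle() on 'dose' alone would merge a run that continues from the last row of one Id into the first row of the next, the version below starts a new run whenever either 'Id' or 'dose' changes. The name long_to_interval is made up for this example, and the code assumes at least one row, sorted by Id and then by time, with no gaps in time within an Id.

```r
# Example data from the original post (one row per Id and time unit).
orig.data <- cbind(Id   = c(rep(1, 4), rep(2, 5)),
                   time = c(1:4, 1:5),
                   dose = c(1, 1, 1, 0, 1, 0, 1, 1, 0))

# long_to_interval() is a hypothetical helper name, not an existing function.
# Assumes rows are sorted by Id, then time, with no gaps in time within an Id.
long_to_interval <- function(d) {
  d <- as.data.frame(d)
  n <- nrow(d)
  # A new run starts on the first row, or wherever Id or dose differs
  # from the previous row.
  new_run <- c(TRUE, d$Id[-1] != d$Id[-n] | d$dose[-1] != d$dose[-n])
  run <- cumsum(new_run)
  data.frame(Id    = d$Id[new_run],
             Start = as.vector(tapply(d$time, run, min)),
             Stop  = as.vector(tapply(d$time, run, max)),
             dose  = d$dose[new_run])
}

int.data <- long_to_interval(orig.data)
int.data
```

This returns an ordinary data frame with columns Id, Start, Stop and dose, and avoids explicit loops over rows, which should help with larger data sets.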
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O