PEL wrote on 12/07/2011 02:37:42 PM:
> Hi all,
>
> I have a dataframe that was created from the fusion of two dataframes.
> Both spanned the same time interval but contained different information.
> When I put them together, the info overlapped since there are no holes in
> the time interval of one of the dataframes. Here is an example where the
> rows "sp=A and B" are part of a first df and the rows "sp=C" come from a
> second. The first dataframe is continuous but the second consists of
> sporadic events. The final dataframe looks like this:
>
> start               end                 sp
> 2010-06-01 17:00:00 2010-06-01 19:30:00 A
> 2010-06-01 19:30:01 2010-06-01 20:00:00 B
> 2010-06-01 19:45:00 2010-06-01 19:55:00 C
> 2010-06-01 20:00:01 2010-06-01 20:30:00 A
> 2010-06-01 20:05:00 2010-06-01 20:10:00 C
> 2010-06-01 20:12:00 2010-06-01 20:15:00 C
> 2010-06-01 20:30:01 2010-06-01 20:40:00 B
> 2010-06-01 20:35:00 2010-06-01 20:40:10 C
> 2010-06-01 20:40:01 2010-06-01 20:50:00 A
>
> I would like to prioritize "C" so when it overlaps the time interval of
> another "sp", the time interval of "A" or "B" is cut accordingly. As seen
> in the example, I sometimes have multiple events of "C" that overlap a
> single event of "A" or "B". The result would be this:
>
> start               end                 sp
> 2010-06-01 17:00:00 2010-06-01 19:30:00 A
> 2010-06-01 19:30:01 2010-06-01 19:44:59 B
> 2010-06-01 19:45:00 2010-06-01 19:55:00 C
> 2010-06-01 19:55:01 2010-06-01 20:00:00 B
> 2010-06-01 20:00:01 2010-06-01 20:04:59 A
> 2010-06-01 20:05:00 2010-06-01 20:10:00 C
> 2010-06-01 20:10:01 2010-06-01 20:11:59 A
> 2010-06-01 20:12:00 2010-06-01 20:15:00 C
> 2010-06-01 20:15:01 2010-06-01 20:30:00 A
> 2010-06-01 20:30:01 2010-06-01 20:34:59 B
> 2010-06-01 20:35:00 2010-06-01 20:40:10 C
> 2010-06-01 20:40:11 2010-06-01 20:50:00 A
>
> My date/time columns are in POSIXct. Don't hesitate to ask if something
> is unclear.
>
> Thanks in advance
The code below isn't pretty, but it works, at least on the example you
provided.
It's helpful to provide your example data as working code, for example
using the function dput().
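
As a rough illustration of what that looks like, here is a minimal sketch. The one-row data frame `small` and the UTC time zone are made up for the illustration; only dput(), capture.output(), parse() and eval() are standard R.

```r
# A tiny made-up data frame with POSIXct columns (UTC assumed for illustration)
small <- data.frame(
  start = as.POSIXct("2010-06-01 17:00:00", tz = "UTC"),
  end   = as.POSIXct("2010-06-01 19:30:00", tz = "UTC"),
  sp    = "A",
  stringsAsFactors = FALSE
)

# dput() prints a structure(...) call that can be pasted into an email
dput(small)

# Anyone can rebuild the object by evaluating the pasted text
txt <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```

Pasting the printed structure(...) call into a message lets readers recreate the data with the column classes intact, which is how df1 below was written.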
Jean
df1 <- structure(list(
  start = structure(c(1275429600, 1275438601, 1275439500, 1275440401,
    1275440700, 1275441120, 1275442201, 1275442500, 1275442801),
    class = c("POSIXct", "POSIXt")),
  end = structure(c(1275438600, 1275440400, 1275440100, 1275442200,
    1275441000, 1275441300, 1275442800, 1275442810, 1275443400),
    class = c("POSIXct", "POSIXt")),
  sp = c("A", "B", "C", "A", "C", "C", "B", "C", "A")),
  .Names = c("start", "end", "sp"), row.names = c(NA, -9L),
  class = "data.frame")
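# rearrange data so that all of the times are in one column
# (as.character() and stringsAsFactors=FALSE keep the labels as plain
# strings, even if sp is stored as a factor)
len1 <- dim(df1)[1]
df2 <- data.frame(time=c(df1$start, df1$end),
  point=rep(c("start", "end"), c(len1, len1)),
  letter=as.character(c(as.character(df1$sp), as.character(df1$sp))),
  stringsAsFactors=FALSE)
df2 <- df2[order(df2$time), ]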
# create a new variable that indicates which long-segment (A or B) the
# sub-segments (C) are in
len2 <- dim(df2)[1]
df2$within <- df2$letter
last <- df2$letter[1]
for(i in 2:len2) {
  if(df2$letter[i]=="C") df2$within[i] <- last else last <- df2$letter[i]
}
# for every sub-segment start, add a new long-segment end 1 second before it
newABends <- df2[df2$point=="start" & df2$letter=="C", ]
newABends$time <- newABends$time - 1
newABends$point <- "end"
newABends$letter <- newABends$within
# for every sub-segment end, add a new long-segment start 1 second after it
newABstarts <- df2[df2$point=="end" & df2$letter=="C", ]
newABstarts$time <- newABstarts$time + 1
newABstarts$point <- "start"
newABstarts$letter <- newABstarts$within
# combine the original data with the new long-segment starts and ends
df3 <- rbind(df2, newABends, newABstarts)
df3 <- df3[order(df3$time), ]
# get rid of any long-segment bits within sub-segments
startC <- which(df3$point=="start" & df3$letter=="C")
endC <- which(df3$point=="end" & df3$letter=="C")
startendC <- lapply(seq(along=startC), function(i) seq(startC[i], endC[i]))
remove.rows <- unlist(lapply(startendC, function(x) x[-c(1, length(x))]))
# guard: negative indexing with an empty vector would drop every row
df4 <- if(length(remove.rows) > 0) df3[-remove.rows, ] else df3
# rearrange data so that start and end times are in different columns
# (built with data.frame() so the columns are named start, end, and sp)
df4s <- df4[df4$point=="start", ]
df4e <- df4[df4$point=="end", ]
data.frame(start=df4s$time, end=df4e$time, sp=df4e$letter)
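
For comparison, here is a rough sketch of a different way to get the same kind of result: loop over each A/B row and cut out whatever the C rows cover. This is not the method used above, just an alternative under some assumptions: the helper names `ts` and `cut_by_priority` are made up for the illustration, the C intervals are assumed not to overlap one another, times are assumed to have 1-second resolution, and UTC is assumed for the time zone.

```r
# Hypothetical helper to build the example times (UTC assumed)
ts <- function(x) as.POSIXct(paste("2010-06-01", x), tz = "UTC")

# The nine rows from the example in the question
ex <- data.frame(
  start = ts(c("17:00:00", "19:30:01", "19:45:00", "20:00:01", "20:05:00",
               "20:12:00", "20:30:01", "20:35:00", "20:40:01")),
  end   = ts(c("19:30:00", "20:00:00", "19:55:00", "20:30:00", "20:10:00",
               "20:15:00", "20:40:00", "20:40:10", "20:50:00")),
  sp    = c("A", "B", "C", "A", "C", "C", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical function: keep priority rows whole, trim all other rows
cut_by_priority <- function(df, priority = "C") {
  hi <- df[df$sp == priority, ]
  hi <- hi[order(hi$start), ]          # assumes priority rows do not overlap
  lo <- df[df$sp != priority, ]
  pieces <- list(hi)
  for (i in seq_len(nrow(lo))) {
    s <- lo$start[i]
    e <- lo$end[i]
    for (j in seq_len(nrow(hi))) {
      # skip priority rows that do not touch what is left of this row
      if (hi$end[j] < s || hi$start[j] > e) next
      # keep the part of the row that lies before the priority row
      if (hi$start[j] > s) {
        pieces[[length(pieces) + 1]] <- data.frame(
          start = s, end = hi$start[j] - 1, sp = lo$sp[i],
          stringsAsFactors = FALSE)
      }
      # continue 1 second after the priority row ends
      s <- hi$end[j] + 1
    }
    # keep whatever remains after the last overlapping priority row
    if (s <= e) {
      pieces[[length(pieces) + 1]] <- data.frame(
        start = s, end = e, sp = lo$sp[i], stringsAsFactors = FALSE)
    }
  }
  out <- do.call(rbind, pieces)
  out <- out[order(out$start), ]
  rownames(out) <- NULL
  out
}

res <- cut_by_priority(ex)
```

The nested loop is quadratic in the number of rows, so it is only meant for small data like the example, and it has not been checked against cases such as a C event that spans more than one A/B row boundary in unusual ways.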