thr3ads.net - R help - [R] Dividing rows when time is overlapping [Dec 2011]

PEL

2011-Dec-07 20:37 UTC

[R] Dividing rows when time is overlapping

Hi all,

I have a data frame that was created by merging two data frames. Both
span the same time interval but contain different information.
When I put them together, the intervals overlapped, since one of the data
frames has no gaps in its time coverage. Here is an example where the rows
with sp = A and sp = B come from the first data frame and the rows with
sp = C come from the second.
The first data frame is continuous, but the second consists of sporadic
events. The combined data frame looks like this:

start                  end                         sp
2010-06-01 17:00:00    2010-06-01 19:30:00         A
2010-06-01 19:30:01    2010-06-01 20:00:00         B
2010-06-01 19:45:00    2010-06-01 19:55:00         C
2010-06-01 20:00:01    2010-06-01 20:30:00         A
2010-06-01 20:05:00    2010-06-01 20:10:00         C
2010-06-01 20:12:00    2010-06-01 20:15:00         C
2010-06-01 20:30:01    2010-06-01 20:40:00         B
2010-06-01 20:35:00    2010-06-01 20:40:10         C
2010-06-01 20:40:01    2010-06-01 20:50:00         A

I would like to prioritize "C", so that when it overlaps the time interval
of another "sp", the time interval of "A" or "B" is cut accordingly. As seen
in the example, I sometimes have multiple "C" events that overlap a single
"A" or "B" event. The result would be this:

start                  end                         sp
2010-06-01 17:00:00    2010-06-01 19:30:00         A
2010-06-01 19:30:01    2010-06-01 19:44:59         B
2010-06-01 19:45:00    2010-06-01 19:55:00         C
2010-06-01 19:55:01    2010-06-01 20:00:00         B
2010-06-01 20:00:01    2010-06-01 20:04:59         A
2010-06-01 20:05:00    2010-06-01 20:10:00         C
2010-06-01 20:10:01    2010-06-01 20:11:59         A
2010-06-01 20:12:00    2010-06-01 20:15:00         C
2010-06-01 20:15:01    2010-06-01 20:30:00         A
2010-06-01 20:30:01    2010-06-01 20:34:59         B
2010-06-01 20:35:00    2010-06-01 20:40:10         C
2010-06-01 20:40:11    2010-06-01 20:50:00         A

My date/time columns are in POSIXct. Don't hesitate to ask if something is
unclear.

Thanks in advance


--
View this message in context:
r.789695.n4.nabble.com/Dividing-rows-when-time-is-overlapping-tp4170428p4170428.html
Sent from the R help mailing list archive at Nabble.com.

Jean V Adams

2011-Dec-12 13:15 UTC


PEL wrote on 12/07/2011 02:37:42 PM:
> Hi all,
> 
> I have a data frame that was created by merging two data frames. Both
> span the same time interval but contain different information.
> When I put them together, the intervals overlapped, since one of the data
> frames has no gaps in its time coverage. Here is an example where the rows
> with sp = A and sp = B come from the first data frame and the rows with
> sp = C come from the second.
> The first data frame is continuous, but the second consists of sporadic
> events. The combined data frame looks like this:
> 
> start                  end                         sp
> 2010-06-01 17:00:00    2010-06-01 19:30:00         A
> 2010-06-01 19:30:01    2010-06-01 20:00:00         B
> 2010-06-01 19:45:00    2010-06-01 19:55:00         C
> 2010-06-01 20:00:01    2010-06-01 20:30:00         A
> 2010-06-01 20:05:00    2010-06-01 20:10:00         C
> 2010-06-01 20:12:00    2010-06-01 20:15:00         C
> 2010-06-01 20:30:01    2010-06-01 20:40:00         B
> 2010-06-01 20:35:00    2010-06-01 20:40:10         C
> 2010-06-01 20:40:01    2010-06-01 20:50:00         A
> 
> I would like to prioritize "C", so that when it overlaps the time interval
> of another "sp", the time interval of "A" or "B" is cut accordingly. As seen
> in the example, I sometimes have multiple "C" events that overlap a single
> "A" or "B" event. The result would be this:
> 
> start                  end                         sp
> 2010-06-01 17:00:00    2010-06-01 19:30:00         A
> 2010-06-01 19:30:01    2010-06-01 19:44:59         B
> 2010-06-01 19:45:00    2010-06-01 19:55:00         C
> 2010-06-01 19:55:01    2010-06-01 20:00:00         B
> 2010-06-01 20:00:01    2010-06-01 20:04:59         A
> 2010-06-01 20:05:00    2010-06-01 20:10:00         C
> 2010-06-01 20:10:01    2010-06-01 20:11:59         A
> 2010-06-01 20:12:00    2010-06-01 20:15:00         C
> 2010-06-01 20:15:01    2010-06-01 20:30:00         A
> 2010-06-01 20:30:01    2010-06-01 20:34:59         B
> 2010-06-01 20:35:00    2010-06-01 20:40:10         C
> 2010-06-01 20:40:11    2010-06-01 20:50:00         A
> 
> My date/time columns are in POSIXct. Don't hesitate to ask if something
> is unclear.
> 
> Thanks in advance

The code below isn't pretty, but it works, at least on the example you 
provided.

It's helpful to provide your example data as working code, for example 
using the function dput().
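
As a minimal sketch of that advice: dput() prints an R expression that
rebuilds an object, so its output can be pasted straight into a message. The
data frame `x` below is a made-up two-row example (not PEL's data), and the
"UTC" time zone is an assumption chosen only to make the printout
reproducible.

```r
# Hypothetical two-row data frame, used only to illustrate dput()
x <- data.frame(
  start = as.POSIXct(c("2010-06-01 17:00:00", "2010-06-01 19:30:01"),
                     tz = "UTC"),
  sp = c("A", "B"),
  stringsAsFactors = FALSE
)

# dput() writes an R expression that recreates the object
dput(x)

# Round trip: deparse the object to text, evaluate the text, and compare
y <- eval(parse(text = deparse(x)))
print(all.equal(x, y))
```

Anyone who pastes the dput() output into their own session gets the same
object back, including the POSIXct class of the time column.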

Jean


df1 <- structure(list(
  start = structure(c(1275429600, 1275438601, 1275439500, 1275440401,
    1275440700, 1275441120, 1275442201, 1275442500, 1275442801),
    class = c("POSIXct", "POSIXt")),
  end = structure(c(1275438600, 1275440400, 1275440100, 1275442200,
    1275441000, 1275441300, 1275442800, 1275442810, 1275443400),
    class = c("POSIXct", "POSIXt")),
  sp = c("A", "B", "C", "A", "C", "C", "B", "C", "A")),
  .Names = c("start", "end", "sp"), row.names = c(NA, -9L),
  class = "data.frame")

# rearrange data so that all of the times are in one column
len1 <- nrow(df1)
df2 <- data.frame(time = c(df1$start, df1$end),
  point = rep(c("start", "end"), c(len1, len1)),
  letter = c(df1$sp, df1$sp), stringsAsFactors = FALSE)
df2 <- df2[order(df2$time), ]

# create a new variable that indicates which long-segment (A or B) each
# sub-segment (C) falls in
len2 <- nrow(df2)
df2$within <- df2$letter
last <- df2$letter[1]
for (i in 2:len2) {
  if (df2$letter[i] == "C") df2$within[i] <- last else last <- df2$letter[i]
}

# for every sub-segment start, add a new long-segment end 1 second before it
newABends <- df2[df2$point == "start" & df2$letter == "C", ]
newABends$time <- newABends$time - 1
newABends$point <- "end"
newABends$letter <- newABends$within

# for every sub-segment end, add a new long-segment start 1 second after it
newABstarts <- df2[df2$point == "end" & df2$letter == "C", ]
newABstarts$time <- newABstarts$time + 1
newABstarts$point <- "start"
newABstarts$letter <- newABstarts$within

# combine the original data with the new long-segment starts and ends
df3 <- rbind(df2, newABends, newABstarts)
df3 <- df3[order(df3$time), ]

# get rid of any long-segment bits within sub-segments
startC <- which(df3$point == "start" & df3$letter == "C")
endC <- which(df3$point == "end" & df3$letter == "C")
startendC <- lapply(seq_along(startC), function(i) seq(startC[i], endC[i]))
remove.rows <- unlist(lapply(startendC, function(x) x[-c(1, length(x))]))
# guard against an empty index: df3[-integer(0), ] would drop every row
df4 <- if (length(remove.rows) > 0) df3[-remove.rows, ] else df3

# rearrange data so that start and end times are in different columns
data.frame(start = df4$time[df4$point == "start"],
  end = df4$time[df4$point == "end"],
  sp = df4$letter[df4$point == "end"])
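
For comparison, the same result can be sketched more compactly by cutting
each A/B row around the C intervals that overlap it. This is not part of
Jean's reply: the function name `split_around`, its arguments, and the helper
`mk` are hypothetical, the "UTC" time zone is an assumption, and the sketch
assumes that the C intervals never overlap one another and that times have
one-second resolution.

```r
# Example data from the thread, rebuilt by hand; tz = "UTC" is an assumption
mk <- function(x) as.POSIXct(paste("2010-06-01", x), tz = "UTC")
events <- data.frame(
  start = mk(c("17:00:00", "19:30:01", "19:45:00", "20:00:01", "20:05:00",
               "20:12:00", "20:30:01", "20:35:00", "20:40:01")),
  end   = mk(c("19:30:00", "20:00:00", "19:55:00", "20:30:00", "20:10:00",
               "20:15:00", "20:40:00", "20:40:10", "20:50:00")),
  sp    = c("A", "B", "C", "A", "C", "C", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep priority rows intact and trim all other rows
split_around <- function(df, priority = "C", step = 1) {
  pri <- df[df$sp == priority, ]
  pri <- pri[order(pri$start), ]
  oth <- df[df$sp != priority, ]

  pieces <- lapply(seq_len(nrow(oth)), function(i) {
    s <- oth$start[i]
    e <- oth$end[i]
    # priority intervals that overlap this row
    hit <- pri[pri$start <= e & pri$end >= s, ]
    # candidate pieces: the gaps before, between and after the hits
    starts <- c(s, hit$end + step)
    ends   <- c(hit$start - step, e)
    keep   <- starts <= ends   # drop pieces swallowed by a priority interval
    if (!any(keep)) return(NULL)
    data.frame(start = starts[keep], end = ends[keep], sp = oth$sp[i],
               stringsAsFactors = FALSE)
  })

  out <- rbind(pri, do.call(rbind, pieces))
  out <- out[order(out$start), ]
  rownames(out) <- NULL
  out
}

result <- split_around(events)
print(result)
```

Working row by row avoids the intermediate long-format table, at the cost of
scanning the C intervals once for every A/B row, which should be acceptable
for data sets of modest size.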

PEL

2011-Dec-13 17:59 UTC


Thank you for your answer. It helps a lot and is very appreciated.
Also, thanks for the advice about dput(), I was wondering how to present my
data or code easily.

