thr3ads.net - R help - [R] Comparing dates in two large data frames [Apr 2021]


Kulupp

2021-Apr-10 11:06 UTC

[R] Comparing dates in two large data frames

Dear all,

I have two data frames (df1 and df2), and for each timepoint in df1 I
want to know: is it within any of the timespans in df2? The result
(e.g. "no" or "yes", or 0 and 1) should be shown in a new column of df1.

Here is the code to create the two data frames (the size of the two data 
frames is approx. the same as in my original data frames):

# create data frame df1: one timepoint every 10 minutes
ti1 <- seq.POSIXt(from = as.POSIXct("2020/01/01", tz = "UTC"),
                  to = as.POSIXct("2020/06/01", tz = "UTC"), by = "10 min")
df1 <- data.frame(Time = ti1)

# create data frame df2 with random timespans, i.e. start and end dates
start <- sort(sample(seq(as.POSIXct("2020/01/01", tz = "UTC"),
                         as.POSIXct("2020/06/01", tz = "UTC"), by = "1 mins"),
                     5000))
end <- start + 120   # each timespan lasts 120 seconds
df2 <- data.frame(start = start, end = end)

Everything I tried (ifelse combined with sapply or for loops) has been 
very very very slow. Thus, I am looking for a reasonably fast solution.
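For reference, a loop-style check of the kind described above might look like the sketch below. It is a hypothetical reconstruction (the function name `within_any_slow` and the tiny data set are made up), not the code actually tried. It treats a timepoint as inside a span when start <= time <= end.

```r
# Hypothetical brute-force check: for every timepoint, scan all timespans.
# The work grows with length(times) * length(start), which is why it is slow.
within_any_slow <- function(times, start, end) {
  vapply(seq_along(times),
         function(i) any(start <= times[i] & times[i] <= end),
         logical(1))
}

# Small made-up example: spans are [00:00:30, 00:02:30] and [00:08:50, 00:10:50]
times <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + c(0, 60, 300, 600)
start <- as.POSIXct("2020-01-01 00:00:30", tz = "UTC") + c(0, 500)
end   <- start + 120
within_any_slow(times, start, end)
# [1] FALSE  TRUE FALSE  TRUE
```

With the example data from the question this means roughly 22,000 timepoints times 5,000 spans, on the order of 10^8 comparisons, which explains the slowness.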

Thanks a lot for any hint in advance !

Cheers,

Thomas

Rui Barradas

2021-Apr-10 12:47 UTC



Hello,

The following solution seems to work and is fast, since findInterval is fast.
It first determines, for each value of df1$Time, its position in df2$start
(the index of the last start that is not after it). It then uses that index
to check whether those times are not greater than the corresponding df2$end.
I checked it against a small subset of df1 and the results were correct.
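To make the role of findInterval concrete, here is a tiny example with made-up numbers. For each x it returns the index of the last element of the sorted vector that is less than or equal to x, or 0 when x lies before the first element. The vector must be sorted in non-decreasing order (findInterval stops with an error otherwise), which df2$start is here because it was built with sort().

```r
# Made-up sorted "start" values and query points
starts <- c(1, 10, 20)
x <- c(0, 5, 10, 15, 25)

# Index of the last start <= each x; 0 means x is before every start
findInterval(x, starts)
# [1] 0 1 2 2 3
```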


result <- logical(nrow(df1))                # all FALSE to start with
inx <- findInterval(df1$Time, df2$start)    # last start <= each Time, 0 if none
not_zero <- inx != 0                        # Times at or after the first start
result[not_zero] <- df1$Time[not_zero] <= df2$end[ inx[not_zero] ]
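The code above compares each time only with the span that started most recently. That is enough for the question's data, where every span lasts 120 seconds, but if spans had different lengths an earlier, longer span could still cover a time after a later, shorter span has ended. A possible generalization (an assumption beyond the thread; `within_any` is a hypothetical name) replaces df2$end by the running maximum of the ends, computed with cummax() on the numeric seconds, since cummax() is not defined for POSIXct objects.

```r
# Sketch: flag timepoints covered by any [start, end] span, allowing
# spans of different lengths that overlap or nest.
within_any <- function(times, start, end) {
  ord <- order(start)                      # findInterval() needs sorted starts
  s <- as.numeric(start)[ord]              # POSIXct -> seconds since epoch
  e <- cummax(as.numeric(end)[ord])        # latest end among spans started so far
  x <- as.numeric(times)
  inx <- findInterval(x, s)                # last start <= x, 0 if none
  result <- logical(length(x))             # FALSE by default
  ok <- inx != 0
  result[ok] <- x[ok] <= e[inx[ok]]
  result
}

# Made-up nested spans: [0, 100] s and [10, 20] s after midnight
start <- as.POSIXct("2020-01-01", tz = "UTC") + c(0, 10)
end   <- start + c(100, 10)
times <- as.POSIXct("2020-01-01", tz = "UTC") + c(5, 15, 50, 150)
within_any(times, start, end)
# [1]  TRUE  TRUE  TRUE FALSE
```

Without cummax() the time at 50 s would be reported as FALSE, because the most recent span [10, 20] has already ended. With the fixed 120-second spans from the question, df2$end is already sorted, so cummax() changes nothing and the result matches the code above. The new column could then be added with something like `df1$in_span <- within_any(df1$Time, df2$start, df2$end)` (the column name is only an example).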


Hope this helps,

Rui Barradas


