thr3ads.net - R help - [R] Merge with closest (not equal) time stamps [Aug 2023]

If this information is useful, please help other people find it:
Share via:

Naresh Gurbuxani

2023-Aug-08 02:31 UTC

[R] Merge with closest (not equal) time stamps

I have two dataframes, each with a column for timestamp.  I want to
merge the two dataframes such that each row from first dataframe
is matched with the row in the second dataframe with most recent but
preceding timestamp. Here is an example.

option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07
10:23:22", "2023-08-07 10:25:33", "2023-08-07
10:28:41")), option.price = c(2.5, 2.7, 1.8))

stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07
10:23:21", "2023-08-07 10:23:34", "2023-08-07
10:24:57", "2023-08-07 10:28:37", "2023-08-07
10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))

stock.trades <- stock.trades[order(stock.trades$timestamp),]

library(plyr)
mystock.prices <- ldply(option.trades$timestamp, function(tstamp)
tail(subset(stock.trades, timestamp <= tstamp), 1))
names(mystock.prices)[1] <- "stock.timestamp"
myres <- cbind(option.trades, mystock.prices)

This method works. But for large dataframes, it is very slow.  Is there
a way to speed up the merge?

Thanks,
Naresh

Eric Berger

2023-Aug-08 09:39 UTC

head link

[R] Merge with closest (not equal) time stamps

Hi Naresh,
Perhaps the below is faster than your approach

library(dplyr)
library(tidyr)
merge(option.trades, stock.trades, by="timestamp", all=TRUE) |>
  dplyr::arrange(timestamp) |>
  dplyr::mutate(stock.timestamp as.POSIXct(ifelse(is.na(option.price),
timestamp, NA))) |>
  tidyr::fill(stock.price, stock.timestamp) |>
  dplyr::filter(!is.na(option.price)) |>
  dplyr::select(1,2,4,3) ->
  myres2

identical(myres, myres2)  ## TRUE


On Tue, Aug 8, 2023 at 5:32?AM Naresh Gurbuxani
<naresh_gurbuxani at hotmail.com> wrote:>
>
> I have two dataframes, each with a column for timestamp.  I want to
> merge the two dataframes such that each row from first dataframe
> is matched with the row in the second dataframe with most recent but
> preceding timestamp. Here is an example.
>
> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07
10:23:22", "2023-08-07 10:25:33", "2023-08-07
10:28:41")), option.price = c(2.5, 2.7, 1.8))
>
> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07
10:23:21", "2023-08-07 10:23:34", "2023-08-07
10:24:57", "2023-08-07 10:28:37", "2023-08-07
10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))
>
> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>
> library(plyr)
> mystock.prices <- ldply(option.trades$timestamp, function(tstamp)
tail(subset(stock.trades, timestamp <= tstamp), 1))
> names(mystock.prices)[1] <- "stock.timestamp"
> myres <- cbind(option.trades, mystock.prices)
>
> This method works. But for large dataframes, it is very slow.  Is there
> a way to speed up the merge?
>
> Thanks,
> Naresh
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Enrico Schumann

2023-Aug-08 11:34 UTC

head link

[R] Merge with closest (not equal) time stamps

On Mon, 07 Aug 2023, Naresh Gurbuxani writes:
> I have two dataframes, each with a column for timestamp.  I want to
> merge the two dataframes such that each row from first dataframe
> is matched with the row in the second dataframe with most recent but
> preceding timestamp. Here is an example.
>
> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07
10:23:22", "2023-08-07 10:25:33", "2023-08-07
10:28:41")), option.price = c(2.5, 2.7, 1.8))
>
> stock.trades <- data.frame(timestamp > as.POSIXct(c("2023-08-07
10:23:21", "2023-08-07
> 10:23:34", "2023-08-07 10:24:57", "2023-08-07
> 10:28:37", "2023-08-07 10:29:01")), stock.price >
c(102.2, 102.9, 103.1, 101.8, 101.7))
>
> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>
> library(plyr)
> mystock.prices <- ldply(option.trades$timestamp, function(tstamp)
tail(subset(stock.trades, timestamp <= tstamp), 1))
> names(mystock.prices)[1] <- "stock.timestamp"
> myres <- cbind(option.trades, mystock.prices)
>
> This method works. But for large dataframes, it is very slow.  Is there
> a way to speed up the merge?
>
> Thanks,
> Naresh
>
If the timestamps are sorted (or you can sort them),
function ?findInterval might be helpful:

    i <- findInterval(option.trades$timestamp, stock.trades$timestamp)
    cbind(option.trades, stock.trades[i, ])
    ##             timestamp option.price           timestamp stock.price
    ## 1 2023-08-07 10:23:22          2.5 2023-08-07 10:23:21       102.2
    ## 3 2023-08-07 10:25:33          2.7 2023-08-07 10:24:57       103.1
    ## 4 2023-08-07 10:28:41          1.8 2023-08-07 10:28:37       101.8



-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net

Hadley Wickham

2023-Aug-09 12:39 UTC

head link

[R] Merge with closest (not equal) time stamps

It sounds like you might want a rolling join, e.g.
https://dplyr.tidyverse.org/reference/join_by.html#rolling-joins.

(And data.table has similar functionality which inspired dplyr)

Hadley

On Mon, Aug 7, 2023 at 9:32?PM Naresh Gurbuxani
<naresh_gurbuxani at hotmail.com> wrote:>
>
> I have two dataframes, each with a column for timestamp.  I want to
> merge the two dataframes such that each row from first dataframe
> is matched with the row in the second dataframe with most recent but
> preceding timestamp. Here is an example.
>
> option.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07
10:23:22", "2023-08-07 10:25:33", "2023-08-07
10:28:41")), option.price = c(2.5, 2.7, 1.8))
>
> stock.trades <- data.frame(timestamp = as.POSIXct(c("2023-08-07
10:23:21", "2023-08-07 10:23:34", "2023-08-07
10:24:57", "2023-08-07 10:28:37", "2023-08-07
10:29:01")), stock.price = c(102.2, 102.9, 103.1, 101.8, 101.7))
>
> stock.trades <- stock.trades[order(stock.trades$timestamp),]
>
> library(plyr)
> mystock.prices <- ldply(option.trades$timestamp, function(tstamp)
tail(subset(stock.trades, timestamp <= tstamp), 1))
> names(mystock.prices)[1] <- "stock.timestamp"
> myres <- cbind(option.trades, mystock.prices)
>
> This method works. But for large dataframes, it is very slow.  Is there
> a way to speed up the merge?
>
> Thanks,
> Naresh
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
http://hadley.nz

R help - Aug 2023 - Merge with closest (not equal) time stamps

[R] Merge with closest (not equal) time stamps

[R] Merge with closest (not equal) time stamps

[R] Merge with closest (not equal) time stamps

[R] Merge with closest (not equal) time stamps