Arnaud, I won't comment on other aspects but want to ask how sure you are that your data is guaranteed to have a single row reflecting a closing price at 18:59:59 exactly? It may be true for your data source. I note that markets technically close at 4:00 PM, New York time, but many have after-hours trading, and there are days it closes early (such as 1 PM) and times when trading is halted. Generally, you can get closing prices (or other data) from other reliable sources and you could choose to merge data from such a source in rather than calculating them from your data. If you do want to use your data, one suggestion is to use the LAST record in each grouping for a day. I find that easy to do in dplyr by having a column containing the date info except for the time, and another containing the time in a sortable format. You can then sort the data.frame by the date and then time and then group your data.frame by the date and and select only last record in each group and you have the last time, whatever that may be. Again, this may not apply in your case. As you note, you are planning on doing many things, one step at a time, and early stages can set up your data.frames in ways that make later stages easier to do. As one example, you could create a column in early stages that marks if the current row is a closing row or not. -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of Arnaud Gaboury Sent: Wednesday, April 2, 2025 2:10 PM To: Ebert,Timothy Aaron <tebert at ufl.edu>; r-help at r-project.org Subject: Re: [R] join/merge two data frames On Wed, 2025-04-02 at 16:55 +0000, Ebert,Timothy Aaron wrote:> Your result data frame example makes no sense to me. The price and > executed_qty are the same for all symbols? > > To get it all into one data frame you need a common variable that is > used to join the data frames. > My guess is that all_trade_sample$symbol has equivalents to the > variables in token_close_sample. > > You need to pivot one of the tables, and then full join them. > Please look at the join functions. In base R it is "merge()" and in > dyply it is full_join(), left_join() and similar. > > I will pivot all_trade_sample to make all of the elements in "symbol" > into variables. > > I will do this in tidyverse. There are some parts of this that I > really like. However, it can as easily be done using base R. > Library(tidyr) > Library(dplyr) > token_close_long <- token_close_sample %>% > pivot_longer(cols = -time, names_to = "symbol", values_to > "close_price") > #I rename time in token_close_long so that it will be preserved > colnames(token_close_long) <- c("time_close", "symbol", > "close_price") > combined <- full_join(all_trade_sample,token_close_long, by="symbol") > > This generates errors. This is ok, but I need to know more about > all_trade_sample. Each symbol appears multiple times in > all_trade_sample. Each instance of "AAVEUSIX" has a different time > stamp. So maybe filter to eliminate all but the first (earliest) > time? > > The program looks something like this: > token_close_long <- token_close_sample %>% > pivot_longer(cols = -time, names_to = "symbol", values_to > "close_price") > > #I will rename time in token_close_long so that it will be preserved > colnames(token_close_long) <- c("time_close", "symbol", > "close_price") > combined <- full_join(all_trade_sample,token_close_long, by="symbol") > > filtered_trades <- all_trade_sample |> > group_by(symbol) |> > slice_min(time, n = 1)|> > ungroup() > > combined <- full_join(filtered_trades,token_close_long, by="symbol") > > I did not do something right, as there are four closing prices for > each symbol. However, the general approach should work even if it > needs a little modification to give the correct result. Your closing > price is only relevant based on some other price (probably buying > price, but could be opening price). > > TimThank you Tim for your answer. I will have a close look at it later today. But in short: - I will do some work on my data later in the script. The tibble I want is far from being the result. - Time of closing price is everyday the same: 18:59:59. I will compute returns, volatility, correlations etc so I need a common reference. Prices for the trades change and are given by the exchange. I don't need them, won't do anything. - Yes, symbols may be the common variable. - I feel comfortable with tibbles and tidyverse - All_trade_sample has the goal to fetch price (price of asset when the trade is done) and executed_qty. btc_price is needed at one point of my calculation, but will not be shown in the very final table. - Multiple lines for one only symbol means I made multiple trades of this asset during the same day. I will test your suggestions. Thank you again.> > -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of Arnaud > Gaboury > Sent: Wednesday, April 2, 2025 6:20 AM > To: r-help at r-project.org > Subject: [R] join/merge two data frames > > [External Email] > > I work on a trading journal for a portfolio of crypto currencies. The > goal is to fetch from my account (binance exchange) the trades I have > done and daily closing prices of my assets. > The first part (getting the data from exchange) are in two parts. > > 1- get the daily closing prices of my assets. Here is a sample of my > data frame: > token_close_sample <- structure(list(time > structure(c(1734371999.999, 1734458399.999, 1734544799.999, > 1734631199.999), tzone = "", class = c("POSIXct", "POSIXt")), BTCUSDC > = c(107112.36, 107517.25, 104630.49, 98692.01 ), SUIUSDC = c(4.7252, > 4.6923, 4.7017, 4.2422), ENAUSDC = c(1.1862, 1.1412, 1.0928, 1.0256), > AAVEUSDC = c(388, 365.68, 373.15, 316.69 ), ETHUSDC = c(4034.74, > 3975.39, 3879.06, 3474.91), FTMUSDC = c(1.381, 1.3596, 1.2222, > 1.0445)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", > "data.frame")) > > the tibble looks like this: > time BTCUSDC SUIUSDC ENAUSDC AAVEUSDC ETHUSDC FTMUSDC > <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> > 1 2024-12-16 18:59:59 107112. 4.73 1.19 388 4035. > 1.38 > 2 2024-12-17 18:59:59 107517. 4.69 1.14 366. 3975. > 1.36 > 3 2024-12-18 18:59:59 104630. 4.70 1.09 373. 3879. > 1.22 > 4 2024-12-19 18:59:59 98692. 4.24 1.03 317. 3475. > 1.04 > > 2- get my trades. Here is a sample: > all_trade_sample <- structure(list(time = structure(c(1737335082.949, > 1737336735.697, 1738059550.671, 1738142709.422, 1738142709.422, > 1738169351.788 ), tzone = "UTC", class = c("POSIXct", "POSIXt")), > symbol = c("AAVEUSDC", "AAVEUSDC", "SUIUSDC", "AAVEUSDC", "AAVEUSDC", > "ETHUSDC"), executed_qty = c(866.666, -834.998, 67649.3, -0.393, - > 0.393, 36.1158), price = c(0.003005, 0.003131, 0.000038, 294.738321, > 294.738321, 0.03027), cummulative_quote_qty = c(262699.317950113, - > 263696.723173419, 263987.20719179, -115.83216, -115.83216, > 111456.491386979 > ), day = structure(c(20108, 20108, 20116, 20117, 20117, 20117 ), > class = "Date")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", > "data.frame")) > > the tibble looks like this: > time symbol executed_qty price > cummulative_quote_qty day > <dttm> <chr> <dbl> <dbl> > <dbl> <date> > 1 2025-01-20 01:04:42 AAVEUSDC 867. 0.00300 > 262699. 2025-01-20 > 2 2025-01-20 01:32:15 AAVEUSDC -835. 0.00313 - > 263697. 2025-01-20 > 3 2025-01-28 10:19:10 SUIUSDC 67649. 0.000038 > 263987. 2025-01-28 > 4 2025-01-29 09:25:09 AAVEUSDC -0.393 295. > -116. 2025-01-29 > 5 2025-01-29 09:25:09 AAVEUSDC -0.393 295. > -116. 2025-01-29 > 6 2025-01-29 16:49:11 ETHUSDC 36.1 0.0303 > 111456. 2025-01-29 > > Now, to finalize, I want to get all the info in one data frame so I > can compute daily valuation (with potential trades, or in/out of > asset). > The finalized tibble should look something like this: > result <- structure(list(time = structure(c(1734371999.999, > 1734458399.999, 1734544799.999, 1734631199.999, 1737335082.949, > 1737336735.697, 1738059550.671, 1734721199, 1734807599, 1734893999, > 1734980399, 1735066799, 1735153199, 1735239599, 1735325999, > 1738142709.422, 1735412399, 1738142709.422, 1738169351.788, > 1735498799), tzone = "", class = c("POSIXct", "POSIXt")), BTCUSDC > c(107112.36, 107517.25, 104630.49, 98692.01,NA_real_, NA_real_, > 102000, 101500, 101700,100300,100400,102300,102300,103100, NA_real_, > 99800, NA_real_, NA_real_,NA_real_, 99900 ), SUIUSDC = c(4.7252, > 4.6923, 4.7017, 4.2422, NA_real_, NA_real_, 4.25, 4.26, 4.7, 4.65, > 4.52, 4.23, 4.17, 4.34, NA_real_, 4.52, NA_real_, NA_real_,NA_real_, > 4.44), ENAUSDC = c(1.1862, 1.1412, 1.0928, 1.0256, NA_real_, > NA_real_, 1.176, 1.16, 1.163, 1.183, 1.196, 1.165, 1.158, 1.142, > NA_real_, 1.196, NA_real_, NA_real_,NA_real_, 1.113), AAVEUSDC > c(388, 365.68, 373.15, 316.69, NA_real_, NA_real_, 102000, 101500, > 101700,100300,100400,102300,102300,103100, NA_real_, 99800, NA_real_, > NA_real_,NA_real_, 99900 ), ETHUSDC = c(4034.74, 3975.39, 3879.06, > 3474.91, NA_real_,NA_real_, 3420, 3410, 3412, 3367, 3388, 3355, 3374, > 3392, NA_real_, 3401, NA_real_, NA_real_,NA_real_, 3411), FTMUSDC > c(1.381,1.3596, 1.2222, 1.0445, NA_real_,NA_real_, 1.36, 1.368, > 1.342, 1.339, 1.436, 1.562, 1.53, 1.62, NA_real_, 1.31, NA_real_, > NA_real_,NA_real_, 1.58), executed_qty = c(NA_real_, NA_real_, > NA_real_, NA_real_, 230, 559, NA_real_, NA_real_,NA_real_, NA_real_, > NA_real_,NA_real_, NA_real_, NA_real_, 200, NA_real_, 700, 200, 350, > NA_real_), price = c( NA_real_, NA_real_, NA_real_, NA_real_, 2, 3, > NA_real_, NA_real_,NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, > NA_real_, 10, NA_real_, 3, 4, 5, NA_real_) ), row.names = c(NA, - > 20L), class = c("tbl_df", "tbl", "data.frame")) > > The idea is to have: > - one line each day with closing price of my portfolio assets at > 18:59:59 > - one line for each trade where I shall write 'asset', 'price' and > 'executed_qty'. > > I spent quite some time trying to figure out how to do it, but > couldn't. I don't even know if it is possible. Of course, I don't > want to add anything (price, date, asset...) by hand. > Thank you for any help. >
On Wed, 2025-04-02 at 15:41 -0400, avi.e.gross at gmail.com wrote:> Arnaud, > > I won't comment on other aspects but want to ask how sure you are > that your data is guaranteed to have a single row reflecting a > closing price at 18:59:59 exactly? > > It may be true for your data source. I note that markets technically > close at 4:00 PM, New York time, but many have after-hours trading, > and there are days it closes early (such as 1 PM) and times when > trading is halted. > > Generally, you can get closing prices (or other data) from other > reliable sources and you could choose to merge data from such a > source in rather than calculating them from your data. If you do want > to use your data, one suggestion is to use the LAST record in each > grouping for a day. I find that easy to do in dplyr by having a > column containing the date info except for the time, and another > containing the time in a sortable format. You can then sort the > data.frame by the date and then time and then? group your data.frame? > by the date and? and select only last record in each group and you > have the last time, whatever that may be. > > Again, this may not apply in your case. As you note, you are planning > on doing many things, one step at a time, and early stages can set up > your data.frames in ways that make later stages easier to do. As one > example, you could create a column in early stages that marks if the > current row is a closing row or not.I work on the crypto currency market which in fact never close, even on weekends and Xmas. So each exchange fix its own closing time and there is NO official price. I decided to fix it at 19:00 UTC. I don't compute the closing price but fetch it from Binance exchange with the Binancer package[1]. Here is the command to get BTCUSDT (Bitcoin against USD) on 2024-12-16: klines <- binance_klines('BTCUSDT', interval = '6h', start_time as.POSIXct("2024-12-17", tz = 'UTC'), end_time = as.POSIXct("2024-12- 17", tz = 'UTC')) Result is: klines <- structure(list(open_time = structure(c(1734307200, 1734328800, 1734350400, 1734372000, 1734393600), class = c("POSIXct", "POSIXt" ), tzone = ""), open = c(104463.99, 105028.01, 103758, 107078.55, 106058.65), high = c(106648, 105420.76, 107195.58, 107793.07, 107000), low = c(104259.48, 103625.78, 103333, 105480.02, 105657.34 ), close = c(105028, 103757.99, 107078.55, 106058.66, 106817.43 ), volume = c(10236.165136, 5623.21583, 14589.713338, 10853.308436, 4992.89346), close_time = structure(c(1734328799.999, 1734350399.999, 1734371999.999, 1734393599.999, 1734415199.999), class = c("POSIXct", "POSIXt"), tzone = ""), quote_asset_volume = c(1077588575.22963, 588656892.646352, 1542338025.25396, 1156380907.50461, 531077258.225013 ), trades = c(1914225L, 1110218L, 3148766L, 1874845L, 1208364L ), taker_buy_base_asset_volume = c(5181.465886, 2387.39941, 7576.364568, 5203.869636, 2543.63848), taker_buy_quote_asset_volume c(545582484.666663, 250042657.490346, 800715095.504901, 554210661.585612, 270539317.70439 ), symbol = c("BTCUSDT", "BTCUSDT", "BTCUSDT", "BTCUSDT", "BTCUSDT" )), row.names = c(NA, -5L), class = c("data.table", "data.frame" ), .internal.selfref = <pointer: 0x60ac8ea98850>) [1]https://cran.r-project.org/web/packages/binancer/binancer.pdf> > -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of Arnaud > Gaboury > Sent: Wednesday, April 2, 2025 2:10 PM > To: Ebert,Timothy Aaron <tebert at ufl.edu>; r-help at r-project.org > Subject: Re: [R] join/merge two data frames > > > > On Wed, 2025-04-02 at 16:55 +0000, Ebert,Timothy Aaron wrote: > > Your result data frame example makes no sense to me. The price and > > executed_qty are the same for all symbols? > > > > To get it all into one data frame you need a common variable that > > is > > used to join the data frames. > > My guess is that all_trade_sample$symbol has equivalents to the > > variables in token_close_sample. > > > > You need to pivot one of the tables, and then full join them. > > Please look at the join functions. In base R it is "merge()" and in > > dyply it is full_join(), left_join() and similar. > > > > I will pivot all_trade_sample to make all of the elements in > > "symbol" > > into variables. > > > > I will do this in tidyverse. There are some parts of this that I > > really like. However, it can as easily be done using base R. > > Library(tidyr) > > Library(dplyr) > > token_close_long <- token_close_sample %>% > > ? pivot_longer(cols = -time, names_to = "symbol", values_to > > "close_price") > > #I rename time in token_close_long so that it will be preserved > > colnames(token_close_long) <- c("time_close", "symbol", > > "close_price") > > combined <- full_join(all_trade_sample,token_close_long, > > by="symbol") > > > > This generates errors. This is ok, but I need to know more about > > all_trade_sample. Each symbol appears multiple times in > > all_trade_sample. Each instance of "AAVEUSIX" has a different time > > stamp. So maybe filter to eliminate all but the first (earliest) > > time? > > > > The program looks something like this: > > token_close_long <- token_close_sample %>% > > ? pivot_longer(cols = -time, names_to = "symbol", values_to > > "close_price") > > > > #I will rename time in token_close_long so that it will be > > preserved > > colnames(token_close_long) <- c("time_close", "symbol", > > "close_price") > > combined <- full_join(all_trade_sample,token_close_long, > > by="symbol") > > > > filtered_trades <- all_trade_sample |> > > ? group_by(symbol) |> > > ? slice_min(time, n = 1)|> > > ? ungroup() > > > > combined <- full_join(filtered_trades,token_close_long, > > by="symbol") > > > > I did not do something right, as there are four closing prices for > > each symbol. However, the general approach should work even if it > > needs a little modification to give the correct result. Your > > closing > > price is only relevant based on some other price (probably buying > > price, but could be opening price). > > > > Tim > > Thank you Tim for your answer. I will have a close look at it later > today. But in short: > - I will do some work on my data later in the script. The tibble I > want > is far from being the result. > - Time of closing price is everyday the same: 18:59:59. I will > compute > returns, volatility, correlations etc so I need a common reference. > Prices for the trades change and are given by the exchange. I don't > need them, won't do anything. > - Yes, symbols may be the common variable. > - I feel comfortable with tibbles and tidyverse > - All_trade_sample has the goal to fetch price (price of asset when > the > trade is done) and executed_qty. btc_price is needed at one point of > my > calculation, but will not be shown in the very final table. > - Multiple lines for one only symbol means I made multiple trades of > this asset during the same day. > > I will test your suggestions. > Thank you again. > > > > > > -----Original Message----- > > From: R-help <r-help-bounces at r-project.org> On Behalf Of Arnaud > > Gaboury > > Sent: Wednesday, April 2, 2025 6:20 AM > > To: r-help at r-project.org > > Subject: [R] join/merge two data frames > > > > [External Email] > > > > I work on a trading journal for a portfolio of crypto currencies. > > The > > goal is to fetch from my account (binance exchange) the trades I > > have > > done and daily closing prices of my assets. > > The first part (getting the data from exchange) are in two parts. > > > > 1- get the daily closing prices of my assets. Here is a sample of > > my > > data frame: > > token_close_sample <- structure(list(time > > structure(c(1734371999.999, 1734458399.999, 1734544799.999, > > 1734631199.999), tzone = "", class = c("POSIXct", "POSIXt")), > > BTCUSDC > > = c(107112.36, 107517.25, 104630.49, 98692.01 ), SUIUSDC > > c(4.7252, > > 4.6923, 4.7017, 4.2422), ENAUSDC = c(1.1862, 1.1412, 1.0928, > > 1.0256), > > AAVEUSDC = c(388, 365.68, 373.15, 316.69 ), ETHUSDC = c(4034.74, > > 3975.39, 3879.06, 3474.91), FTMUSDC = c(1.381, 1.3596, 1.2222, > > 1.0445)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", > > "data.frame")) > > > > the tibble looks like this: > > time??????????????? BTCUSDC SUIUSDC ENAUSDC AAVEUSDC ETHUSDC > > FTMUSDC > > <dttm>??????????????? <dbl>?? <dbl>?? <dbl>??? <dbl>?? <dbl>?? > > <dbl> > > 1 2024-12-16 18:59:59 107112.??? 4.73??? 1.19???? 388??? 4035.?? > > 1.38 > > 2 2024-12-17 18:59:59 107517.??? 4.69??? 1.14???? 366.?? 3975.?? > > 1.36 > > 3 2024-12-18 18:59:59 104630.??? 4.70??? 1.09???? 373.?? 3879.?? > > 1.22 > > 4 2024-12-19 18:59:59? 98692.??? 4.24??? 1.03???? 317.?? 3475.?? > > 1.04 > > > > 2- get my trades. Here is a sample: > > all_trade_sample <- structure(list(time > > structure(c(1737335082.949, > > 1737336735.697, 1738059550.671, 1738142709.422, 1738142709.422, > > 1738169351.788 ), tzone = "UTC", class = c("POSIXct", "POSIXt")), > > symbol = c("AAVEUSDC", "AAVEUSDC", "SUIUSDC", "AAVEUSDC", > > "AAVEUSDC", > > "ETHUSDC"), executed_qty = c(866.666, -834.998, 67649.3, -0.393, - > > 0.393, 36.1158), price = c(0.003005, 0.003131, 0.000038, > > 294.738321, > > 294.738321, 0.03027), cummulative_quote_qty = c(262699.317950113, - > > 263696.723173419, 263987.20719179, -115.83216, -115.83216, > > 111456.491386979 > > ), day = structure(c(20108, 20108, 20116, 20117, 20117, 20117 ), > > class = "Date")), row.names = c(NA, -6L), class = c("tbl_df", > > "tbl", > > "data.frame")) > > > > the tibble looks like this: > > time??????????????? symbol?? executed_qty????? price > > cummulative_quote_qty day > > <dttm>????????????? <chr>?????????? <dbl>????? <dbl> > > <dbl> <date> > > 1 2025-01-20 01:04:42 AAVEUSDC????? 867.????? 0.00300 > > 262699. 2025-01-20 > > 2 2025-01-20 01:32:15 AAVEUSDC???? -835.????? 0.00313?????????????? > > - > > 263697. 2025-01-20 > > 3 2025-01-28 10:19:10 SUIUSDC???? 67649.????? 0.000038 > > 263987. 2025-01-28 > > 4 2025-01-29 09:25:09 AAVEUSDC?????? -0.393 295. > > -116. 2025-01-29 > > 5 2025-01-29 09:25:09 AAVEUSDC?????? -0.393 295. > > -116. 2025-01-29 > > 6 2025-01-29 16:49:11 ETHUSDC??????? 36.1???? 0.0303 > > 111456. 2025-01-29 > > > > Now, to finalize, I want to get all the info in one data frame so I > > can compute daily valuation (with potential trades, or in/out of > > asset). > > The finalized tibble should look something like this: > > result <- structure(list(time = structure(c(1734371999.999, > > 1734458399.999, 1734544799.999, 1734631199.999, 1737335082.949, > > 1737336735.697, 1738059550.671, 1734721199, 1734807599, 1734893999, > > 1734980399, 1735066799, 1735153199, 1735239599, 1735325999, > > 1738142709.422, 1735412399, 1738142709.422, 1738169351.788, > > 1735498799), tzone = "", class = c("POSIXct", "POSIXt")), BTCUSDC > > c(107112.36, 107517.25, 104630.49, 98692.01,NA_real_, NA_real_, > > 102000, 101500, 101700,100300,100400,102300,102300,103100, > > NA_real_, > > 99800, NA_real_, NA_real_,NA_real_, 99900 ), SUIUSDC = c(4.7252, > > 4.6923, 4.7017, 4.2422, NA_real_, NA_real_, 4.25, 4.26, 4.7, 4.65, > > 4.52, 4.23, 4.17, 4.34, NA_real_, 4.52, NA_real_, > > NA_real_,NA_real_, > > 4.44), ENAUSDC = c(1.1862, 1.1412, 1.0928, 1.0256, NA_real_, > > NA_real_, 1.176, 1.16, 1.163, 1.183, 1.196, 1.165, 1.158, 1.142, > > NA_real_, 1.196, NA_real_, NA_real_,NA_real_, 1.113), AAVEUSDC > > c(388, 365.68, 373.15, 316.69, NA_real_, NA_real_, 102000, 101500, > > 101700,100300,100400,102300,102300,103100, NA_real_, 99800, > > NA_real_, > > NA_real_,NA_real_, 99900 ), ETHUSDC = c(4034.74, 3975.39, 3879.06, > > 3474.91, NA_real_,NA_real_, 3420, 3410, 3412, 3367, 3388, 3355, > > 3374, > > 3392, NA_real_, 3401, NA_real_, NA_real_,NA_real_, 3411), FTMUSDC > > c(1.381,1.3596, 1.2222, 1.0445, NA_real_,NA_real_, 1.36, 1.368, > > 1.342, 1.339, 1.436, 1.562, 1.53, 1.62, NA_real_, 1.31, NA_real_, > > NA_real_,NA_real_, 1.58), executed_qty = c(NA_real_, NA_real_, > > NA_real_, NA_real_, 230, 559, NA_real_, NA_real_,NA_real_, > > NA_real_, > > NA_real_,NA_real_, NA_real_, NA_real_, 200, NA_real_, 700, 200, > > 350, > > NA_real_), price = c( NA_real_, NA_real_, NA_real_, NA_real_, 2, 3, > > NA_real_, NA_real_,NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, > > NA_real_, 10, NA_real_, 3, 4, 5, NA_real_) ), row.names = c(NA, - > > 20L), class = c("tbl_df", "tbl", "data.frame")) > > > > The idea is to have: > > - one line each day with closing price of my portfolio assets at > > 18:59:59 > > - one line for each trade where I shall write 'asset', 'price' and > > 'executed_qty'. > > > > I spent quite some time trying to figure out how to do it, but > > couldn't. I don't even know if it is possible. Of course, I don't > > want to add anything (price, date, asset...) by hand. > > Thank you for any help. > >-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20250402/1c8210fd/attachment.sig>
I will try to be more precised. Here is the very final table I want: portfolio_valuation <- structure(list(time = structure(c(1735153199, 1735239599), tzone = "", class = c("POSIXct", "POSIXt")), BTCUSDC = c(107000, 106000), SUIUSDC = c(4.7, 4.5), USDT = c(0, 320000), BTCUSDC_qty = c(5, 2), SUIUSDC_qty = c(200, 250), valuation = c(535940, 533125), return = c(NA_real_, -0.525)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame")) portfolio_valuation # A tibble: 2 ? 8 time BTCUSDC SUIUSDC USDT BTCUSDC_qty SUIUSDC_qty valuation return <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2024-12-25 18:59:59 107000 4.7 0 5 200 535940 NA 2 2024-12-26 18:59:59 106000 4.5 320000 2 250 533125 -0.525 As you can see, I will keep only a few things. I want one line per day, @ 18:59:59. The line will show: - the token I have in my portfolio (can be zero)? - which quantity - their closing price for the day - valuation (sum of token_qty x token_price + USDT). Note: USDT is numeric dollar (cash). - return: (diff ptf val day 1 - diff ptf val day 0)/ptf val day 0 The two tibbles I start with are: - token_close_sample: daily closing price for each token. The token list is > the token in my portfolio. I will fetch a regularly refreshed list of token I usually trade. USDT value is 1 - all_trade_sample: a tibble with all my trades. One trade per line with its timestamp, and some trade info (price, quantity etc). I hope this details will help to clarify. On Wed, 2025-04-02 at 15:41 -0400, avi.e.gross at gmail.com wrote:> Arnaud, > > I won't comment on other aspects but want to ask how sure you are > that your data is guaranteed to have a single row reflecting a > closing price at 18:59:59 exactly? > > It may be true for your data source. I note that markets technically > close at 4:00 PM, New York time, but many have after-hours trading, > and there are days it closes early (such as 1 PM) and times when > trading is halted. > > Generally, you can get closing prices (or other data) from other > reliable sources and you could choose to merge data from such a > source in rather than calculating them from your data. If you do want > to use your data, one suggestion is to use the LAST record in each > grouping for a day. I find that easy to do in dplyr by having a > column containing the date info except for the time, and another > containing the time in a sortable format. You can then sort the > data.frame by the date and then time and then? group your data.frame? > by the date and? and select only last record in each group and you > have the last time, whatever that may be. > > Again, this may not apply in your case. As you note, you are planning > on doing many things, one step at a time, and early stages can set up > your data.frames in ways that make later stages easier to do. As one > example, you could create a column in early stages that marks if the > current row is a closing row or not. > > -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of Arnaud > Gaboury > Sent: Wednesday, April 2, 2025 2:10 PM > To: Ebert,Timothy Aaron <tebert at ufl.edu>; r-help at r-project.org > Subject: Re: [R] join/merge two data frames > > > > On Wed, 2025-04-02 at 16:55 +0000, Ebert,Timothy Aaron wrote: > > Your result data frame example makes no sense to me. The price and > > executed_qty are the same for all symbols? > > > > To get it all into one data frame you need a common variable that > > is > > used to join the data frames. > > My guess is that all_trade_sample$symbol has equivalents to the > > variables in token_close_sample. > > > > You need to pivot one of the tables, and then full join them. > > Please look at the join functions. In base R it is "merge()" and in > > dyply it is full_join(), left_join() and similar. > > > > I will pivot all_trade_sample to make all of the elements in > > "symbol" > > into variables. > > > > I will do this in tidyverse. There are some parts of this that I > > really like. However, it can as easily be done using base R. > > Library(tidyr) > > Library(dplyr) > > token_close_long <- token_close_sample %>% > > ? pivot_longer(cols = -time, names_to = "symbol", values_to > > "close_price") > > #I rename time in token_close_long so that it will be preserved > > colnames(token_close_long) <- c("time_close", "symbol", > > "close_price") > > combined <- full_join(all_trade_sample,token_close_long, > > by="symbol") > > > > This generates errors. This is ok, but I need to know more about > > all_trade_sample. Each symbol appears multiple times in > > all_trade_sample. Each instance of "AAVEUSIX" has a different time > > stamp. So maybe filter to eliminate all but the first (earliest) > > time? > > > > The program looks something like this: > > token_close_long <- token_close_sample %>% > > ? pivot_longer(cols = -time, names_to = "symbol", values_to > > "close_price") > > > > #I will rename time in token_close_long so that it will be > > preserved > > colnames(token_close_long) <- c("time_close", "symbol", > > "close_price") > > combined <- full_join(all_trade_sample,token_close_long, > > by="symbol") > > > > filtered_trades <- all_trade_sample |> > > ? group_by(symbol) |> > > ? slice_min(time, n = 1)|> > > ? ungroup() > > > > combined <- full_join(filtered_trades,token_close_long, > > by="symbol") > > > > I did not do something right, as there are four closing prices for > > each symbol. However, the general approach should work even if it > > needs a little modification to give the correct result. Your > > closing > > price is only relevant based on some other price (probably buying > > price, but could be opening price). > > > > Tim > > Thank you Tim for your answer. I will have a close look at it later > today. But in short: > - I will do some work on my data later in the script. The tibble I > want > is far from being the result. > - Time of closing price is everyday the same: 18:59:59. I will > compute > returns, volatility, correlations etc so I need a common reference. > Prices for the trades change and are given by the exchange. I don't > need them, won't do anything. > - Yes, symbols may be the common variable. > - I feel comfortable with tibbles and tidyverse > - All_trade_sample has the goal to fetch price (price of asset when > the > trade is done) and executed_qty. btc_price is needed at one point of > my > calculation, but will not be shown in the very final table. > - Multiple lines for one only symbol means I made multiple trades of > this asset during the same day. > > I will test your suggestions. > Thank you again. > > > > > > -----Original Message----- > > From: R-help <r-help-bounces at r-project.org> On Behalf Of Arnaud > > Gaboury > > Sent: Wednesday, April 2, 2025 6:20 AM > > To: r-help at r-project.org > > Subject: [R] join/merge two data frames > > > > [External Email] > > > > I work on a trading journal for a portfolio of crypto currencies. > > The > > goal is to fetch from my account (binance exchange) the trades I > > have > > done and daily closing prices of my assets. > > The first part (getting the data from exchange) are in two parts. > > > > 1- get the daily closing prices of my assets. Here is a sample of > > my > > data frame: > > token_close_sample <- structure(list(time > > structure(c(1734371999.999, 1734458399.999, 1734544799.999, > > 1734631199.999), tzone = "", class = c("POSIXct", "POSIXt")), > > BTCUSDC > > = c(107112.36, 107517.25, 104630.49, 98692.01 ), SUIUSDC > > c(4.7252, > > 4.6923, 4.7017, 4.2422), ENAUSDC = c(1.1862, 1.1412, 1.0928, > > 1.0256), > > AAVEUSDC = c(388, 365.68, 373.15, 316.69 ), ETHUSDC = c(4034.74, > > 3975.39, 3879.06, 3474.91), FTMUSDC = c(1.381, 1.3596, 1.2222, > > 1.0445)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", > > "data.frame")) > > > > the tibble looks like this: > > time??????????????? BTCUSDC SUIUSDC ENAUSDC AAVEUSDC ETHUSDC > > FTMUSDC > > <dttm>??????????????? <dbl>?? <dbl>?? <dbl>??? <dbl>?? <dbl>?? > > <dbl> > > 1 2024-12-16 18:59:59 107112.??? 4.73??? 1.19???? 388??? 4035.?? > > 1.38 > > 2 2024-12-17 18:59:59 107517.??? 4.69??? 1.14???? 366.?? 3975.?? > > 1.36 > > 3 2024-12-18 18:59:59 104630.??? 4.70??? 1.09???? 373.?? 3879.?? > > 1.22 > > 4 2024-12-19 18:59:59? 98692.??? 4.24??? 1.03???? 317.?? 3475.?? > > 1.04 > > > > 2- get my trades. Here is a sample: > > all_trade_sample <- structure(list(time > > structure(c(1737335082.949, > > 1737336735.697, 1738059550.671, 1738142709.422, 1738142709.422, > > 1738169351.788 ), tzone = "UTC", class = c("POSIXct", "POSIXt")), > > symbol = c("AAVEUSDC", "AAVEUSDC", "SUIUSDC", "AAVEUSDC", > > "AAVEUSDC", > > "ETHUSDC"), executed_qty = c(866.666, -834.998, 67649.3, -0.393, - > > 0.393, 36.1158), price = c(0.003005, 0.003131, 0.000038, > > 294.738321, > > 294.738321, 0.03027), cummulative_quote_qty = c(262699.317950113, - > > 263696.723173419, 263987.20719179, -115.83216, -115.83216, > > 111456.491386979 > > ), day = structure(c(20108, 20108, 20116, 20117, 20117, 20117 ), > > class = "Date")), row.names = c(NA, -6L), class = c("tbl_df", > > "tbl", > > "data.frame")) > > > > the tibble looks like this: > > time??????????????? symbol?? executed_qty????? price > > cummulative_quote_qty day > > <dttm>????????????? <chr>?????????? <dbl>????? <dbl> > > <dbl> <date> > > 1 2025-01-20 01:04:42 AAVEUSDC????? 867.????? 0.00300 > > 262699. 2025-01-20 > > 2 2025-01-20 01:32:15 AAVEUSDC???? -835.????? 0.00313?????????????? > > - > > 263697. 2025-01-20 > > 3 2025-01-28 10:19:10 SUIUSDC???? 67649.????? 0.000038 > > 263987. 2025-01-28 > > 4 2025-01-29 09:25:09 AAVEUSDC?????? -0.393 295. > > -116. 2025-01-29 > > 5 2025-01-29 09:25:09 AAVEUSDC?????? -0.393 295. > > -116. 2025-01-29 > > 6 2025-01-29 16:49:11 ETHUSDC??????? 36.1???? 0.0303 > > 111456. 2025-01-29 > > > > Now, to finalize, I want to get all the info in one data frame so I > > can compute daily valuation (with potential trades, or in/out of > > asset). > > The finalized tibble should look something like this: > > result <- structure(list(time = structure(c(1734371999.999, > > 1734458399.999, 1734544799.999, 1734631199.999, 1737335082.949, > > 1737336735.697, 1738059550.671, 1734721199, 1734807599, 1734893999, > > 1734980399, 1735066799, 1735153199, 1735239599, 1735325999, > > 1738142709.422, 1735412399, 1738142709.422, 1738169351.788, > > 1735498799), tzone = "", class = c("POSIXct", "POSIXt")), BTCUSDC > > c(107112.36, 107517.25, 104630.49, 98692.01,NA_real_, NA_real_, > > 102000, 101500, 101700,100300,100400,102300,102300,103100, > > NA_real_, > > 99800, NA_real_, NA_real_,NA_real_, 99900 ), SUIUSDC = c(4.7252, > > 4.6923, 4.7017, 4.2422, NA_real_, NA_real_, 4.25, 4.26, 4.7, 4.65, > > 4.52, 4.23, 4.17, 4.34, NA_real_, 4.52, NA_real_, > > NA_real_,NA_real_, > > 4.44), ENAUSDC = c(1.1862, 1.1412, 1.0928, 1.0256, NA_real_, > > NA_real_, 1.176, 1.16, 1.163, 1.183, 1.196, 1.165, 1.158, 1.142, > > NA_real_, 1.196, NA_real_, NA_real_,NA_real_, 1.113), AAVEUSDC > > c(388, 365.68, 373.15, 316.69, NA_real_, NA_real_, 102000, 101500, > > 101700,100300,100400,102300,102300,103100, NA_real_, 99800, > > NA_real_, > > NA_real_,NA_real_, 99900 ), ETHUSDC = c(4034.74, 3975.39, 3879.06, > > 3474.91, NA_real_,NA_real_, 3420, 3410, 3412, 3367, 3388, 3355, > > 3374, > > 3392, NA_real_, 3401, NA_real_, NA_real_,NA_real_, 3411), FTMUSDC > > c(1.381,1.3596, 1.2222, 1.0445, NA_real_,NA_real_, 1.36, 1.368, > > 1.342, 1.339, 1.436, 1.562, 1.53, 1.62, NA_real_, 1.31, NA_real_, > > NA_real_,NA_real_, 1.58), executed_qty = c(NA_real_, NA_real_, > > NA_real_, NA_real_, 230, 559, NA_real_, NA_real_,NA_real_, > > NA_real_, > > NA_real_,NA_real_, NA_real_, NA_real_, 200, NA_real_, 700, 200, > > 350, > > NA_real_), price = c( NA_real_, NA_real_, NA_real_, NA_real_, 2, 3, > > NA_real_, NA_real_,NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, > > NA_real_, 10, NA_real_, 3, 4, 5, NA_real_) ), row.names = c(NA, - > > 20L), class = c("tbl_df", "tbl", "data.frame")) > > > > The idea is to have: > > - one line each day with closing price of my portfolio assets at > > 18:59:59 > > - one line for each trade where I shall write 'asset', 'price' and > > 'executed_qty'. > > > > I spent quite some time trying to figure out how to do it, but > > couldn't. I don't even know if it is possible. Of course, I don't > > want to add anything (price, date, asset...) by hand. > > Thank you for any help. > >-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: This is a digitally signed message part URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20250403/2cc9d7a7/attachment.sig>