Hello I'm not sure if this is an appropriate use of this mailing list or not, please let me know if it isn't.? I'm struggling to figure out how to merge two data tables based on max effective date logic compared to when a payment occurred.? My dtDistributions DT is a transactional dataset while dtDepartments is a domain data set containing all department names and the effective date of when department name changes have occurred.? For the Bob example below, there was a payment on 2016-01-01 which occurred in H229000.? In 2012, this department was named "Modified Name", in 2019 the department will be named "Final Name".? When I merge these two tables, I'd like it to pull the transactional data and match it up to department name "Modified Name" since that was the active department name at the time of that transaction.? I've read documentation on foverlaps, but I'm not sure if this problem is considered a range of dates or not. At the bottom of this post is a temporarily solution that is working but it runs for a long time due to the amount of data in my actual source. Here is some sample data to get started: library(data.table) dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"), ????????????????????????? Department = factor(c("H229000", "H135000", "H047800")), ????????????????????????? Amount = c(5, 34, 87), ????????????????????????? PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))) dtDepartments <- data.table(Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")), ??????????????????????? EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")), ??????????????????????? Descr = c("Final Name","Modified Name","Original Name","Payables","Postal")) Here is the output I would like to see: PayeeName? Department???? PaymentDT?? Amount Bob??????? Modified Name ?2016-01-01 ?5 Tracy????? Payables????? ?2015-01-01 ?34 Tom??????? Postal??????? ?2015-01-01 ?87 I was able to get this working by using the sqldf library, but it runs for a very long time in my actual dataset and I'd like to use data.table if at all possible. library(sqldf) joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount ?????? ?????FROM dtDistributions A, dtDepartments B ??????????? WHERE A.DEPARTMENT = B.Department ??????????? AND B.EffDT = (SELECT MAX(ED.EffDT) ??????????????????????????? FROM dtDepartments ED ??????????????????????????? WHERE B.Department = ED.Department ??? ????????????????????????AND ED.EffDT <= A.PaymentDT)" finalDT <- data.table(sqldf(joinString)) -----Message Disclaimer----- This e-mail message is intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the intended recipient, any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply email to Connect at principal.com and delete or destroy all copies of the original message and attachments thereto. Email sent to or from the Principal Financial Group or any of its member companies may be retained as required by law or regulation. Nothing in this message is intended to constitute an Electronic signature for purposes of the Uniform Electronic Transactions Act (UETA) or the Electronic Signatures in Global and National Commerce Act ("E-Sign") unless a specific statement to the contrary is included in this message. If you no longer wish to receive any further solicitation from the Principal Financial Group you may unsubscribe at https://www.principal.com/do-not-contact-form any time. If you are a Canadian resident and no longer wish to receive commercial electronic messages you may unsubscribe at https://www.principal.com/do-not-email-request-canadian-residents any time. This message was secured by Zix(R).
Did you search first? (This is suggested by the posting guide -- below -- prior to posting). "merge 2 data.tables in R" brought up what looked like useful stuff, in particular the merge() function for data tables. If this does not do what you want, it may help to explain why not. Alternatively, there is a merge.data.frame function that may do the job if you first convert your data.table to a data.frame. As I do not use the data.table package, you or others may have to fill in details to make these work -- if they *can* work. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Thu, Feb 1, 2018 at 9:45 AM, Graeve, Nick <Graeve.Nick at principal.com> wrote:> Hello > > I'm not sure if this is an appropriate use of this mailing list or not, please let me know if it isn't. I'm struggling to figure out how to merge two data tables based on max effective date logic compared to when a payment occurred. My dtDistributions DT is a transactional dataset while dtDepartments is a domain data set containing all department names and the effective date of when department name changes have occurred. For the Bob example below, there was a payment on 2016-01-01 which occurred in H229000. In 2012, this department was named "Modified Name", in 2019 the department will be named "Final Name". When I merge these two tables, I'd like it to pull the transactional data and match it up to department name "Modified Name" since that was the active department name at the time of that transaction. I've read documentation on foverlaps, but I'm not sure if this problem is considered a range of dates or not. At the bottom of this post is a temporarily solution that is working but it runs for a long time due to the amount of data in my actual source. > > Here is some sample data to get started: > library(data.table) > dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"), > Department = factor(c("H229000", "H135000", "H047800")), > Amount = c(5, 34, 87), > PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))) > > dtDepartments <- data.table(Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")), > EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")), > Descr = c("Final Name","Modified Name","Original Name","Payables","Postal")) > > Here is the output I would like to see: > PayeeName Department PaymentDT Amount > Bob Modified Name 2016-01-01 5 > Tracy Payables 2015-01-01 34 > Tom Postal 2015-01-01 87 > > I was able to get this working by using the sqldf library, but it runs for a very long time in my actual dataset and I'd like to use data.table if at all possible. > library(sqldf) > joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount > FROM dtDistributions A, dtDepartments B > WHERE A.DEPARTMENT = B.Department > AND B.EffDT = (SELECT MAX(ED.EffDT) > FROM dtDepartments ED > WHERE B.Department = ED.Department > AND ED.EffDT <= A.PaymentDT)" > > finalDT <- data.table(sqldf(joinString)) > > > > -----Message Disclaimer----- > > This e-mail message is intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the intended recipient, any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply email to Connect at principal.com and delete or destroy all copies of the original message and attachments thereto. Email sent to or from the Principal Financial Group or any of its member companies may be retained as required by law or regulation. > > Nothing in this message is intended to constitute an Electronic signature for purposes of the Uniform Electronic Transactions Act (UETA) or the Electronic Signatures in Global and National Commerce Act ("E-Sign") unless a specific statement to the contrary is included in this message. > > If you no longer wish to receive any further solicitation from the Principal Financial Group you may unsubscribe at https://www.principal.com/do-not-contact-form any time. > > If you are a Canadian resident and no longer wish to receive commercial electronic messages you may unsubscribe at https://www.principal.com/do-not-email-request-canadian-residents any time. > > > > > > This message was secured by Zix(R). > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
I rarely use data.table, but I think the vignette for the package discusses rolling joins. Also, Google popped up [1]. [1] https://www.r-bloggers.com/understanding-data-table-rolling-joins/ -- Sent from my phone. Please excuse my brevity. On February 1, 2018 9:45:53 AM PST, "Graeve, Nick" <Graeve.Nick at principal.com> wrote:>Hello > >I'm not sure if this is an appropriate use of this mailing list or not, >please let me know if it isn't.? I'm struggling to figure out how to >merge two data tables based on max effective date logic compared to >when a payment occurred.? My dtDistributions DT is a transactional >dataset while dtDepartments is a domain data set containing all >department names and the effective date of when department name changes >have occurred.? For the Bob example below, there was a payment on >2016-01-01 which occurred in H229000.? In 2012, this department was >named "Modified Name", in 2019 the department will be named "Final >Name".? When I merge these two tables, I'd like it to pull the >transactional data and match it up to department name "Modified Name" >since that was the active department name at the time of that >transaction.? I've read documentation on foverlaps, but I'm not sure if >this problem is considered a range of dates or not. At the bottom of >this post is a temporarily solution that is working but it runs for a >long time due to the amount of data in my actual source. > >Here is some sample data to get started: >library(data.table) >dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"), >????????????????????????? Department = factor(c("H229000", "H135000", >"H047800")), >????????????????????????? Amount = c(5, 34, 87), >????????????????????????? PaymentDT = as.Date(c("2016-01-01", >"2015-01-01", "2015-01-01"))) > >dtDepartments <- data.table(Department = factor(c("H229000", "H229000", >"H229000", "H135000", "H047800")), >??????????????????????? EffDT = as.Date(c("2019-01-01", "2012-01-01", >"1901-01-01", "1901-01-01", "1901-01-01")), >??????????????????????? Descr = c("Final Name","Modified >Name","Original Name","Payables","Postal")) > >Here is the output I would like to see: >PayeeName? Department???? PaymentDT?? Amount >Bob??????? Modified Name ?2016-01-01 ?5 >Tracy????? Payables????? ?2015-01-01 ?34 >Tom??????? Postal??????? ?2015-01-01 ?87 > >I was able to get this working by using the sqldf library, but it runs >for a very long time in my actual dataset and I'd like to use >data.table if at all possible. >library(sqldf) >joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount >?????? ?????FROM dtDistributions A, dtDepartments B >??????????? WHERE A.DEPARTMENT = B.Department >??????????? AND B.EffDT = (SELECT MAX(ED.EffDT) >??????????????????????????? FROM dtDepartments ED >??????????????????????????? WHERE B.Department = ED.Department >??? ????????????????????????AND ED.EffDT <= A.PaymentDT)" > >finalDT <- data.table(sqldf(joinString)) > > > >-----Message Disclaimer----- > >This e-mail message is intended only for the use of the individual or >entity to which it is addressed, and may contain information that is >privileged, confidential and exempt from disclosure under applicable >law. If you are not the intended recipient, any dissemination, >distribution or copying of this communication is strictly prohibited. >If you have received this communication in error, please notify us >immediately by reply email to Connect at principal.com and delete or >destroy all copies of the original message and attachments thereto. >Email sent to or from the Principal Financial Group or any of its >member companies may be retained as required by law or regulation. > >Nothing in this message is intended to constitute an Electronic >signature for purposes of the Uniform Electronic Transactions Act >(UETA) or the Electronic Signatures in Global and National Commerce Act >("E-Sign") unless a specific statement to the contrary is included in >this message. > >If you no longer wish to receive any further solicitation from the >Principal Financial Group you may unsubscribe at >https://www.principal.com/do-not-contact-form any time. > >If you are a Canadian resident and no longer wish to receive commercial >electronic messages you may unsubscribe at >https://www.principal.com/do-not-email-request-canadian-residents any >time. > > > > > >This message was secured by Zix(R). > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.