I am merging two dataframes using a relational key (incident number and incident year), but not all the records match up. I want to be able to review only the records that cannot be merged for each individual dataframe (essentially trying to select records from one dataframe using a multi-value relational key from the other dataframe). The following code shows what I am trying to do. The final two lines of code do not work, but if somebody could figure out a workable solution, that would be great. Thanks. --Markus incidents = data.frame( INC_NO = c(1,2,3,4,5,6,7,8,9,10), INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009, 2010), INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS", "FIRE", "EMS", "EMS")) responses = data.frame( INC_NO = c(1,2,2,2,3,4,5,6,7,8,8,8,9,10), INC_YEAR = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008, 2018, 2018, 2018, 2009, 2010), UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1", "E3", "E7", "T1", "E7", "E5")) merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR")) relational_key = c("INC_NO", "INC_YEAR") ## following does not work, but I want DF of incidents that did not merge up with responses incidents[incidents[,relational_key] %in% responses[,relational_key],] ## following does not work, but I want DF of responses that did not merge up with incidents responses[responses[,relational_key] %in% incidents[,relational_key],] [[alternative HTML version deleted]]
Hello, On 09/07/2010 07:25 PM, Markus Weisner wrote:> I am merging two dataframes using a relational key (incident number and > incident year), but not all the records match up. I want to be able to > review only the records that cannot be merged for each individual dataframe > (essentially trying to select records from one dataframe using a multi-value > relational key from the other dataframe). The following code shows what I > am trying to do. The final two lines of code do not work, but if somebody > could figure out a workable solution, that would be great. Thanks. > --Markus > > incidents = data.frame( > INC_NO = c(1,2,3,4,5,6,7,8,9,10), > INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009, > 2010), > INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS", > "FIRE", "EMS", "EMS")) > > responses = data.frame( > INC_NO = c(1,2,2,2,3,4,5,6,7,8,8,8,9,10), > INC_YEAR = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008, > 2018, 2018, 2018, 2009, 2010), > UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1", > "E3", "E7", "T1", "E7", "E5")) > > merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR")) > > relational_key = c("INC_NO", "INC_YEAR") > > ## following does not work, but I want DF of incidents that did not merge up > with responses > incidents[incidents[,relational_key] %in% responses[,relational_key],] > > ## following does not work, but I want DF of responses that did not merge up > with incidents > responses[responses[,relational_key] %in% incidents[,relational_key],]Surely there's a more elegant way... This function takes two data.frames, and returns those elements that aren't merged as a list, one element for each data.frame. You need to specify a key. compare <- function(df1, df2, key) { md <- merge(df1, df2, by = key) keys <- lapply(list(df1, df2, md), function(x) do.call("paste", c(x[key], sep = "\r"))) mapply(function(x, y) x[!y %in% keys[[3]], ], list(df1, df2), keys[-3], SIMPLIFY = FALSE) } > compare(responses, incidents, key = relational_key) [[1]] INC_NO INC_YEAR UNIT_TYPE 10 8 2018 E3 11 8 2018 E7 12 8 2018 T1 [[2]] INC_NO INC_YEAR INC_TYPE 8 8 2008 FIRE
try this:> merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR"), all=TRUE) > # responses that don't match > subset(merged_data, is.na(INC_TYPE), select=c(INC_NO, INC_YEAR, UNIT_TYPE))INC_NO INC_YEAR UNIT_TYPE 11 8 2018 E3 12 8 2018 E7 13 8 2018 T1> # incidents that don't match > subset(merged_data, is.na(UNIT_TYPE), select=c(INC_NO, INC_YEAR, INC_TYPE))INC_NO INC_YEAR INC_TYPE 10 8 2008 FIRE>On Tue, Sep 7, 2010 at 8:25 PM, Markus Weisner <maw4b at me.com> wrote:> I am merging two dataframes using a relational key (incident number and > incident year), but not all the records match up. ?I want to be able to > review only the records that cannot be merged for each individual dataframe > (essentially trying to select records from one dataframe using a multi-value > relational key from the other dataframe). ?The following code shows what I > am trying to do. ?The final two lines of code do not work, but if somebody > could figure out a workable solution, that would be great. ?Thanks. > --Markus > > incidents = data.frame( > ? ? ? ?INC_NO = c(1,2,3,4,5,6,7,8,9,10), > ? ? ? ?INC_YEAR = c(2006, 2006, 2006, 2007, 2008, 2008, 2008, 2008, 2009, > 2010), > ? ? ? ?INC_TYPE = c("EMS", "FIRE", "GAS", "MVA", "EMS", "EMS", "EMS", > "FIRE", "EMS", "EMS")) > > responses = data.frame( > ? ? ? ?INC_NO = c(1,2,2,2,3,4,5,6,7,8,8,8,9,10), > ? ? ? ?INC_YEAR = c(2006, 2006, 2006, 2006, 2006, 2007, 2008, 2008, 2008, > 2018, 2018, 2018, 2009, 2010), > ? ? ? ?UNIT_TYPE = c("E2", "E2", "E5", "T1", "E7", "E6", "E2", "E2", "E1", > "E3", "E7", "T1", "E7", "E5")) > > merged_data = merge(incidents, responses, by=c("INC_NO", "INC_YEAR")) > > relational_key = c("INC_NO", "INC_YEAR") > > ## following does not work, but I want DF of incidents that did not merge up > with responses > incidents[incidents[,relational_key] %in% responses[,relational_key],] > > ## following does not work, but I want DF of responses that did not merge up > with incidents > responses[responses[,relational_key] %in% incidents[,relational_key],] > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?