Tim Sippel
2008-Dec-28 03:36 UTC
[R] Conditional operation on multiple columns from two data frames
Hi- I have two data frames for which I wish to conditionally subtract the values of one dataframe from the other. I want to subtract df1$x from df2$x when their id is equal and the absolute value of difference between the dates is 12 hours or less. If there is no match of equal id's and dates less than 12 hours apart I want it to return "NA". Note that df1 has missing values in x and y columns, but df2 does not have any missing values. I think I'm close with the code at the end, but would appreciate a bit of help getting over this hurdle. Sample dataframes follow: *df1<-* ID Date y x STM07.1 30-05-2007 18:48 STM07.1 30-05-2007 20:21 STM07.1 02-06-2007 09:39 -20.101 -134.421 STM07.1 02-06-2007 17:36 STM07.1 03-06-2007 20:31 STM07.1 05-06-2007 08:34 STM07.2 19-02-2007 00:00 -37.4082 178.34395 STM07.2 10-03-2007 19:58 STM07.2 23-03-2007 06:33 STM07.2 23-03-2007 17:02 STM07.2 24-03-2007 04:27 -31.164 178.765 STM07.2 24-03-2007 11:35 STM07.2 25-03-2007 06:41 -31.181 178.77 STM07.2 26-03-2007 05:11 ** *df2<-* ID Date y x STM07.1 31/05/2007 23:00 -20.8291 217.7022 STM07.1 1/06/2007 11:00 -20.825 217.698 STM07.1 1/06/2007 23:00 -20.8224 217.70115 STM07.2 19/02/2007 0:00 -37.4096 178.306 STM07.2 19/02/2007 12:00 -37.3752 178.336 STM07.2 20/02/2007 0:00 -37.277 178.378 STM07.2 20/02/2007 12:00 -37.1786 178.3845 STM07.2 21/02/2007 0:00 -37.076 178.389 STM07.2 21/02/2007 12:00 -36.9197 178.3965 STM07.2 22/02/2007 0:00 -36.7435 178.391 STM07.2 22/02/2007 12:00 -36.4925 178.509 STM07.2 23/02/2007 0:00 -36.2415 178.575 STM07.2 23/02/2007 12:00 -35.955 178.628 d.x <- ifelse(c(test = df1$ID==df2$ID && abs(difftime(as.POSIXlt(df1$Date), as.POSIXlt(df2$Date), units="hours") < 12)), yes = delta.lon=df1$x-df2$x, no = "NA") I'm using R 2.7.1. Cheers, Tim [[alternative HTML version deleted]]
Tim Sippel
2008-Dec-28 04:55 UTC
[R] Conditional operation on multiple columns from two data frames
Sorry...I was unaware I needed to 'deput' to deparse data examples into a useful format. The data examples I provided initially are deparsed below: df1<- list(ID = c(2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3), Date = c("2007-05-30 18:48:20", "2007-05-30 20:21:56", "2007-06-02 09:39:58", "2007-06-02 17:36:20", "2007-06-03 20:31:35", "2007-06-05 08:34:37", "2007-02-19 00:00:00", "2007-03-10 19:58:18", "2007-03-23 06:33:26", "2007-03-23 17:02:21", "2007-03-24 04:27:44", "2007-03-24 11:35:47", "2007-03-25 06:41:03" ), y = c(NA, NA, -20.101, NA, NA, NA, -37.4082166666667, NA, NA, NA, -31.164, NA, -31.181), x = c(NA, NA, -134.421, NA, NA, NA, 178.34395, NA, NA, NA, 178.765, NA, 178.77)) df2<- list(ID = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Date = c("2007-05-31 23:00:00", "2007-06-01 11:00:00", "2007-06-01 23:00:00", "2007-02-19 00:00:00", "2007-02-19 12:00:00", "2007-02-20 00:00:00", "2007-02-20 12:00:00", "2007-02-21 00:00:00", "2007-02-21 12:00:00", "2007-02-22 00:00:00", "2007-02-22 12:00:00", "2007-02-23 00:00:00", "2007-02-23 12:00:00" ), y = c(-20.829075, -20.824995, -20.822355, -37.4096, -37.37515, -37.277, -37.17855, -37.07595, -36.91965, -36.74345, -36.49245, -36.2415, -35.955), x = c(217.7022, 217.698, 217.70115, 178.306, 178.336, 178.378, 178.3845, 178.389, 178.3965, 178.391, 178.509, 178.575, 178.628)) On Sun, Dec 28, 2008 at 4:36 PM, Tim Sippel <tsippel@gmail.com> wrote:> Hi- > I have two data frames for which I wish to conditionally subtract the > values of one dataframe from the other. > > I want to subtract df1$x from df2$x when their id is equal and the > absolute value of difference between the dates is 12 hours or less. If > there is no match of equal id's and dates less than 12 hours apart I want it > to return "NA". Note that df1 has missing values in x and y columns, but > df2 does not have any missing values. I think I'm close with the code at > the end, but would appreciate a bit of help getting over this hurdle. > > Sample dataframes follow: > > *df1<-* > ID Date y x STM07.1 30-05-2007 18:48 STM07.1 30-05-2007 20:21 > STM07.1 02-06-2007 09:39 -20.101 -134.421 STM07.1 02-06-2007 17:36 > STM07.1 03-06-2007 20:31 STM07.1 05-06-2007 08:34 STM07.2 19-02-2007 > 00:00 -37.4082 178.34395 STM07.2 10-03-2007 19:58 STM07.2 23-03-2007 > 06:33 STM07.2 23-03-2007 17:02 STM07.2 24-03-2007 04:27 -31.164 178.765 > STM07.2 24-03-2007 11:35 STM07.2 25-03-2007 06:41 -31.181 178.77 STM07.2 26-03-2007 > 05:11 > ** > *df2<-* > ID Date y x STM07.1 31/05/2007 23:00 -20.8291 217.7022 STM07.1 1/06/2007 > 11:00 -20.825 217.698 STM07.1 1/06/2007 23:00 -20.8224 217.70115 STM07.2 19/02/2007 > 0:00 -37.4096 178.306 STM07.2 19/02/2007 12:00 -37.3752 178.336 STM07.2 20/02/2007 > 0:00 -37.277 178.378 STM07.2 20/02/2007 12:00 -37.1786 178.3845 STM07.2 21/02/2007 > 0:00 -37.076 178.389 STM07.2 21/02/2007 12:00 -36.9197 178.3965 STM07.2 22/02/2007 > 0:00 -36.7435 178.391 STM07.2 22/02/2007 12:00 -36.4925 178.509 STM07.2 23/02/2007 > 0:00 -36.2415 178.575 STM07.2 23/02/2007 12:00 -35.955 178.628 > > d.x <- ifelse(c(test = df1$ID==df2$ID && abs(difftime(as.POSIXlt(df1$Date), > as.POSIXlt(df2$Date), > units="hours") < 12)), yes = delta.lon=df1$x-df2$x, no = "NA") > > I'm using R 2.7.1. > > Cheers, > > Tim >[[alternative HTML version deleted]]
tsippel
2008-Dec-29 00:27 UTC
[R] Conditional operation on multiple columns from two data frames
The suggestion below was made. df1$Date <- as.Date(df1$Date) df2$Date <- as.Date(df2$Date) ifelse(df1$ID==df2$ID & df1$Date-df2$Date<0.5,df1$y-df2$y, NA) However, because my dataframe rows do not align, I need the conditionals to be tested on every combination of cells. I'm starting to think I need to use tapply? Tim -- View this message in context: http://www.nabble.com/Conditional-operation-on-multiple-columns-from-two-data-frames-tp21189891p21197566.html Sent from the R help mailing list archive at Nabble.com.