James Rome
2010-Mar-20 14:20 UTC
[R] How to select a row from one dataframe that is "close" to a row in another dataframe
I have two data frames of flight data, but they have very different numbers of rows. They come from different sources, so the data are not identical.> names(oooi)[1] "FltOrigDt" "MkdCrrCd" [3] "MkdFltNbr" "DprtTrpnStnCd" [5] "ArrTrpnStnCd" "ActualOutLocalTimestamp" [7] "ActualOffLocal" "ActualOnLocal" [9] "ActualInLocal" "ArrivalGate" [11] "DepartureGate" "Flight" [13] "OnDate" "MinutesIntoDay" [15] "OnHour" "pt"> names(runway)[1] "OnDateTime" "IATA" "ICAO" "Flight" [5] "AircraftType" "Tail" "Arrived" "STA" [9] "Runway" "From.To" "Delay" "OnDate" [13] "MinutesIntoDay" "pt" These sets have several hundred thousand rows. In both sets, pt is a POSIXct for the arrival time (from different sources). They are not identical, but surely should be within an hour of each other (hopefully a lot less), and the Flight fields must be the same. So (abs(runway$pt - oooi$pt) < 3600) & (runway$Flight == oooi$Flight) should pick out the corresponding rows in the two data sets (if there is a match). What I need to do is to take the Runway from runway and insert it into the oooi df for the correct flight. What is the best way to do this in R? Thanks, Jim Rome
Gabor Grothendieck
2010-Mar-20 14:27 UTC
[R] How to select a row from one dataframe that is "close" to a row in another dataframe
Using the sqldf package you could do an SQL join with the indicated condition in your where clause. See the examples section of this page: http://sqldf.googlecode.com On Sat, Mar 20, 2010 at 10:20 AM, James Rome <jamesrome at gmail.com> wrote:> I have two data frames of flight data, ?but they have very different > numbers of rows. They come from different sources, so the data are not > identical. > >> names(oooi) > ?[1] "FltOrigDt" ? ? ? ? ? ? ? "MkdCrrCd" > ?[3] "MkdFltNbr" ? ? ? ? ? ? ? "DprtTrpnStnCd" > ?[5] "ArrTrpnStnCd" ? ? ? ? ? ?"ActualOutLocalTimestamp" > ?[7] "ActualOffLocal" ? ? ? ? ?"ActualOnLocal" > ?[9] "ActualInLocal" ? ? ? ? ? "ArrivalGate" > [11] "DepartureGate" ? ? ? ? ? "Flight" > [13] "OnDate" ? ? ? ? ? ? ? ? ?"MinutesIntoDay" > [15] "OnHour" ? ? ? ? ? ? ? ? ?"pt" > >> names(runway) > ?[1] "OnDateTime" ? ? "IATA" ? ? ? ? ? "ICAO" ? ? ? ? ? "Flight" > ?[5] "AircraftType" ? "Tail" ? ? ? ? ? "Arrived" ? ? ? ?"STA" > ?[9] "Runway" ? ? ? ? "From.To" ? ? ? ?"Delay" ? ? ? ? ?"OnDate" > [13] "MinutesIntoDay" "pt" > > These sets have several hundred thousand rows. > > In both sets, pt is a POSIXct for the arrival time (from different > sources). They are not identical, but surely should be within an hour of > each other (hopefully a lot less), and the Flight fields must be the > same. So > (abs(runway$pt - oooi$pt) < 3600) & (runway$Flight == oooi$Flight) > should pick out the corresponding rows in the two data sets (if there is > a match). > > What I need to do is to take the Runway from runway and insert it into > the oooi df for the correct flight. > > What is the best way to do this in R? > > Thanks, > Jim Rome > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Daniel Malter
2010-Mar-20 15:52 UTC
[R] How to select a row from one dataframe that is "close" to a row in another dataframe
If the flight identifiers runway$Flight and oooi$Flight are unique (i.e. only one observation has the same identifier in each dataset), you could use merge() to bind together the dataset based on matching the two. See, ?merge Also, I see an OnDate variable in both dataset. So if Flight does not provide unique identification, maybe Flight and OnDate together do, which can also be handled in merge. Let us know if that solves the problem. Best, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of James Rome Sent: Saturday, March 20, 2010 10:20 AM To: r-help at r-project.org Subject: [R] How to select a row from one dataframe that is "close" to a row in another dataframe I have two data frames of flight data, but they have very different numbers of rows. They come from different sources, so the data are not identical.> names(oooi)[1] "FltOrigDt" "MkdCrrCd" [3] "MkdFltNbr" "DprtTrpnStnCd" [5] "ArrTrpnStnCd" "ActualOutLocalTimestamp" [7] "ActualOffLocal" "ActualOnLocal" [9] "ActualInLocal" "ArrivalGate" [11] "DepartureGate" "Flight" [13] "OnDate" "MinutesIntoDay" [15] "OnHour" "pt"> names(runway)[1] "OnDateTime" "IATA" "ICAO" "Flight" [5] "AircraftType" "Tail" "Arrived" "STA" [9] "Runway" "From.To" "Delay" "OnDate" [13] "MinutesIntoDay" "pt" These sets have several hundred thousand rows. In both sets, pt is a POSIXct for the arrival time (from different sources). They are not identical, but surely should be within an hour of each other (hopefully a lot less), and the Flight fields must be the same. So (abs(runway$pt - oooi$pt) < 3600) & (runway$Flight == oooi$Flight) should pick out the corresponding rows in the two data sets (if there is a match). What I need to do is to take the Runway from runway and insert it into the oooi df for the correct flight. What is the best way to do this in R? Thanks, Jim Rome ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
jim holtman
2010-Mar-20 16:17 UTC
[R] How to select a row from one dataframe that is "close" to a row in another dataframe
sqldf package:> n <- 200 > x <- data.frame(id=sample(1:10,n, TRUE), timex=runif(n)) > y <- data.frame(id=sample(1:15, n, TRUE), timey=runif(n), runway=seq(n)) > require(sqldf) > sqldf("select x.id, timex, y.id, timey, y.runway from x join y+ where x.id = y.id and abs(timex - timey) < 0.005") id timex id timey runway 1 7 0.39110674 7 0.38859025 61 2 7 0.58829365 7 0.58581915 133 3 9 0.34610180 9 0.34866404 65 4 2 0.72069416 2 0.72080087 7 5 3 0.75491843 3 0.75640271 37 6 5 0.86930840 5 0.87373981 15 7 3 0.07181249 3 0.06935293 108 8 6 0.55273483 6 0.55286014 101 9 10 0.50377737 10 0.50563669 139 10 1 0.17325424 1 0.17129662 32 11 3 0.09013624 3 0.09281639 112 12 8 0.04805349 8 0.04618661 196 13 2 0.19428938 2 0.19260035 50 14 10 0.16194595 10 0.16565594 44 15 7 0.51193601 7 0.51352435 14 16 3 0.02331951 3 0.02119733 64 17 9 0.69456540 9 0.69376281 39 18 2 0.20070366 2 0.20432466 134 19 10 0.50438411 10 0.50563669 139 20 4 0.25271897 4 0.25211036 72>On Sat, Mar 20, 2010 at 10:20 AM, James Rome <jamesrome@gmail.com> wrote:> I have two data frames of flight data, but they have very different > numbers of rows. They come from different sources, so the data are not > identical. > > > names(oooi) > [1] "FltOrigDt" "MkdCrrCd" > [3] "MkdFltNbr" "DprtTrpnStnCd" > [5] "ArrTrpnStnCd" "ActualOutLocalTimestamp" > [7] "ActualOffLocal" "ActualOnLocal" > [9] "ActualInLocal" "ArrivalGate" > [11] "DepartureGate" "Flight" > [13] "OnDate" "MinutesIntoDay" > [15] "OnHour" "pt" > > > names(runway) > [1] "OnDateTime" "IATA" "ICAO" "Flight" > [5] "AircraftType" "Tail" "Arrived" "STA" > [9] "Runway" "From.To" "Delay" "OnDate" > [13] "MinutesIntoDay" "pt" > > These sets have several hundred thousand rows. > > In both sets, pt is a POSIXct for the arrival time (from different > sources). They are not identical, but surely should be within an hour of > each other (hopefully a lot less), and the Flight fields must be the > same. So > (abs(runway$pt - oooi$pt) < 3600) & (runway$Flight == oooi$Flight) > should pick out the corresponding rows in the two data sets (if there is > a match). > > What I need to do is to take the Runway from runway and insert it into > the oooi df for the correct flight. > > What is the best way to do this in R? > > Thanks, > Jim Rome > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? [[alternative HTML version deleted]]