R-help,

I'm getting some unexpected behavior with subsetting a data frame
(aircraft flight data) that I can't sort out.
Here is a simplified version of my data frame and problem:

> flight
      FlightID TailNo FlightDate HobbsTime FlightCost       Date year
1         4497  6009K       <NA>       2.2      330.0       <NA>   NA
2         4498  6009K       <NA>       0.8      120.0       <NA>   NA
3         4499  6009K       <NA>       0.9      135.0       <NA>   NA
4         4500  6009K       <NA>       1.1      165.0       <NA>   NA
5         4501  6009K       <NA>       1.5      225.0       <NA>   NA
2587      7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
2588      7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
2589      7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
2590      7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
2591      7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
29793    35208  91630  1/21/2006       1.4      107.8 2006-01-21 2006
29794    35209  91630  1/21/2006       0.7       53.9 2006-01-21 2006
29795    35210  9725B  1/21/2006       1.4      138.6 2006-01-21 2006
29796    35212  91630  1/28/2006       1.0       77.0 2006-01-28 2006
29797    35213  91630  1/28/2006       1.6      123.2 2006-01-28 2006
29798    35214  3386E   1/5/2006       1.1       86.9 2006-01-05 2006

I then try to extract the error years:

> errors <- flight[flight$year > 2006,]
> errors
     FlightID TailNo FlightDate HobbsTime FlightCost       Date year
NA         NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.1       NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.2       NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.3       NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.4       NA   <NA>       <NA>        NA         NA       <NA>   NA
2587     7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
2588     7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
2589     7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
2590     7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
2591     7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009

Would someone please explain to me why the new data frame has all
columns (and row names) replaced with NA where year was NA, and how to
avoid this behavior?
Thanks in advance.

I am using R v2.2.1 on Windows XP.

Cheers,
eric

Sample Data:

structure(list(
  FlightID = c(4497, 4498, 4499, 4500, 4501, 7083, 7084, 7085,
    7086, 7087, 35208, 35209, 35210, 35212, 35213, 35214),
  TailNo = structure(c(28, 28, 28, 28, 28, 49, 49, 49, 49, 49,
    47, 47, 54, 47, 47, 15),
    .Label = c("12345", "133BW", "152GB", "172CM", "172RW", "1955L",
      "2219E", "222WC", "231NW", "2496M", "2630V", "2726E", "2903A",
      "2977G", "3386E", "3803E", "3979V", "409EV", "43160", "46275",
      "4644B", "47885", "4922D", "4975F", "5073H", "5317P", "5335P",
      "6009K", "6013X", "6036J", "6360D", "64048", "6495R", "66038",
      "67844", "6913R", "733XL", "734BT", "738QA", "808LP", "8148F",
      "8164Z", "8269T", "8451R", "8654V", "8715E", "91630", "9199Z",
      "9206N", "92SA", "936GW", "9488G", "9596H", "9725B", "9756U",
      "ELITE", "N20BY", "N53MF"),
    class = "factor"),
  FlightDate = c(NA, NA, NA, NA, NA, "4/8/2009", "4/10/2009",
    "4/11/2009", "4/12/2009", "4/15/2009", "1/21/2006", "1/21/2006",
    "1/21/2006", "1/28/2006", "1/28/2006", "1/5/2006"),
  HobbsTime = c(2.2, 0.8, 0.9, 1.1, 1.5, 1.5, 1.3, 1.9, 1.3, 1.1,
    1.4, 0.7, 1.4, 1, 1.6, 1.1),
  FlightCost = c(330, 120, 135, 165, 225, 103.5, 89.7, 131.1, 89.7,
    75.9, 107.8, 53.9, 138.6, 77, 123.2, 86.9),
  Date = structure(c(NA, NA, NA, NA, NA, 1239174000, 1239346800,
    1239433200, 1239519600, 1239778800, 1137830400, 1137830400,
    1137830400, 1138435200, 1138435200, 1136448000),
    tzone = "", class = c("POSIXt", "POSIXct")),
  year = c(NA, NA, NA, NA, NA, 2009, 2009, 2009, 2009, 2009,
    2006, 2006, 2006, 2006, 2006, 2006)),
  .Names = c("FlightID", "TailNo", "FlightDate", "HobbsTime",
    "FlightCost", "Date", "year"),
  row.names = c("1", "2", "3", "4", "5", "2587", "2588", "2589",
    "2590", "2591", "29793", "29794", "29795", "29796", "29797",
    "29798"),
  class = "data.frame")

--
Eric Archer, Ph.D.
NOAA-SWFSC
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121,7003(FAX)
eric.archer at noaa.gov

"Lighthouses are more helpful than churches." - Benjamin Franklin
"Cogita tute" - Think for yourself

Eric Archer wrote:
> I then try to extract the error years :
>
> > errors <- flight[flight$year > 2006,]
>
> Would someone please explain to me why the new data frame has all
> columns (and row names) replaced with NA where year was NA and how to
> avoid this behavior?.

[snip]

flight$year > 2006 returns a logical vector (TRUE/FALSE), not row
numbers, and that vector is NA wherever year is NA. Indexing the rows
of a data frame with an NA gives a row of NAs, which is what you are
seeing.

Try this:

errors <- subset(flight, subset = year > 2006)

subset() treats an NA in the condition as FALSE, so those rows are
dropped.

Peter Ehlers

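To make the reply above concrete, here is a minimal sketch of how an NA in a logical row index produces all-NA rows, along with three common ways around it. The data frame `toy` and its columns are hypothetical stand-ins for `flight`, not the poster's actual data.

```r
# Hypothetical toy data: two rows with a missing year, as in `flight`
toy <- data.frame(id = 1:6, year = c(NA, NA, 2009, 2009, 2006, 2006))

# The comparison propagates NA: NA NA TRUE TRUE FALSE FALSE
idx <- toy$year > 2006

# An NA in a logical row index yields a row filled with NA
bad <- toy[idx, ]

# Three ways to drop the NA entries instead of turning them into NA rows
ok_which  <- toy[which(idx), ]                            # which() keeps only TRUE positions
ok_isna   <- toy[!is.na(toy$year) & toy$year > 2006, ]    # explicit NA guard
ok_subset <- subset(toy, year > 2006)                     # subset() treats NA as FALSE

print(bad)
print(ok_which)
```

All three alternatives should select the same two rows here; `which()` is handy when the condition is already stored in a variable, while `subset()` reads most naturally at the console.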
Eric Archer wrote on 20 Mar 2006 19:46:44 MET:
> I'm getting some unexpected behavior with subsetting a data
> frame (aircraft flight data) that I can't sort out.

[snip]

> I then try to extract the error years :

flight <- flight[complete.cases(flight),]   # <- delete rows with NAs

> errors <- flight[flight$year > 2006,]
> errors

     FlightID TailNo FlightDate HobbsTime FlightCost                Date year
2587     7083  9206N   4/8/2009       1.5      103.5 2009-04-08 08:00:00 2009
2588     7084  9206N  4/10/2009       1.3       89.7 2009-04-10 08:00:00 2009
2589     7085  9206N  4/11/2009       1.9      131.1 2009-04-11 08:00:00 2009
2590     7086  9206N  4/12/2009       1.3       89.7 2009-04-12 08:00:00 2009
2591     7087  9206N  4/15/2009       1.1       75.9 2009-04-15 08:00:00 2009

HTH
Patrick

--
Money is better than poverty, if only for financial reasons. [Woody Allen]
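
One caveat with the complete.cases() approach: it drops every row that has an NA in any column, not only in year. In the sample data the missing values all sit in the same rows, so it makes no difference there. The sketch below uses a hypothetical data frame `toy` (with an unrelated NA in a `cost` column) to show how the two filters can diverge.

```r
# Hypothetical toy data: row 1 lacks a year, row 2 lacks only a cost
toy <- data.frame(
  id   = 1:5,
  cost = c(10, NA, 30, 40, 50),
  year = c(NA, 2009, 2009, 2006, 2006)
)

# Filtering on all columns also discards row 2, although its year is known
all_cols <- toy[complete.cases(toy), ]
errs_all <- all_cols[all_cols$year > 2006, ]

# Filtering on the year column alone keeps row 2
year_only <- toy[!is.na(toy$year), ]
errs_year <- year_only[year_only$year > 2006, ]

print(errs_all)
print(errs_year)
```

If only the year matters for the selection, restricting the NA check to that column avoids silently losing otherwise usable rows.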