I have a workaround for this, but can someone explain why the first example does not work properly? I believe it worked in the previous version of R, selecting just the rows where F.WW is 200525 and omitting the NAs. I just upgraded to R 2.13. I am also concerned that the row numbers differ between the selections; should I be worried? FYI, I selected only the first few rows of each result for demonstration, so please do not worry that the numbers of rows shown are not equal.

- Sarah

With na.omit() around the column, the result shows values in the F.WW column other than 200525, along with NAs. I was hoping that this would omit all the NAs and show all the rows where P$F.WW == 200525. I believe it did with the previous version of R.

> P[na.omit(P$F.WW)==200525, c(51, 52)]
      F.WW   R.WW
45  200525     NA
53      NA     NA
61  200534 200534
63  200608 200608
66  200522 200541
80      NA     NA
150 200521 200516
231 200530 200530

Without na.omit(), the F.WW == 200525 selection seems to work, but many NA rows are included. This is what I expected. The row numbers are not the same as in the example above, except for the first row.

> P[P$F.WW==200525, c(51, 52)]
       F.WW   R.WW
45   200525     NA
NA       NA     NA
NA.1     NA     NA
NA.2     NA     NA
NA.3     NA     NA
57   200525 200526
65   200525     NA
67   200525     NA
70   200525 200525
NA.4     NA     NA
NA.5     NA     NA
86   200525     NA

na.omit() around the whole selection excludes the NAs. This is what I want. My concern is why the row numbers do not match any of those shown in the examples above.

> na.omit(P[P$F.WW==200525, c(51, 52)])
      F.WW   R.WW
57  200525 200526
70  200525 200525
161 200525 200525
245 200525 200525
246 200525 200525
247 200525 200526
256 200525 200525
266 200525 200525
269 200525 200525
271 200525 200526
276 200525 200526
278 200525 200526
Hi Sarah,

I'm not sure that I understand your problem. You have shown us three ways to try to omit missing values, and one of them seems to work. But you're concerned because some aspect of it doesn't match the ones that don't work? But they don't work!

I wonder if you could send an example in commented, minimal, self-contained, reproducible code ...

Cheers

Andrew

On Tue, May 03, 2011 at 12:18:03PM -0700, Kalicin, Sarah wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/
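Since P itself is not available, here is a minimal, self-contained sketch of the kind of reproducible example asked for in this thread. The data frame and all of its values are made up (hypothetical); only the column names F.WW and R.WW come from the post, and the two columns stand in for P[, c(51, 52)]. It reproduces the three behaviours described in the original message.

```r
# Hypothetical stand-in for Sarah's data frame P (made-up values);
# the two columns play the role of P[, c(51, 52)].
P <- data.frame(
  F.WW = c(200525, NA, 200534, 200525, NA, 200525, 200522),
  R.WW = c(NA, NA, 200534, 200526, 200541, 200525, 200541)
)
cols <- c("F.WW", "R.WW")

# 1. na.omit() inside the index: dropping the two NAs leaves a
#    logical vector of length 5 for 7 rows. `[` recycles it without
#    a warning, so the TRUE/FALSE values no longer line up with rows.
idx1 <- na.omit(P$F.WW) == 200525
sel1 <- P[idx1, cols]

# 2. Plain comparison: NA == 200525 is NA, and an NA row index gives
#    an all-NA row, named "NA", "NA.1", ... to keep row names unique.
sel2 <- P[P$F.WW == 200525, cols]

# 3. na.omit() around the result: drops rows with an NA in *any*
#    column, including the genuine match in row 1 (R.WW missing).
sel3 <- na.omit(P[P$F.WW == 200525, cols])

print(length(idx1))  # 5, although P has 7 rows
print(sel1)          # rows 1, 3, 4, 6: row 3 (200534) is not a match
print(sel2)          # rows 1, NA, 4, NA.1, 6
print(sel3)          # rows 4, 6
```

Running it shows the same pattern as the post: a non-matching F.WW value in the first selection, NA-named rows in the second, and a dropped match in the third.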
Kalicin, Sarah wrote:
> I have a workaround for this, but can someone explain why the first example does not work properly? I believe it worked in the previous version of R, selecting just the rows where F.WW is 200525 and omitting the NAs.

You can prove this statement by providing reproducible code that we can test.

Peter Ehlers
On May 3, 2011, at 21:18, Kalicin, Sarah wrote:

> I have a workaround for this, but can someone explain why the first example does not work properly? I believe it worked in the previous version of R, selecting just the rows where F.WW is 200525 and omitting the NAs. I just upgraded to R 2.13. I am also concerned that the row numbers differ between the selections; should I be worried? FYI, I selected only the first few rows of each result for demonstration, so please do not worry that the numbers of rows shown are not equal.
>
> - Sarah
>
> With na.omit() around the column, the result shows values in the F.WW column other than 200525, along with NAs. I was hoping that this would omit all the NAs and show all the rows where P$F.WW == 200525. I believe it did with the previous version of R.

That's highly unlikely. na.omit(P$F.WW) has fewer elements than there are rows in P, so the logical index gets recycled, in the style of this example with the thuesen data from the ISwR package:

> thuesen[c(F,F,F,F,T),]
   blood.glucose short.velocity
5            7.2           1.27
10          12.2           1.22
15           6.7           1.52
20          16.1           1.05

(now why don't we get the usual warning about "not a multiple of" in this case?)

Worse, if you omit observations prior to comparison, the result won't line up. E.g. in the thuesen data, obs. 16 is missing, so every later element of the na.omit()-ed vector is shifted up by one position:

> thuesen[na.omit(thuesen$short.velocity)==1.12,]
   blood.glucose short.velocity
16           8.6             NA
22           4.9           1.03

whereas in fact

> subset(thuesen, short.velocity==1.12)
   blood.glucose short.velocity
17           4.2           1.12
23           8.8           1.12

> > P[na.omit(P$F.WW)==200525, c(51, 52)]
>       F.WW   R.WW
> 45  200525     NA
> 53      NA     NA
> 61  200534 200534
> 63  200608 200608
> 66  200522 200541
> 80      NA     NA
> 150 200521 200516
> 231 200530 200530
>
> Without na.omit(), the F.WW == 200525 selection seems to work, but many NA rows are included. This is what I expected. The row numbers are not the same as in the example above, except for the first row.
> > P[P$F.WW==200525, c(51, 52)]
>        F.WW   R.WW
> 45   200525     NA
> NA       NA     NA
> NA.1     NA     NA
> NA.2     NA     NA
> NA.3     NA     NA
> 57   200525 200526
> 65   200525     NA
> 67   200525     NA
> 70   200525 200525
> NA.4     NA     NA
> NA.5     NA     NA
> 86   200525     NA

Presumably, a number of rows got omitted here? The NAs are a bit of a pain, but that's the way things work: if there is an observation that you don't know whether to include, you get an NA-filled row.

> thuesen[thuesen$short.velocity==1.12,]
   blood.glucose short.velocity
NA            NA             NA
17           4.2           1.12
23           8.8           1.12

To avoid this, you explicitly test for NA using is.na(), or use subset(), which does it internally.

> na.omit() around the whole selection excludes the NAs. This is what I want. My concern is why the row numbers do not match any of those shown in the examples above.
>
> > na.omit(P[P$F.WW==200525, c(51, 52)])
>       F.WW   R.WW
> 57  200525 200526
> 70  200525 200525
> 161 200525 200525
> 245 200525 200525
> 246 200525 200525
> 247 200525 200526
> 256 200525 200525
> 266 200525 200525
> 269 200525 200525
> 271 200525 200526
> 276 200525 200526
> 278 200525 200526

Well, now you remove rows with NA _anywhere_, so e.g. row #65 is out because R.WW is missing. I expect #161 and higher were just chopped from the earlier list.

In short, nothing out of the ordinary seems to be going on here.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
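The fixes suggested in the last reply can be sketched on a small made-up stand-in for P (hypothetical values; only the column names F.WW and R.WW come from the post). Options (a) and (c) are the is.na() and subset() approaches from the message; (b), which(), is an additional standard base-R alternative that the thread does not mention.

```r
# Hypothetical stand-in for P (made-up values).
P <- data.frame(
  F.WW = c(200525, NA, 200534, 200525, NA, 200525, 200522),
  R.WW = c(NA, NA, 200534, 200526, 200541, 200525, 200541)
)

# (a) Test for NA explicitly, so the row index contains no NA.
by_isna <- P[!is.na(P$F.WW) & P$F.WW == 200525, ]

# (b) which() returns only the positions where the test is TRUE.
by_which <- P[which(P$F.WW == 200525), ]

# (c) subset() treats an NA condition as FALSE internally.
by_subset <- subset(P, F.WW == 200525)

# All three return rows 1, 4 and 6 with their original row names;
# row 1 is kept even though its R.WW is missing.
print(by_subset)

# na.omit() on the result would also drop row 1, because it removes
# rows with an NA in any column, not just in F.WW.
print(na.omit(by_subset))  # rows 4, 6
```

In each case the row index has one entry per row of P (or is a plain set of positions) and contains no NA, so the original row names come through unchanged and no NA-filled rows appear.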