thr3ads.net - R help - [R] na.omit - Is it working properly? [May 2011]

Kalicin, Sarah

2011-May-03 19:18 UTC

[R] na.omit - Is it working properly?

I have a workaround for this, but can someone explain why the first example
does not work properly? I believe it worked in the previous version of R,
selecting just the rows where F.WW is 200525 and omitting the NAs. I just
upgraded to 2.13. I am also concerned that the row numbers differ between the
selections; should I be worried? FYI, I show only the first few rows of each
result for demonstration, so please do not worry that the numbers of rows shown
are not equal. - Sarah

With na.omit around the column, the result shows values in the F.WW column
other than 200525, along with NA. I was hoping that this would omit all the
NAs and show all the rows where P$F.WW == 200525. I believe it did in the
previous version of R.
> P[na.omit(P$F.WW)==200525, c(51, 52)]
          F.WW        R.WW
45      200525          NA
53          NA          NA
61      200534      200534
63      200608      200608
66      200522      200541
80          NA          NA
150     200521      200516
231     200530      200530

Without na.omit, the test F.WW == 200525 seems to work, but lots of NA rows are
included. This is what is expected! The row numbers are not the same as in the
example above, except for the first row.
> P[P$F.WW==200525, c(51, 52)]
            F.WW        R.WW
45        200525          NA
NA            NA          NA
NA.1          NA          NA
NA.2          NA          NA
NA.3          NA          NA
57        200525      200526
65        200525          NA
67        200525          NA
70        200525      200525
NA.4          NA          NA
NA.5          NA          NA
86        200525          NA

na.omit excludes the NAs. This is what I want. My concern is why the row
numbers do not match any of those shown in the examples above.
> na.omit(P[P$F.WW==200525, c(51, 52)])
        F.WW        R.WW
57    200525      200526
70    200525      200525
161   200525      200525
245   200525      200525
246   200525      200525
247   200525      200526
256   200525      200525
266   200525      200525
269   200525      200525
271   200525      200526
276   200525      200526
278   200525      200526

Andrew Robinson

2011-May-03 23:30 UTC

Hi Sarah,

I'm not sure that I understand your problem.  You have shown us three
ways to try to omit missing values, and one of them seems to work.
But you're concerned because some aspect of it doesn't match the ones
that don't work?  But they don't work!  

I wonder if you could send an example in commented, minimal,
self-contained, reproducible code ...

Cheers

Andrew

-- 
Andrew Robinson  
Program Manager, ACERA 
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011) 
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009): 
http://www.ms.unimelb.edu.au/spuRs/

P Ehlers

2011-May-04 02:06 UTC

Kalicin, Sarah wrote:
> I have a work around for this, but can someone explain
> why the first example does not work properly?
> I believed it worked in the previous version of R,
> by selecting just the rows=200525 and omitting the na's.

You can prove this statement by providing reproducible
code that we can test.

Peter Ehlers


peter dalgaard

2011-May-04 06:02 UTC

On May 3, 2011, at 21:18 , Kalicin, Sarah wrote:
>
> I have a work around for this, but can someone explain why the first
> example does not work properly? I believed it worked in the previous version of
> R, by selecting just the rows=200525 and omitting the na's. I just upgraded
> to 2.13. I am also concern with the row numbers being different in the
> selections, should I be worried? FYI, I just selected the first few rows for
> demonstration, please do not worry that the number of rows shown are not equal.
> - Sarah
>
> With na.omit around the column, but it is showing other values in the F.WW
> column other than 200525, along with NA.  I was hoping that this would omit all
> the NA's, and show all the rows that P$F.WW=200525. I believe it did with
> the previous version of R.
That's highly unlikely. na.omit(P$F.WW) has fewer elements than there are rows
in P, so you get vector recycling in the style of
> thuesen[c(F,F,F,F,T),]
   blood.glucose short.velocity
5            7.2           1.27
10          12.2           1.22
15           6.7           1.52
20          16.1           1.05

(now why don't we get the usual warning about "not a multiple of"
in this case?)
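
A minimal sketch of that recycling, assuming a made-up data frame (the names d, id and x are hypothetical and not part of the original data). It also bears on the question in parentheses: subsetting appears to recycle a short logical index silently even when the lengths are not multiples, whereas the "not a multiple of" warning is raised by arithmetic and comparison operators.

```r
# Hypothetical toy data; this is not the poster's data frame P.
d <- data.frame(id = 1:10, x = c(5, NA, 7, 5, NA, 5, 8, 5, 9, 5))

# A logical index shorter than nrow(d) is recycled along the rows.
short <- c(FALSE, FALSE, FALSE, FALSE, TRUE)   # length 5, nrow(d) is 10
recycled_ids <- d[short, ]$id                  # rows 5 and 10 are selected

# Length 3 does not divide 10; the index is still recycled
# (rows 3, 6 and 9), with the last cycle simply cut short.
partial_ids <- d[c(FALSE, FALSE, TRUE), ]$id
```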

Worse, if you omit observations prior to comparison, the result won't line
up with the rows of the data frame. E.g., in the thuesen data:
> thuesen[na.omit(thuesen$short.velocity)==1.12,]
   blood.glucose short.velocity
16           8.6             NA
22           4.9           1.03

whereas in fact
> subset(thuesen, short.velocity==1.12)
   blood.glucose short.velocity
17           4.2           1.12
23           8.8           1.12
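
The same misalignment can be reproduced without the thuesen data. The sketch below assumes a small invented data frame (d, id and x are hypothetical names): dropping the NA before the comparison shifts every later position by one, so the logical index points at the wrong rows.

```r
# Hypothetical toy data with one missing value in position 2.
d <- data.frame(id = 1:6, x = c(3, NA, 5, 4, 5, 2))

kept <- na.omit(d$x)    # 3 5 4 5 2: one element shorter than nrow(d)
idx  <- kept == 5       # TRUE at positions 2 and 4 of the shortened vector

wrong <- d[idx, ]             # picks rows 2 and 4, where x is NA and 4
right <- subset(d, x == 5)    # picks rows 3 and 5, where x really is 5
```

Comparing wrong$x with right$x shows that the first form returns rows whose x is not 5 at all, which matches the symptom in the original post.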


> P[na.omit(P$F.WW)==200525, c(51, 52)]
>          F.WW        R.WW
> 45      200525          NA
> 53          NA          NA
> 61      200534      200534
> 63      200608      200608
> 66      200522      200541
> 80          NA          NA
> 150     200521      200516
> 231     200530      200530
> 
> No na.omit, the F.WW=200525 seems to work, but lots of NA included. This is
> what is expected!! The row numbers are not the same as the above example, except
> the first row.
>> P[P$F.WW==200525, c(51, 52)]
>            F.WW     R.WW
> 45        200525          NA
> NA            NA          NA
> NA.1          NA          NA
> NA.2          NA          NA
> NA.3          NA          NA
> 57        200525      200526
> 65        200525          NA
> 67        200525          NA
> 70        200525      200525
> NA.4          NA          NA
> NA.5          NA          NA
> 86        200525          NA
Presumably, a number of rows got omitted here? The NAs are a bit of a pain,
but that's the way things work: if the comparison yields NA, so that R cannot
tell whether an observation should be included, you get an NA-filled row.
> thuesen[thuesen$short.velocity==1.12,]
   blood.glucose short.velocity
NA            NA             NA
17           4.2           1.12
23           8.8           1.12

To avoid this, you explicitly test for NA using is.na() or use subset() which
does it internally.
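
A short sketch of those options, assuming an invented data frame (d, id and x are hypothetical names). Each variant keeps the comparison at full length, so it stays aligned with the rows, and only then discards the NA cases.

```r
# Hypothetical toy data with missing values in x.
d <- data.frame(id = 1:5, x = c(5, NA, 7, 5, NA))

# A plain comparison gives NA where x is NA, which yields NA-filled rows.
with_na <- d[d$x == 5, ]

# Explicit test for NA with is.na():
by_isna <- d[!is.na(d$x) & d$x == 5, ]

# which() drops NA (and FALSE) and returns the matching row positions:
by_which <- d[which(d$x == 5), ]

# subset() treats NA in the condition as FALSE:
by_subset <- subset(d, x == 5)
```

All three alternatives return rows 1 and 4 here, while with_na has four rows, two of them filled with NA.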
> 
> Na.omit excludes the na's. This is what I want. The concern I have is
> why the row numbers do not match any of those shown in the examples above.
>> na.omit(P[P$F.WW==200525, c(51, 52)])
>        F.WW        R.WW
> 57    200525      200526
> 70    200525      200525
> 161   200525      200525
> 245   200525      200525
> 246   200525      200525
> 247   200525      200526
> 256   200525      200525
> 266   200525      200525
> 269   200525      200525
> 271   200525      200526
> 276   200525      200526
> 278   200525      200526
> 
Well, now you remove rows with NA _anywhere_, so e.g. row #65 is out because
R.WW is missing. I expect #161 and higher were just chopped from the earlier
list.

In short, nothing out of the ordinary seems to be going on here.
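
To illustrate the last point, here is a sketch on an invented two-column data frame (d, id, f and r are hypothetical stand-ins, with f and r playing the roles of F.WW and R.WW). na.omit() on a data frame drops every row that has an NA in any column, and the surviving rows keep their original row labels, which is why the printed row numbers are not consecutive.

```r
# Hypothetical toy data; f and r stand in for F.WW and R.WW.
d <- data.frame(id = 1:5,
                f  = c(5, 5, NA, 5, 7),
                r  = c(6, NA, 5, 5, 7))

hits <- subset(d, f == 5)        # rows 1, 2 and 4 match on f
complete_hits <- na.omit(hits)   # row 2 is dropped because r is NA there

kept_labels <- rownames(complete_hits)   # original row labels survive
```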


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
