thr3ads.net - R help - [R] subset and na.rm not really suppressing <NA> values [Jan 2014]

Jeff Johnson

2014-Jan-22 23:58 UTC

[R] subset and na.rm not really suppressing <NA> values

I have a dataset "mydf" with a field EMAIL_ADDRESS. When importing, I
specified:
mydf <- read.csv(file = extract, header = TRUE, stringsAsFactors = FALSE,
na.strings=c("NA",""))

I've also tried setting na.strings = c("NA", "", "<NA>"), but I don't know
if it's appropriate to put <NA> there.

I'm running
a <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE, select = EMAIL_ADDRESS)
dput(head(a,5))

structure(list(EMAIL_ADDRESS = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_)), .Names = "EMAIL_ADDRESS",
row.names = c(17L, 22L, 23L, 24L, 30L), class = "data.frame")

The results show a lot of <NA> values on screen and in the dput statement.

I don't quite understand why it is doing that. I would have expected it to
exclude those since I had the na.rm = TRUE statement. Do you have any
suggestions?

Thanks!
-- 
Jeff

Jeff Newmiller

2014-Jan-23 02:59 UTC

[R] subset and na.rm not really suppressing <NA> values

I don't think na.rm is a valid parameter for the subset function. I would
normally use the is.na function to test for NA values. I also don't know
where your VALID_EMAIL variable is coming from.

a <- subset(mydf, !is.na(EMAIL_ADDRESS))

The na.strings argument to read.csv and friends tells the import which
strings in the input should be treated as NA. If the literal text "<NA>"
does not appear in your input file, adding it to na.strings has no effect
on the data import.
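
To illustrate both points, here is a minimal sketch. It assumes a tiny
made-up CSV supplied through the text= argument rather than the poster's
real file; only the column names EMAIL_ADDRESS and VALID_EMAIL come from
the thread, and the values are invented.

```r
# Hypothetical miniature of the poster's file (values invented for illustration).
csv_text <- "EMAIL_ADDRESS,VALID_EMAIL
a@example.com,TRUE
,FALSE
NA,FALSE
b@example.com,FALSE"

mydf <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE,
                 na.strings = c("NA", ""))

# Empty fields and the literal string NA are both imported as NA.
is.na(mydf$EMAIL_ADDRESS)

# Keep only the rows that actually have an address.
a <- subset(mydf, !is.na(EMAIL_ADDRESS))
nrow(a)
```

The <NA> shown on screen is only how R prints a missing character value;
it is not text stored in the data, which is why listing "<NA>" in
na.strings changes nothing unless the file itself contains that text.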

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

peter dalgaard

2014-Jan-24 10:16 UTC

[R] subset and na.rm not really suppressing <NA> values

subset.data.frame() does not have an na.rm argument!
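
A short sketch of why the call still runs without complaint, assuming a
small invented data frame that reuses the column names from the original
post: the method's signature ends in `...`, so an unknown na.rm = TRUE is
accepted and then ignored. The row filter only looks at VALID_EMAIL, so a
missing EMAIL_ADDRESS has to be tested explicitly.

```r
# Invented data shaped like the poster's; only the column names come from the thread.
mydf <- data.frame(
  EMAIL_ADDRESS = c("a@example.com", NA, NA, "b@example.com"),
  VALID_EMAIL   = c(TRUE, FALSE, NA, FALSE),
  stringsAsFactors = FALSE
)

# na.rm is not among the formal arguments of subset.data.frame().
"na.rm" %in% names(formals(subset.data.frame))

# The unknown argument is absorbed by `...`, so both calls select the same rows.
with_narm    <- subset(mydf, VALID_EMAIL == FALSE, na.rm = TRUE,
                       select = EMAIL_ADDRESS)
without_narm <- subset(mydf, VALID_EMAIL == FALSE, select = EMAIL_ADDRESS)
identical(with_narm, without_narm)

# Adding an explicit is.na() test drops the rows with a missing address.
a <- subset(mydf, VALID_EMAIL == FALSE & !is.na(EMAIL_ADDRESS),
            select = EMAIL_ADDRESS)
a
```

Note that subset() already treats an NA in the condition itself as FALSE,
so the row where VALID_EMAIL is NA is dropped in every call above.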

-pd 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
