thr3ads.net - R help - [R] Consistant test for NAs in a factor when exclude = NULL? [Oct 2011]

andrewH

2011-Oct-27 00:32 UTC

[R] Consistant test for NAs in a factor when exclude = NULL?

Dear folks,

Is there a function to correctly find (and count) the NAs in a factor when
exclude=NULL, regardless of whether their origin is in the original data or
by subsequent assignment?

In example number 1 below, where NAs are assigned by is.na()<-, testing the
factor with is.na() finds the correct number of NAs.  In example number 2,
where the NAs are from the data, neither is.na(), ==NA, nor =="NA" correctly
identifies the NAs.  In example number 3, which mixes NAs from assignment
with NAs from data, is.na() does not even find the NAs created by assignment,
as it did in example 1.

I'm running R 2.13.2 on Windows XP with Service Pack 3.

Any assistance would be greatly appreciated.

Appreciatively, andrewH


Example #1
> # Origin: is.na()<-  Exclude: NULL
> KK <- factor(c("A","A","B","B","C","C"), exclude=NULL)
> KK[KK=="C"]
[1] C C
Levels: A B C
> is.na(KK[KK=="C"]) <- TRUE
> KK
[1] A    A    B    B    <NA> <NA>
Levels: A B C
> levels(KK)
[1] "A" "B" "C"
> levels(KK)[KK]
[1] "A" "A" "B" "B" NA  NA
> KK==NA
[1] NA NA NA NA NA NA
> sum(KK==NA)
[1] NA
> KK=="NA"
[1] FALSE FALSE FALSE FALSE    NA    NA
> sum(KK=="NA")
[1] NA
> is.na(KK)
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE
> sum(is.na(KK))
[1] 2

Example #2
> # Origin: data Exclude: NULL
> GG <- factor(c("A","A","B","B",NA,NA), exclude=NULL)
> GG
[1] A    A    B    B    <NA> <NA>
Levels: A B <NA>
> levels(GG)
[1] "A" "B" NA
> levels(GG)[GG]
[1] "A" "A" "B" "B" NA  NA
> GG==NA
[1] NA NA NA NA NA NA
> sum(GG==NA)
[1] NA
> GG=="NA"
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> sum(GG=="NA")
[1] 0
> is.na(GG)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> sum(is.na(GG))
[1] 0

Example #3
> MM <- factor(c("A","A","B","B","C","C",NA), exclude=NULL)
> is.na(MM[MM=="C"]) <- TRUE
> MM
[1] A    A    B    B    <NA> <NA> <NA>
Levels: A B C <NA>
> levels(MM)
[1] "A" "B" "C" NA
> levels(MM)[MM]
[1] "A" "A" "B" "B" NA  NA  NA
> MM==NA
[1] NA NA NA NA NA NA NA
> sum(MM==NA)
[1] NA
> MM=="NA"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> sum(MM=="NA")
[1] 0
> is.na(MM)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> sum(is.na(MM))
[1] 0

--
View this message in context:
http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3942755.html
Sent from the R help mailing list archive at Nabble.com.

Jeff Newmiller

2011-Oct-27 02:39 UTC

There is a difference between the levels of a factor and the values stored in
the vector. If you make NA one of the levels, then R will use an integer code to
represent that level in the data, just like any other level. At that point it
seems to me that you can do meta-analysis on the existence of NA in the original
data, but the data in your working vector no longer really contains NA.
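To make the distinction between levels and stored values concrete, here is a minimal sketch on a toy factor. The data and the helper name is_missing are illustrative assumptions, not something from the thread. It assumes the goal is to treat both kinds of NA as missing, and does so by converting back to character before testing.

```r
# Toy factor (hypothetical data): the third element is an NA that came with
# the data, so exclude = NULL stores it as a level with its own integer code.
f <- factor(c("A", "B", NA, "B"), exclude = NULL)

# Mark the second element as missing after the fact; this idiom appears in
# the examples of ?factor and sets the stored code itself to NA.
is.na(f)[2] <- TRUE

levels(f)      # "A" "B" NA : NA is a level here
as.integer(f)  # 1 NA 3 2   : only the assigned NA is missing in the codes
is.na(f)       # FALSE TRUE FALSE FALSE : is.na() looks at the codes

# Hypothetical helper: map codes back to labels first, so that an NA code
# and a code pointing at the NA level both come out as NA.
is_missing <- function(x) is.na(as.character(x))

is_missing(f)       # FALSE TRUE TRUE FALSE
sum(is_missing(f))  # 2
```

The same test can be written as is.na(levels(f)[f]), which is the expression already used in the examples in the original post.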

For my data analysis needs, I would stay away from exclude=NULL entirely, but
someone else might offer a good justification for using it. I would imagine that
keeping the actual data separate from your meta-analysis data (built with
exclude=NULL) would be advisable in such a case, and that would have the side
benefit of eliminating the concerns you have raised.

If your goal is to eliminate unused levels, I usually convert to character and
then back to a factor to accomplish that, which works fine with NAs.
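A minimal sketch of that character round trip, on made-up data (the variable names f, g and h are illustrative only). It assumes the default exclude = NA, so NA stays a missing value rather than becoming a level.

```r
# Hypothetical data; with the default exclude = NA, NA is not a level.
f <- factor(c("A", "B", "C", NA))

# Subsetting drops the "C" observation but keeps "C" as an (unused) level.
g <- f[c(1, 2, 4)]
levels(g)  # "A" "B" "C"

# Round trip through character: levels are rebuilt from the values present.
h <- factor(as.character(g))
levels(h)  # "A" "B"
is.na(h)   # FALSE FALSE TRUE : the NA is still an ordinary missing value

# Base R's droplevels() gives the same levels here.
levels(droplevels(g))  # "A" "B"
```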
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.


andrewH

2011-Oct-27 04:21 UTC

Thanks Jeff! I appreciate you sharing your experience.

My data set is survey data, 13,209 records over nine years, collected by
someone else and converted from SPSS format. It includes missing values,
identified however SPSS does so, and translated to NAs by the import
process. It also includes values along the lines of "none of your business"
or "beats me" that are missing so far as I am concerned. I have assigned NAs
to these values.  Now I am trying to figure out some things about where
these missing values are -- whether they are disproportionately located in
any period or group.  I have been trying to get counts for subsets, but I
have not been able to make the subset counts add up to the total counts that
I get from, e.g., summary().
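One way such subset counts could be made to add up is sketched below on made-up data. The variables year and answer and the "refused" label are hypothetical stand-ins for the survey columns. It assumes that NAs stored as a level and NAs assigned later should both count as missing, so it tests the character form of the factor rather than the factor itself.

```r
# Hypothetical stand-in for the survey: two NAs arrive with the data and
# become a level because of exclude = NULL.
year   <- c(2001, 2001, 2002, 2002, 2002, 2003)
answer <- factor(c("yes", NA, "no", "refused", NA, "yes"), exclude = NULL)

# Recode "refused" as missing after the fact (sets the stored code to NA).
refused <- answer == "refused"
is.na(answer)[refused] <- TRUE

# is.na() on the factor sees only the assigned NA ...
sum(is.na(answer))  # 1

# ... while testing the labels sees all three missing answers.
missing <- is.na(as.character(answer))
sum(missing)        # 3

# Counts per year; these add up to the overall total.
by_year <- tapply(missing, year, sum)
by_year             # 2001: 1, 2002: 2, 2003: 0
```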

So I wrote these simplified versions, and even for the simplest examples, I
could not find a function that correctly identified the NAs that I knew were
there because I put them there myself. That is why I am looking for help.
Does this make sense?

Warmest regards, andrewH


