andrewH
2011-Oct-27 00:32 UTC
[R] Consistant test for NAs in a factor when exclude = NULL?
Dear folks,

Is there a function to correctly find (and count) the NAs in a factor when exclude=NULL, regardless of whether their origin is in the original data or by subsequent assignment?

In example number 1 below, where NAs are assigned by is.na()<-, testing the factor with is.na() finds the correct number of NAs. In example number 2, where the NAs are from the data, neither is.na(), ==NA, nor =="NA" correctly identifies the NAs. In example number 3, which mixes NAs from assignment with NAs from data, is.na() does not even find the NAs created by assignment, as it did in example 1.

I'm running R 2.13.2 on Windows XP with Service Pack 3.

Any assistance would be greatly appreciated.

Appreciatively, andrewH

Example #1
> # Origin: is.na()<-   Exclude: NULL
> KK <- factor(c("A","A","B","B","C","C"), exclude=NULL)
> KK[KK=="C"]
[1] C C
Levels: A B C
> is.na(KK[KK=="C"]) <- TRUE
> KK
[1] A A B B <NA> <NA>
Levels: A B C
> levels(KK)
[1] "A" "B" "C"
> levels(KK)[KK]
[1] "A" "A" "B" "B" NA NA
> KK==NA
[1] NA NA NA NA NA NA
> sum(KK==NA)
[1] NA
> KK=="NA"
[1] FALSE FALSE FALSE FALSE NA NA
> sum(KK=="NA")
[1] NA
> is.na(KK)
[1] FALSE FALSE FALSE FALSE TRUE TRUE
> sum(is.na(KK))
[1] 2

Example #2
> # Origin: data   Exclude: NULL
> GG <- factor(c("A","A","B","B", NA, NA), exclude=NULL)
> GG
[1] A A B B <NA> <NA>
Levels: A B <NA>
> levels(GG)
[1] "A" "B" NA
> levels(GG)[GG]
[1] "A" "A" "B" "B" NA NA
> GG==NA
[1] NA NA NA NA NA NA
> sum(GG==NA)
[1] NA
> GG=="NA"
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> sum(GG=="NA")
[1] 0
> is.na(GG)
[1] FALSE FALSE FALSE FALSE FALSE FALSE
> sum(is.na(GG))

Example #3
> MM <- factor(c("A","A","B","B","C","C", NA), exclude=NULL)
> is.na(MM[MM=="C"]) <- TRUE
> MM
[1] A A B B <NA> <NA> <NA>
Levels: A B C <NA>
> levels(MM)
[1] "A" "B" "C" NA
> levels(MM)[MM]
[1] "A" "A" "B" "B" NA NA NA
> MM==NA
[1] NA NA NA NA NA NA NA
> sum(MM==NA)
[1] NA
> MM=="NA"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> sum(MM=="NA")
[1] 0
> is.na(MM)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> sum(is.na(MM))
[1] 0

--
View this message in context: http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3942755.html
Sent from the R help mailing list archive at Nabble.com.
Jeff Newmiller
2011-Oct-27 02:39 UTC
[R] Consistant test for NAs in a factor when exclude = NULL?
There is a difference between the levels of a factor and the values in the vector. If you make NA one of the levels, then the factor will use an integer to represent that level in the data, just like any other level. At that point it seems to me that you can do meta-analysis on the existence of NA in the original data, but the data in your working vector no longer really contains NA.

For my data analysis needs, I would stay away from exclude=NULL entirely, but someone else might offer a good justification for using it. I would imagine that avoiding mixing the actual data and your meta-analysis data (with exclude=NULL) would be advisable in such a case, and that would have the side benefit of eliminating the concerns you have raised.

If your goal is to eliminate unused levels, I usually convert to character and then back to a factor to accomplish that, which works fine with NAs.

---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil@dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
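The distinction between levels and values described in this reply can be made visible by inspecting the integer codes behind each factor. The following is a minimal sketch, assuming current factor semantics (it has not been checked against R 2.13.2); the name GG2 is introduced here purely for illustration. It also shows the character round trip mentioned above, which turns an NA level back into ordinary missing values.

```r
# NA assigned after creation: stored as a missing integer code,
# while the levels stay "A", "B", "C".
KK <- factor(c("A", "A", "B", "B", "C", "C"), exclude = NULL)
is.na(KK[KK == "C"]) <- TRUE
print(as.integer(KK))
print(levels(KK))

# NA present in the data with exclude = NULL: NA becomes an ordinary
# level, so every element has a non-missing integer code.
GG <- factor(c("A", "A", "B", "B", NA, NA), exclude = NULL)
print(as.integer(GG))
print(levels(GG))

# Round trip through character: the default exclude = NA drops the
# NA level, so is.na() sees the missing values again.
GG2 <- factor(as.character(GG))
print(levels(GG2))
print(sum(is.na(GG2)))
```

Because is.na() on a factor looks at the codes rather than the levels, it reports missing values for KK and GG2 but not for GG.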
andrewH
2011-Oct-27 04:21 UTC
[R] Consistant test for NAs in a factor when exclude = NULL?
Thanks Jeff! I appreciate you sharing your experience.

My data set is survey data, 13,209 records over nine years, collected by someone else, converted from SPSS format. It includes missing values, identified however SPSS does so, and translated to NAs by the import process. It also includes values along the lines of "none of your business" or "beats me" that are missing so far as I am concerned. I have assigned NAs to these values.

Now I am trying to figure out some things about where these missing values are: whether they are disproportionately located in any period or group. I have been trying to get counts for subsets, but I have not been able to make the subset counts add up to the total counts that I get from, e.g., summary. So I wrote these simplified versions, and even for the simplest examples, I could not find a function that correctly identified the NAs that I knew were there because I put them there myself. That is why I am looking for help. Does this make sense?

Warmest regards, andrewH

--
View this message in context: http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3943157.html
Sent from the R help mailing list archive at Nabble.com.
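One way to get subset counts that add up to the total is to test the character representation of the factor, since the thread's own output shows levels(x)[x] returning NA for both kinds of missing value. This is only a sketch under that assumption, not a method proposed by anyone in the thread: the helper name is_missing and the year vector are made up for illustration.

```r
# Hypothetical helper: TRUE wherever the value is missing, whether it is
# stored as a missing integer code or as an NA level.
is_missing <- function(f) is.na(as.character(f))

# Example 3 from the thread: one NA from the data, two assigned afterwards.
MM <- factor(c("A", "A", "B", "B", "C", "C", NA), exclude = NULL)
is.na(MM[MM == "C"]) <- TRUE

print(sum(is.na(MM)))       # the thread reports 0 for this call
print(sum(is_missing(MM)))  # counts the assigned and the original NAs

# Made-up grouping variable standing in for a survey year.
year <- c(2001, 2001, 2001, 2002, 2002, 2003, 2003)

# Missing-value count per group; the group counts sum to the total.
by_year <- tapply(is_missing(MM), year, sum)
print(by_year)
```

The same logical vector can be passed to table() or used to subset a data frame, so totals and subgroup counts come from a single definition of "missing".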