Does anyone know of a way, or of documentation describing a way, to declare multiple values in R as missing without coding them all as NA? I want to be able to treat values as missing while still retaining codes that describe the reason each value is missing.

Thanks

John MacInnes

-- 
Professor John MacInnes
Sociology,
School of Social and Political Studies,
No 8 Buccleuch Place
University of Edinburgh
Edinburgh EH8 9LN
+44 (0)131 651 3867

Centre d'Estudis Demogràfics
Universitat Autònoma de Barcelona
Edifici E-2
08193 Bellaterra (Barcelona)
Spain
+34 93 581 3060

"The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336."

NA, Inf, -Inf and NaN would give you four possibilities, and is.finite() returns FALSE for every one of them:

> x <- c(1, NA, 2, Inf, 3, -Inf, 4, NaN, 5)
> is.finite(x)
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE

You might need to map them all to NA before passing the data to other functions, depending on how those functions deal with these values.

Another possibility is to attach an attribute holding a factor that gives the type of each NA, in order of appearance:

x <- c(1, NA, 2, NA, 3, NA)
attr(x, "type.of.na") <- factor(c("A", "B", "A"))

Depending on how much work you are prepared to do, you could define a new R class that handles objects with such an attribute.

On Sun, Feb 14, 2010 at 9:33 AM, John <john.macinnes at ed.ac.uk> wrote:
> Does anyone know, or know documentation that describes, how to declare
> multiple values in R as missing that does not involve coding them as NA? I
> wish to be able to treat values as missing, while still retaining codes
> that describe the reason for the value being missing.
>
> Thanks
>
> John MacInnes
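
To illustrate the "map them all to NA" step above, here is a minimal sketch. It assumes that Inf, -Inf and NaN are being used purely as stand-ins for extra kinds of missingness; the names `reason` and `x_clean` are made up for this example and are not part of any package.

```r
# Hypothetical data: NA, Inf, -Inf and NaN each mark a different kind of missing value.
x <- c(1, NA, 2, Inf, 3, -Inf, 4, NaN, 5)

# Record why each value is missing before collapsing everything to NA.
# is.na() is TRUE for NaN as well as NA, so the two are separated with is.nan().
reason <- rep(NA_character_, length(x))
reason[is.nan(x)]                <- "NaN"
reason[is.na(x) & !is.nan(x)]    <- "NA"
reason[is.infinite(x) & x > 0]   <- "Inf"
reason[is.infinite(x) & x < 0]   <- "-Inf"

# Working copy in which every special value becomes an ordinary NA.
x_clean <- x
x_clean[!is.finite(x_clean)] <- NA

mean(x_clean, na.rm = TRUE)   # mean of the observed values only
table(reason)                 # how many values are missing for each reason
```

Keeping `reason` as a separate vector of the same length means the codes can be tabulated or stored in a second data frame column, while `x_clean` behaves like any other vector with NAs.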

I can think of a few solutions, none perfect.

* You could keep a master dataset that has the missing value codes you want, and work from a copy of it in which those codes are replaced by real NAs.

* You could add an attribute that gives the types of missing values in the various positions. The downside is that attributes tend to disappear with subsetting.

* If you only have two types, you might be able to get away with using NaN as the second type of NA.

On 14/02/2010 14:33, John wrote:
> Does anyone know, or know documentation that describes, how to declare
> multiple values in R as missing that does not involve coding them as NA? I
> wish to be able to treat values as missing, while still retaining codes
> that describe the reason for the value being missing.
>
> Thanks
>
> John MacInnes

-- 
Patrick Burns
pburns at pburns.seanet.com
http://www.burns-stat.com
(home of 'The R Inferno' and 'A Guide for the Unwilling S User')
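
The caveat about attributes disappearing with subsetting can be seen in a few lines, together with one possible workaround. This is only a sketch: the attribute name `na_reason`, the class name `coded_na` and the reason labels are all invented for illustration, and a real class would need more methods (printing, assignment, combining) than the single `[` method shown.

```r
x <- c(1, NA, 2, NA, 3, NA)
# Assumption: one reason code per element, NA where the value was observed.
attr(x, "na_reason") <- c(NA, "refused", NA, "not asked", NA, "refused")

# Plain subsetting drops the attribute.
y <- x[1:3]
attributes(y)

# A hypothetical S3 class whose "[" method subsets the reasons alongside the data.
class(x) <- "coded_na"
`[.coded_na` <- function(x, i) {
  reason <- attr(x, "na_reason")[i]   # subset the codes first
  out <- unclass(x)[i]                # subset the data (attributes are dropped here)
  attr(out, "na_reason") <- reason    # put the matching codes back
  class(out) <- "coded_na"
  out
}

z <- x[1:3]
attr(z, "na_reason")
```

Whether the extra class is worth the effort depends on how many operations besides subsetting need to carry the codes along.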

John wrote:
> ...
> Does anyone know, or know documentation that describes, how to declare
> multiple values in R as missing that does not involve coding them as NA? I
> wish to be able to treat values as missing, while still retaining codes
> that describe the reason for the value being missing.

I would suggest leaving the "missing values" as they are in your data file and recoding them to NA at the top of each analysis script you run. I find that the only place I usually make use of such information is in the initial descriptives, although you may want to recode selectively for different analyses.

Jim
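
A minimal sketch of the recode-at-the-top-of-the-script approach might look like the following. The data frame, the column name and the codes -8 and -9 are all assumptions made up for the example; substitute whatever codes the data file actually uses.

```r
# Hypothetical survey data as read from file; the missing-value codes are left intact.
master <- data.frame(
  id  = 1:6,
  age = c(34, -9, 51, -8, 27, -9)
)
# Assumption: -8 means "not asked" and -9 means "refused".
missing_codes <- c(-8, -9)

# Initial descriptives can still distinguish the reasons for missingness.
table(master$age[master$age %in% missing_codes])

# Top of the analysis script: recode to NA in a working copy only.
work <- master
work$age[work$age %in% missing_codes] <- NA

mean(work$age, na.rm = TRUE)   # analysis uses the observed values only
```

Because `master` is never modified, a different analysis can start from it again and recode only a subset of the codes, for example treating "not asked" as missing but keeping "refused" as a category of its own.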