On 01-Nov-04 Robert Brown FM CEFAS wrote:
> I have a data set of about 10000 records which was compiled from
> several smaller data sets using SPSS. During compilation 88 false
> records were accidentally introduced which comprise all NA values. I
> want to delete these records but not other missing data. The functions
> na.exclude and na.omit seem to remove all values of NA? How can I
> delete just the relevant NA's? . i.e. I want to delete all records in
> the data frame DATA where the field age contains NA values

Hi Robert,

It's not quite clear what your "NA" criterion for deletion really is.

If (as you state first) the false records "comprise all NA values",
this suggests that in such a record every field is "NA".

On the other hand you say you "want to delete all records in the data
frame DATA where the field age contains NA values", so it looks as
though you can check for deletion on the field "age" only.

Suppose your dataframe is called DF.

In the second case, which is simpler, you can simply do

  newDF <- DF[!is.na(DF$age),]

In the first case, it's fundamentally the same but you have to run the
check along every element in each row. So define a function

  notallna<-function(x){!all(is.na(x))}

and then

  newDF <- DF[apply(DF,1,notallna),]

This will leave in every record in which not all fields are "NA", so
will include records in which only some fields are "NA".

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 01-Nov-04                                       Time: 16:03:07
------------------------------ XFMail ------------------------------
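The two cases Ted distinguishes can be compared side by side on a toy data frame. This is only a sketch: the columns id, age and sex and their values are invented for illustration and are not Robert's data.

```r
# Hypothetical toy data: row 2 is entirely NA, row 4 is missing only age
DF <- data.frame(id  = c(1, NA, 3, 4),
                 age = c(30, NA, 45, NA),
                 sex = c("m", NA, "f", "f"))

# Second case: drop every record whose age is NA
by_age <- DF[!is.na(DF$age), ]

# First case: drop only records in which every field is NA
notallna <- function(x) !all(is.na(x))
by_row <- DF[apply(DF, 1, notallna), ]

nrow(by_age)  # 2 (rows 1 and 3)
nrow(by_row)  # 3 (rows 1, 3 and 4)
```

The difference shows up in row 4: it has a genuine missing age but is otherwise a real record, so the age-only test removes it while the whole-row test keeps it.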
I have a data set of about 10000 records which was compiled from
several smaller data sets using SPSS. During compilation 88 false
records were accidentally introduced which comprise all NA values. I
want to delete these records but not other missing data. The functions
na.exclude and na.omit seem to remove all values of NA? How can I
delete just the relevant NA's? I.e., I want to delete all records in
the data frame DATA where the field age contains NA values.

Regards,

Robert Brown
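For context, the behaviour described in the question can be reproduced on a small example. This is a minimal sketch with made-up data (the columns age and weight are hypothetical): na.omit applied to a data frame drops every row containing at least one NA, not just the rows that are NA throughout.

```r
# Hypothetical data: row 2 is an all-NA record,
# row 3 has one genuine missing value in weight
DATA <- data.frame(age    = c(5, NA, 7, 9),
                   weight = c(20.1, NA, NA, 31.4))

cleaned <- na.omit(DATA)
nrow(cleaned)  # 2: rows 2 and 3 are both dropped
```

Row 3 is the kind of record the question wants to keep, which is why a more targeted test than na.omit is needed.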
This sort of thing is most likely covered in `An Introduction to R':

  newDATA <- DATA[!is.na(DATA$age),]

Andy

> From: Robert Brown FM CEFAS
>
> I have a data set of about 10000 records which was compiled
> from several smaller data sets using SPSS. During compilation
> 88 false records were accidentally introduced which comprise
> all NA values. I want to delete these records but not other
> missing data. The functions na.exclude and na.omit seem to
> remove all values of NA? How can I delete just the relevant
> NA's? . i.e. I want to delete all records in the data frame
> DATA where the field age contains NA values
>
> Regards,
>
> Robert Brown
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
I think you want something like so:

  # make some data
  foo.df <- data.frame(x = 1:100, y = runif(100), age = rnorm(100, 10, 1))

  # stick some "real" NAs in all columns
  foo.df[c(2, 78, 32, 56), ] <- NA

  # make some "errant" NAs in the column age
  foo.df$age[c(99, 26, 75, 3)] <- NA

  # e.g.
  foo.df[1:5, ]

  # remove the errant NAs with is.na: drop rows where age is NA but x
  # is not (rows that are NA in every column are kept)
  foo.df <- foo.df[!(is.na(foo.df$age) & !is.na(foo.df$x)), ]
  foo.df[1:5, ]
How about:

  all.nas <- apply( old, 1, function(x) sum( is.na( x ) ) )
  new <- old[all.nas < dim( old )[2], ]

----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
----------------------

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Robert
> Brown FM CEFAS
> Sent: Monday, November 01, 2004 4:18 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] deleting specified NA values
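The same count-the-NAs idea can also be written without apply, using rowSums on the logical matrix returned by is.na. This is a sketch of an equivalent variant, not something proposed in the thread; the data frame old below and the name n.na are made up for illustration.

```r
# Hypothetical data: row 2 is all NA, row 3 is missing only b
old <- data.frame(a = c(1, NA, 3),
                  b = c("x", NA, NA))

# Number of NA fields in each row
n.na <- rowSums(is.na(old))

# Keep rows that have at least one non-NA field
new <- old[n.na < ncol(old), ]

# Same rows as the apply() version above
all.nas <- apply(old, 1, function(x) sum(is.na(x)))
identical(new, old[all.nas < dim(old)[2], ])
```

On a data frame of about 10000 rows either form should be fast enough; rowSums simply avoids calling a function once per row.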