dear R wizards:

an operation I execute often is the deletion of all observations (in a matrix or data set) that have at least one NA. (I now need this operation for kde2d, because its internal quantile call complains; could this be considered a buglet?) usually, my data sets are small enough for speed not to matter, and there I do not care whether my method is pretty inefficient (ok, I admit it: I use the sum() function and test whether the result is NA)---but now I have some bigger data sets. Is there a recommended method of doing NA elimination most efficiently?

sincerely, /iaw
---
ivo welch
professor of finance and economics
brown / nber / yale
I find complete.cases() to be very useful for this kind of stuff (and very fast). As in,

> d <- data.frame(x = c(1,2,3,NA,5), y = c(1,NA,3,4,5))
> d
   x  y
1  1  1
2  2 NA
3  3  3
4 NA  4
5  5  5
> complete.cases(d)
[1]  TRUE FALSE  TRUE FALSE  TRUE
> use <- complete.cases(d)
> d[use, ]
  x y
1 1 1
3 3 3
5 5 5

-roger

ivo welch wrote:
> [...] Is there a recommended method of doing NA elimination most
> efficiently?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
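Since the question also mentions matrices, here is a minimal sketch of the same approach on a matrix. The matrix `m` and its values are made up for illustration and are not from the original post. It also shows a vectorised alternative to testing sum() row by row: counting NAs per row with rowSums(is.na(m)).

```r
# Hypothetical example matrix; the values are invented for illustration.
m <- cbind(x = c(1, 2, 3, NA, 5), y = c(1, NA, 3, 4, 5))

# complete.cases() accepts matrices as well as data frames and returns
# one logical per row: TRUE when the row contains no NA.
keep <- complete.cases(m)

# drop = FALSE keeps the result a matrix even if only one row survives.
m_clean <- m[keep, , drop = FALSE]

# An equivalent base R test: a row is complete when its NA count is zero.
keep_alt <- rowSums(is.na(m)) == 0
```

Both `keep` and `keep_alt` select the same rows here; which one is faster on a given data set is something to measure (for example with system.time()) rather than assume.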
On Wed, 2004-07-07 at 09:35, ivo welch wrote:
> [...] Is there a recommended method of doing NA elimination
> most efficiently?

Take a look at ?complete.cases

HTH,

Marc Schwartz
Hi Ivo

Try ?na.omit

Example:

> d <- data.frame(x = c(1:5,NA), y = c(NA,3:7))
> d
   x  y
1  1 NA
2  2  3
3  3  4
4  4  5
5  5  6
6 NA  7
> do <- na.omit(d)
> do
  x y
2 2 3
3 3 4
4 4 5
5 5 6

I usually pass na.omit within the data argument of a function, i.e. m <- lm(x~y, data=na.omit(d)). In this way you don't have to store 2 datasets.

I hope that this helps

Francisco

>From: Marc Schwartz <MSchwartz at MedAnalytics.com>
>Reply-To: MSchwartz at MedAnalytics.com
>To: ivo welch <ivo_welch at mailblocks.com>
>CC: R-Help <r-help at stat.math.ethz.ch>
>Subject: Re: [R] fast NA elimination ?
>Date: Wed, 07 Jul 2004 09:41:39 -0500
>
>Take a look at ?complete.cases
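As a follow-up sketch to the na.omit() example, the snippet below reuses the same `d` and shows two things that may be useful: na.omit() records which rows it dropped in the "na.action" attribute of its result, and lm() can be asked to drop incomplete rows itself through its na.action argument. The names `d_clean`, `dropped`, `m1` and `m2` are illustrative only.

```r
# Same data as in the example above.
d <- data.frame(x = c(1:5, NA), y = c(NA, 3:7))

# na.omit() removes every row that has at least one NA.
d_clean <- na.omit(d)

# The removed row indices are stored as an attribute of the result.
dropped <- attr(d_clean, "na.action")

# Fit on the cleaned copy ...
m1 <- lm(x ~ y, data = d_clean)

# ... or let lm() drop incomplete rows itself, without a second data set.
m2 <- lm(x ~ y, data = d, na.action = na.omit)
```

Both fits use the same four complete rows, so their coefficients agree. For functions that take no na.action argument, filtering the data up front with na.omit() or complete.cases() is the route suggested in this thread.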