Dimitri Liakhovitski
2010-Mar-30 00:21 UTC
[R] Efficiency question: replacing all NAs with a zero
Dear R'ers,

I have a very large data frame (over 4000 rows and 2,500 columns). My task is very simple - I have to replace all NAs with a zero. My code works fine on smaller data frames - but I have to deal with a huge one and there are many NAs in each column.
R runs out of memory on me ("Reached total allocation of 1535Mb: see help(memory.size)"). Is there any other, more efficient way of doing it?
Thanks a lot for any hints!
Dimitri

# Building an example frame:
frame<-data.frame(a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
set.seed(1234)
for(i in names(frame)){
        i.for.NA<-sample(1:100,60)
        frame[[i]][i.for.NA]<-NA
}

# Replacing all NAs in "frame" with zeros - is of course fast in this
# example, because this data frame is very small
system.time({
frame<-lapply(frame,function(x){
        x[is.na(x)]<-0
        return(x)
})})

--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
Gabor Grothendieck
2010-Mar-30 00:27 UTC
[R] Efficiency question: replacing all NAs with a zero
See if this works for you:

    DF[is.na(DF)] <- 0
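For readers who want to try the suggestion, here is a minimal sketch on a small stand-in frame. The names DF1, DF2 and DF3 and the 100 x 3 test data are made up for illustration and do not come from the thread. Besides the one-liner from the reply, it shows a column-wise loop, which builds a logical index for one column at a time rather than for the whole frame, and a variant of the original lapply() call that assigns into `DF3[]` so the result stays a data frame (a plain `frame <- lapply(frame, ...)` returns a list). Whether any of these fits within the memory limit on the real 4000 x 2500 data is not established in the thread.

```r
# Small stand-in for the large data frame (hypothetical data, 100 x 3).
set.seed(1234)
DF <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
for (nm in names(DF)) {
  DF[[nm]][sample(100, 60)] <- NA   # 60 NAs per column
}

# Whole-frame replacement, as suggested in the reply.
DF1 <- DF
DF1[is.na(DF1)] <- 0

# Column-wise alternative: indexes one column at a time.
DF2 <- DF
for (j in seq_along(DF2)) {
  DF2[[j]][is.na(DF2[[j]])] <- 0
}

# Variant of the original lapply() call; assigning into DF3[] keeps the
# data.frame class instead of returning a plain list.
DF3 <- DF
DF3[] <- lapply(DF3, function(x) {
  x[is.na(x)] <- 0
  x
})
```

On real data, wrapping each variant in system.time() would show which one is fastest for a given frame.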