Dimitri Liakhovitski
2010-Mar-30 00:21 UTC
[R] Efficiency question: replacing all NAs with a zero
Dear R'ers,
I have a very large data frame (over 4,000 rows and 2,500 columns). My
task is simple: replace every NA with a zero. My code works fine on
smaller data frames, but this one is huge and every column contains
many NAs, so R runs out of memory on me ("Reached total allocation of
1535Mb: see help(memory.size)"). Is there a more memory-efficient way
of doing it?
Thanks a lot for any hints!
Dimitri
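For scale, here is a back-of-the-envelope size check. It assumes every column is numeric, so each value is stored as an 8-byte double. On that assumption, the raw values of a 4,000 by 2,500 frame take roughly 76 MiB, far below the 1535Mb limit in the error message. That suggests the failure comes from the temporary copies made during the replacement rather than from the data itself. This is an inference from the arithmetic, not a measurement.

```r
# Back-of-the-envelope size of the numeric payload only.
# Assumption: all 2,500 columns are doubles (8 bytes per value);
# row names and column attributes add a small amount on top.
n_rows <- 4000
n_cols <- 2500
bytes  <- n_rows * n_cols * 8
mib    <- bytes / 2^20
print(mib)  # about 76.3
```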
# Building an example frame (100 rows, 7 columns, 60 NAs per column):
set.seed(1234)
frame <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100),
                    d = rnorm(100), e = rnorm(100), f = rnorm(100),
                    g = rnorm(100))
for (i in names(frame)) {
  i.for.NA <- sample(1:100, 60)
  frame[[i]][i.for.NA] <- NA
}
# Replacing all NAs in "frame" with zeros; this is of course fast here,
# because this data frame is very small.
# Assigning to frame[] keeps the result a data.frame
# (plain frame <- lapply(...) would turn it into a list).
system.time({
  frame[] <- lapply(frame, function(x) {
    x[is.na(x)] <- 0
    x
  })
})
--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
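The lapply() approach in the post already works one column at a time. Below is a minimal sketch of the same idea as an explicit base-R loop over columns. The helper name replace_na_zero() is hypothetical, and the toy frame, seed and sizes are arbitrary choices for illustration. The logical mask built by is.na() only ever covers one column, and the function returns a data.frame. Non-numeric columns such as factors or character vectors are left untouched. Whether this avoids the memory error on the full 4,000 by 2,500 frame is untested here.

```r
# Hypothetical helper: replace NA with 0 in every numeric column of a
# data frame, one column at a time, returning a data.frame (not a list).
replace_na_zero <- function(df) {
  for (j in seq_along(df)) {
    x <- df[[j]]
    if (is.numeric(x)) {
      x[is.na(x)] <- 0      # logical mask covers this one column only
      df[[j]] <- x
    }
  }
  df
}

set.seed(1234)                         # arbitrary seed for the illustration
frame <- as.data.frame(matrix(rnorm(700), ncol = 7,
                              dimnames = list(NULL, letters[1:7])))
for (nm in names(frame)) {
  frame[[nm]][sample(100, 60)] <- NA   # 60 NAs per column, as in the post
}

frame <- replace_na_zero(frame)
print(is.data.frame(frame))   # TRUE
print(sum(is.na(frame)))      # 0
```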
Gabor Grothendieck
2010-Mar-30 00:27 UTC
[R] Efficiency question: replacing all NAs with a zero
See if this works for you:

DF[is.na(DF)] <- 0

On Mon, Mar 29, 2010 at 8:21 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> I have a very large data frame (over 4000 rows and 2,500 columns). My
> task is very simple - I have to replace all NAs with a zero.
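Here is a small check of the one-liner on a toy frame, with values made up for illustration. is.na(DF) returns a logical matrix with the same shape as DF. The data-frame assignment method accepts it as an index, so the flagged cells become 0 and the result stays a data.frame. For comparison only, the sketch also converts to a plain matrix first. That variant assumes every column is numeric and is not part of the reply above.

```r
# Toy data frame with a few NAs (values chosen for illustration only)
DF <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# The one-liner from the reply: is.na(DF) is a logical matrix the same
# shape as DF, used as an index for assignment.
DF[is.na(DF)] <- 0
print(DF)

# Alternative for all-numeric data (an assumption about the data):
# replace on a plain matrix, then convert back to a data frame.
m <- as.matrix(data.frame(a = c(1, NA, 3), b = c(NA, 5, NA)))
m[is.na(m)] <- 0
DF2 <- as.data.frame(m)
print(all(DF == DF2))   # TRUE
```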
Maybe Matching Threads
- any way to make it work faster (deleting rows that contain certain values)
- Code is too slow: mean-centering variables in a data frame by subgroup
- replacing period with a space
- Analogue to SPSS regression commands ENTER and REMOVE in R?
- Function to check if a vector contains a given value?