Dear All and Mark,

Given a dataset that I have called dat, I was hoping to speed up the following loop:

for(i in 1:835353){
  for(j in 1:86){
    if (is.na(dat[i,j])==TRUE){dat[i,j]<-0 }}}

Actually I am also having a memory problem. I get the following:

Error: cannot allocate vector of size 3.2 Mb
In addition: Warning messages:
1: In dat[i, j] <- 0 :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In dat[i, j] <- 0 :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In dat[i, j] <- 0 :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In dat[i, j] <- 0 :
  Reached total allocation of 1535Mb: see help(memory.size)

If I try to apply the loop just to a particular column, rather than the whole dataset, so that I don't have the memory problem, i.e.

for(i in 1:835353){
  if (is.na(dat[i,4])==TRUE){dat[i,4]<-0 }}

it takes ridiculously long to process, so I was hoping that there would be a quicker way to do this.

Thank you all very much for the help,
Denise
If your matrix is 835353x86 and numeric, a single copy will take about 550MB. You should therefore have at least 2GB of real memory on your system, so that you can hold a couple of copies as part of some processing. If you want to replace NAs with zero, then this is how you might do it with 'vectorization':

> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1   NA   NA    2    1    2
[2,]    2    2    2   NA    2    2
[3,]    2    2   NA   NA    1    2
[4,]   NA    1    2    1    2    1
[5,]    1    1   NA    2   NA   NA
[6,]   NA    1   NA    1    2   NA
> x[is.na(x)] <- 0
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    0    2    1    2
[2,]    2    2    2    0    2    2
[3,]    2    2    0    0    1    2
[4,]    0    1    2    1    2    1
[5,]    1    1    0    2    0    0
[6,]    0    1    0    1    2    0

Maybe you should read the Intro To R to understand how vectorization works. The same approach works for your last loop:

x[is.na(x[,4]), 4] <- 0

On Mon, Jul 28, 2008 at 9:15 AM, Denise Xifara <dionysia-kiara.xifaras at st-hildas.ox.ac.uk> wrote:
> Dear All and Mark,
>
> Given a dataset that I have called dat, I was hoping to speed up the
> following loop:
>
> for(i in 1:835353){
> for(j in 1:86){
> if (is.na(dat[i,j])==TRUE){dat[i,j]<-0 }}}
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
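A rough sketch of the arithmetic behind the 550MB figure, followed by a column-by-column variant of the same replacement. The toy matrix x and the names x_all and x_col are made up for illustration and are not from the thread. Whether the column-wise form actually lowers peak memory depends on the R version and on whether dat is a matrix or a data frame, so treat it as something to try rather than a guaranteed fix.

```r
# Size of a numeric (double) matrix: rows * cols * 8 bytes.
n_rows <- 835353
n_cols <- 86
size_mib <- n_rows * n_cols * 8 / 1024^2
round(size_mib)  # 548, i.e. "about 550MB" for one copy

# Toy stand-in for the real data (hypothetical, 6 x 6).
set.seed(1)
x <- matrix(sample(c(1, 2, NA), 36, replace = TRUE), nrow = 6)

# Whole-matrix replacement, as in the reply above.
# is.na(x) builds a logical index as large as the whole matrix.
x_all <- x
x_all[is.na(x_all)] <- 0

# Column-by-column variant: each logical index is only one column long,
# which may keep temporary allocations smaller when memory is tight.
x_col <- x
for (j in seq_len(ncol(x_col))) {
  x_col[is.na(x_col[, j]), j] <- 0
}

# Both approaches should give the same result.
identical(x_all, x_col)
```

The loop here runs over 86 columns rather than 835353 x 86 cells, so it stays vectorized within each column.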
ONKELINX, Thierry
2008-Jul-28 13:43 UTC
[R] speeding up loop and dealing with memory problems
Dear Denise,

It looks like you want to replace all NA with 0 in the dataset? The code below should do the trick without loops, and it will be rather fast.

dat[is.na(dat)] <- 0

> dat <- matrix(rbinom(40, 1, 0.75), ncol = 4, nrow = 10)
> dat[dat == 0] <- NA
> dat
      [,1] [,2] [,3] [,4]
 [1,]    1    1    1    1
 [2,]    1    1   NA    1
 [3,]   NA    1   NA   NA
 [4,]    1    1   NA    1
 [5,]    1    1    1   NA
 [6,]    1    1    1   NA
 [7,]    1    1    1    1
 [8,]    1    1    1   NA
 [9,]   NA    1    1    1
[10,]    1    1    1    1
>
> dat[is.na(dat)] <- 0
> dat
      [,1] [,2] [,3] [,4]
 [1,]    1    1    1    1
 [2,]    1    1    0    1
 [3,]    0    1    0    0
 [4,]    1    1    0    1
 [5,]    1    1    1    0
 [6,]    1    1    1    0
 [7,]    1    1    1    1
 [8,]    1    1    1    0
 [9,]    0    1    1    1
[10,]    1    1    1    1
>

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Denise Xifara
Sent: Monday 28 July 2008 15:15
To: r-help at r-project.org
Subject: [R] speeding up loop and dealing with memory problems

Dear All and Mark,

Given a dataset that I have called dat, I was hoping to speed up the following loop:

for(i in 1:835353){
  for(j in 1:86){
    if (is.na(dat[i,j])==TRUE){dat[i,j]<-0 }}}

[...]
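For anyone who wants to check the "rather fast" claim on their own machine, here is a small sketch that runs the element-by-element loop and the vectorized assignment on the same simulated matrix. The helper names loop_fill and vec_fill and the size n are assumptions made for this illustration, and the timings will vary by machine and R version, so no numbers are quoted here.

```r
# Simulated data in the style of the example above, just larger.
set.seed(42)
n <- 20000                      # hypothetical row count, far below 835353
dat <- matrix(rbinom(n * 4, 1, 0.75), ncol = 4)
dat[dat == 0] <- NA

# Element-by-element replacement, as in the original question.
loop_fill <- function(m) {
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (is.na(m[i, j])) m[i, j] <- 0
    }
  }
  m
}

# Vectorized replacement, as suggested in the replies.
vec_fill <- function(m) {
  m[is.na(m)] <- 0
  m
}

t_loop <- system.time(res_loop <- loop_fill(dat))
t_vec  <- system.time(res_vec  <- vec_fill(dat))

# Compare elapsed times and confirm both give the same matrix.
print(rbind(loop = t_loop, vectorized = t_vec)[, "elapsed"])
all.equal(res_loop, res_vec)
```

Note that is.na() already returns TRUE or FALSE, so the ==TRUE comparison in the original loop is not needed.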