Respected R experts,

I am trying to apply a user function that calls R's loess() function from the stats package on each time series. I have a large matrix of size 21 x 9000000 and need to apply loess to each column, so I have written a separate user function, foo, that fits loess to a single column, and I call it as follows:

xc <- apply(t, 2, foo)

where t is my 21 x 9000000 matrix and foo applies loess. This is turning out to be a very slow process, and I need to repeat this step for 25-30 such large matrix chunks. Is there any trick I can use to make this work faster?

Any help will be deeply appreciated.

Regards

Sudipta Sarkar PhD
Senior Analyst/Scientist
Lanworth Inc. (Formerly Forest One Inc.)
300 Park Blvd., Ste 425
Itasca, IL
Ph: 630-250-0468
Hi Jim,

Thanks for your prompt response.

I am using a fairly powerful Mac with Leopard OS, 17GB RAM and 2x3 GHz Intel Xeon processors, so I do not think the system is paging. I am also using the Rmpi and snow utilities to parallelize it, but even then it takes 3.5-4 hours to complete just one chunk of matrices.

You mentioned storing the data and operating on 1 column at a time. Any hint on how I should go about doing that? I came across the filehash package but am not sure how to use apply over an environment variable. Any help in this direction will be most welcome.

thanks

---- Original message ----
>Date: Tue, 29 Apr 2008 16:05:41 -0400
>From: "jim holtman" <jholtman at gmail.com>
>Subject: Re: [R] Applying user function over a large matrix
>To: "Sudipta Sarkar" <ssarkar at lanworth.com>
>
>What size machine do you have? A single copy of your object will
>require 1.5GB of memory. How slow is slow? Is the operating system
>paging because it does not have enough physical memory? Can you store
>the data and only operate on 1 column at a time -- this reduces the
>size of the object to 72MB.
>
>--
>Jim Holtman
>Cincinnati, OH
>+1 513 646 9390
>
>What is the problem you are trying to solve?
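The memory figures quoted above can be checked with a little arithmetic, since R stores each numeric value as an 8-byte double. The following is only a rough sketch: the variable names are illustrative, and the small fixed header that R adds to every object is ignored in the estimates.

```r
# Back-of-the-envelope memory estimate for a numeric (double) matrix.
n_row <- 21
n_col <- 9e6
bytes_per_double <- 8

full_gb   <- n_row * n_col * bytes_per_double / 1e9  # whole matrix: 1.512 GB
row_mb    <- n_col * bytes_per_double / 1e6          # 9,000,000 doubles: 72 MB
col_bytes <- n_row * bytes_per_double                # one 21-value column: 168 bytes

# Sanity check on a small matrix: the data payload dominates the object header,
# so the measured size is only slightly above rows * cols * 8 bytes.
small <- matrix(0, nrow = n_row, ncol = 1000)
payload_ratio <- as.numeric(object.size(small)) / (n_row * 1000 * bytes_per_double)
```

By this estimate, every additional full copy of the matrix made during processing costs roughly another 1.5GB.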
Jim,

Thanks again. Yes, it is the 64-bit version of R.

So you suggest writing out the 9 million columns, then loading them individually and applying the function to each of them? But since we are using a for loop here too, won't it end up taking almost the same time? Plus, aren't we increasing the I/O overhead too?

---- Original message ----
>Date: Tue, 29 Apr 2008 16:40:03 -0400
>From: "jim holtman" <jholtman at gmail.com>
>Subject: Re: [R] Applying user function over a large matrix
>To: "Sudipta Sarkar" <ssarkar at lanworth.com>
>
>Are you running a 64-bit version of R on the Mac?
>
>Here is an example script for writing out the columns with 'save'
>
>x <- matrix(runif(100000), ncol=10)
># write out each column to a file
># also use 'save' so the data is already in binary
>for (i in seq_len(ncol(x))){
>    column <- x[,i]
>    save(column, file=sprintf("/column_%02d_.Rdata", i))
>}
>
># you can then read them back in with 'load' and the data will be
># in the variable 'column'
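For readers following along, the save/load round trip described above might look roughly like the sketch below. Several things here are assumptions and not part of the thread: `foo` is a hypothetical stand-in for the poster's function, the files go to `tempdir()`, and the matrix is a tiny toy example.

```r
# Hypothetical stand-in for the poster's function: loess-smooth one column.
foo <- function(y) {
  t_idx <- seq_along(y)
  predict(loess(y ~ t_idx))   # fitted values at the observed time points
}

set.seed(1)
x <- matrix(runif(21 * 10), nrow = 21)   # toy matrix: 21 time points, 10 series
out_dir <- tempdir()                     # assumed scratch location

# Write each column to its own binary file with save().
for (i in seq_len(ncol(x))) {
  column <- x[, i]
  save(column, file = file.path(out_dir, sprintf("column_%02d_.Rdata", i)))
}

# Read the columns back one at a time with load() and smooth each one.
res <- matrix(NA_real_, nrow = nrow(x), ncol = ncol(x))
for (i in seq_len(ncol(x))) {
  e <- new.env()   # load into a fresh environment to avoid clobbering globals
  load(file.path(out_dir, sprintf("column_%02d_.Rdata", i)), envir = e)
  res[, i] <- foo(e$column)
}
```

With millions of columns, one file per column would mean millions of small files. Storing a block of columns per file is one possible way to reduce that I/O overhead, at the cost of holding a larger object in memory per load.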
It's quite possible that much of the time spent in loess() goes to setting up the data (i.e., the formula, terms, model.frame, etc.), and that much of that is repeated identically for each call to loess(). I would suggest looking at the code of loess(), working out what arguments it calls simpleLoess() with, and then trying to call stats:::simpleLoess() directly. (Of course, you have to be careful with this because it does not use the published API.)

-- Tony Plate
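A related way to avoid repeating identical setup work, sketched here as a different technique from calling simpleLoess() directly, relies on the fact that a loess fit with the default family = "gaussian" is linear in the response. If every column shares the same time index, span and degree, the smoother matrix can be built once and applied to all columns in a single matrix product. The sketch assumes no missing values and no robustness iterations; the names `smooth_one`, `H` and `y_mat` are illustrative only.

```r
# Sketch: build the loess smoother ("hat") matrix once, then reuse it.
# ASSUMPTIONS: common time index for all columns, no NAs, family = "gaussian".
set.seed(1)
n <- 21
t_idx <- seq_len(n)
y_mat <- matrix(rnorm(n * 50), nrow = n)   # toy stand-in for the 21 x 9000000 matrix

# Hypothetical helper: fitted values of one loess smooth of y on t_idx.
smooth_one <- function(y) predict(loess(y ~ t_idx, span = 0.75, degree = 2))

# Column j of H is the smooth of the j-th unit vector, so H %*% y is the smooth of y.
H <- apply(diag(n), 2, smooth_one)

# Smooth every column with one matrix product (only n loess() calls in total).
fit_fast <- H %*% y_mat

# Reference result: one loess() call per column.
fit_slow <- apply(y_mat, 2, smooth_one)
max_diff <- max(abs(fit_fast - fit_slow))
```

It is worth comparing the two results on a small sample, as done with `max_diff` here, before trusting the shortcut on the full data. The shortcut does not apply if family = "symmetric" is used, because the robust fit is not linear in the response.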
Dear Folks,

Thanks for all your replies and suggestions. I will be trying out these suggestions today and will let you know how it goes. Please let me know if you can think of anything else to resolve the issue.

Regards

---- Original message ----
>Date: Tue, 29 Apr 2008 15:43:41 -0700
>From: Bert Gunter <gunter.berton at gene.com>
>Subject: Re: [R] Applying user function over a large matrix
>To: "'Ray Brownrigg'" <Ray.Brownrigg at mcs.vuw.ac.nz>, <r-help at r-project.org>
>Cc: "'Tony Plate'" <tplate at acm.org>
>
>If you can (one dimensional only), try using lowess() instead. Probably in a
>for loop as Ray suggested.
>
>loess() is more powerful and flexible, but you pay for it in extra
>complexity and time. Maybe in this case, it's not worth it.
>
>-- Bert Gunter
>Genentech
>
>-----Original Message-----
>From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>Behalf Of Ray Brownrigg
>Sent: Tuesday, April 29, 2008 3:19 PM
>To: r-help at r-project.org
>Cc: Tony Plate
>Subject: Re: [R] Applying user function over a large matrix
>
>In addition to Tony's suggestion, have a look at the following sequence.
>I suspect the difference arises because the call to apply will duplicate
>your 1.5GB matrix, whereas the for loop doesn't [I stand to be corrected here].
>
>> x <- matrix(runif(210000), 21)
>> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
>   user  system elapsed
>  0.079   0.000   0.079
>> unix.time(apply(x, 2, sum))
>   user  system elapsed
>   0.10    0.01    0.11
>> x <- matrix(runif(2100000), 21)
>> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
>   user  system elapsed
>  0.791   0.010   0.801
>> unix.time(apply(x, 2, sum))
>   user  system elapsed
>  1.096   0.011   1.107
>> x <- matrix(runif(21000000), 21)
>> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
>   user  system elapsed
>  7.825   0.011   7.840
>> unix.time(apply(x, 2, sum))
>   user  system elapsed
> 15.431   0.142  15.592
>
>Also, preliminary checking using the top utility shows the for loop requires
>just over half the memory of the apply() call. This is on a NetBSD system
>with 2GB memory.
>
>HTH,
>Ray Brownrigg
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
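Combining the two suggestions quoted above, lowess() inside a preallocated for loop might look like the sketch below. The toy matrix, the time index `t_idx` and the chosen span are assumptions for illustration. Note that lowess() fits local lines whereas loess() defaults to local quadratics, so the smoothed values will generally differ from those of loess().

```r
# Sketch: lowess() per column in a preallocated for loop.
set.seed(1)
x <- matrix(runif(21 * 1000), nrow = 21)   # toy stand-in for the large matrix
t_idx <- seq_len(nrow(x))                  # assumed common, already sorted time index

res <- matrix(NA_real_, nrow = nrow(x), ncol = ncol(x))
for (i in seq_len(ncol(x))) {
  # f is the smoother span; iter = 0 switches off the robustness iterations.
  # lowess() returns its result ordered by x, which matches t_idx here.
  res[, i] <- lowess(t_idx, x[, i], f = 0.75, iter = 0)$y
}
```

Timing both versions on a few thousand columns, for example with system.time(), would indicate whether the switch is worthwhile before committing to the full 9 million columns.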