Weiwei Shi
2006-Oct-25 15:59 UTC
[R] how to improve the efficiency of the following lapply codes
Hi,

I have a series of lda analyses using the following lapply call:

    n <- dim(intersect.matrix)[1]
    net1.lda <- lapply(1:n, function(k) i.lda(data.list, intersect.matrix, i = k, w))

i.lda is the function that does the actual lda analysis.

intersect.matrix is an n x 1026 matrix, where n can be a really huge number like 60k. The goal is to perform a random search. Building an n = 120k matrix is impossible on my machine. When n = 5k the task can be done in 30 minutes, while for n = 60k it is estimated to take 5 days. So I am wondering where my coding problem is that causes this nonlinearity.

If more info is needed, I will provide it.

thanks

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
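[Editor's note: one way to see whether the slowdown really is nonlinear is to time the same lapply pattern on prefixes of increasing size. A minimal sketch, where i.lda.stub is a cheap stand-in for the real i.lda (which is not shown in the post):]

```r
# Stand-in for the real i.lda(): just a row statistic, so mostly the
# lapply overhead and row extraction are being measured.
i.lda.stub <- function(intersect.matrix, i) {
  mean(intersect.matrix[i, ])
}

set.seed(1)
intersect.matrix <- matrix(runif(2000 * 100), nrow = 2000)

# Time the loop on growing prefixes; a linear algorithm should scale
# roughly proportionally with n.
for (n in c(500, 1000, 2000)) {
  t <- system.time(
    res <- lapply(1:n, function(k) i.lda.stub(intersect.matrix, k))
  )["elapsed"]
  cat(sprintf("n = %4d: %.3f s\n", n, t))
}
```

If the elapsed time grows much faster than n even with a trivial stand-in, the overhead is in the loop pattern itself (e.g. objects growing inside the loop); if not, the cost is inside i.lda.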
Weiwei Shi
2006-Oct-25 17:04 UTC
[R] how to improve the efficiency of the following lapply codes
object.size(intersect.matrix) gives 41314204, but my machine has 4 GB of memory, so it should be OK. After 12 hours it has finished 16k out of 60k, but it is still slowing down nonlinearly.

I am thinking of chopping the 60k rows into multiple 5k data frames to run the program, but I am wondering whether there is a way around that.

    > version
                   _
    platform       i686-pc-linux-gnu
    arch           i686
    os             linux-gnu
    system         i686, linux-gnu
    status
    major          2
    minor          3.1
    year           2006
    month          06
    day            01
    svn rev        38247
    language       R
    version.string Version 2.3.1 (2006-06-01)

    [wshi at chopper ox]$ more /proc/meminfo
            total:      used:      free:    shared:  buffers:    cached:
    Mem:  4189724672 3035549696 1154174976         0 282836992 2057129984
    Swap: 4293586944  645042176 3648544768

    [wshi at chopper ox]$ more /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 15
    model           : 4
    model name      : Intel(R) Xeon(TM) CPU 3.60GHz
    stepping        : 3
    cpu MHz         : 3591.419
    cache size      : 2048 KB

thanks.

On 10/25/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> Hi,
> I have a series of lda analysis using the following lapply function:
> [...]

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
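[Editor's note: the chunking idea above can be sketched like this, again with a stand-in for i.lda. Processing the rows in fixed-size blocks, keeping each block's results in a preallocated list, and calling gc() between blocks keeps memory use flat across the run:]

```r
# Hypothetical stand-in for one i.lda call on row i.
i.lda.stub <- function(intersect.matrix, i) sum(intersect.matrix[i, ])

set.seed(1)
n <- 10000
intersect.matrix <- matrix(runif(n * 50), nrow = n)

chunk.size <- 2500
starts <- seq(1, n, by = chunk.size)

# Preallocate the outer list so nothing grows inside the loop.
results <- vector("list", length(starts))
for (j in seq_along(starts)) {
  rows <- starts[j]:min(starts[j] + chunk.size - 1, n)
  results[[j]] <- lapply(rows, function(k) i.lda.stub(intersect.matrix, k))
  # In a real run one could save(results[[j]], file = ...) here and
  # drop it from memory; gc() reclaims space between chunks.
  gc()
}

# Flatten the per-chunk lists back into one list of n results.
net1.lda <- unlist(results, recursive = FALSE)
```

This avoids building any 60k-row intermediate beyond the input matrix itself, at the cost of one extra loop level.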
Liaw, Andy
2006-Oct-26 12:25 UTC
[R] how to improve the efficiency of the following lapply codes [Broadcast]
Make good use of Rprof(): it has helped me a great deal in pinpointing bottlenecks where I would not have suspected them.

Cheers,
Andy

From: Weiwei Shi
> object.size(intersect.matrix)
> 41314204
>
> but my machine has 4 G memory, so it should be ok since after
> 12 hours, it finishes 16k out of 60k but still slow non-linearly.
>
> I am thinking to chop 60k into multiple 5k data.frames to run
> the program. but just wondering is there a way around it?
> [...]
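[Editor's note: the Rprof() suggestion can be sketched as follows, with a toy workload standing in for the real i.lda loop. summaryRprof() then reports which functions dominate the run time:]

```r
# Profile a toy lapply workload with R's sampling profiler.
prof.file <- tempfile(fileext = ".out")
Rprof(prof.file, interval = 0.01)

x <- matrix(rnorm(400 * 400), 400, 400)
res <- lapply(1:100, function(k) solve(x + diag(k, 400))[1, 1])

Rprof(NULL)  # stop profiling

# Break down elapsed time by function; the heaviest calls come first.
s <- summaryRprof(prof.file)
print(head(s$by.self))
```

In the original problem, profiling a 5k-row run the same way should show whether the time goes into i.lda itself, into row extraction from intersect.matrix, or into something growing with the result list.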