Weiwei Shi
2006-Oct-25 15:59 UTC
[R] how to improve the efficiency of the following lapply codes
Hi,

I have a series of lda analyses using the following lapply call:

    n <- dim(intersect.matrix)[1]
    net1.lda <- lapply(1:n, function(k) i.lda(data.list, intersect.matrix, i = k, w))

i.lda is the function that does the actual lda analysis.

intersect.matrix is an n x 1026 matrix, where n can be a really huge number like 60k. The goal is to perform a random search. Building an n = 120k matrix is impossible on my machine. When n = 5k the task can be done in 30 minutes, while for n = 60k it is estimated to take 5 days. So I am wondering where my coding problem is that causes this nonlinearity.

If more info is needed, I will provide it.

thanks

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
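[Editor's note: one way to see whether the slowdown really is nonlinear is to time the same lapply pattern on prefixes of increasing size. A minimal sketch, where i.lda.stub is a cheap stand-in for the real i.lda (which is not shown in the post):]

```r
# Stand-in for the real i.lda(): just a row statistic, so mostly the
# lapply overhead and row extraction are being measured.
i.lda.stub <- function(intersect.matrix, i) {
  mean(intersect.matrix[i, ])
}

set.seed(1)
intersect.matrix <- matrix(runif(2000 * 100), nrow = 2000)

# Time the loop on growing prefixes; a linear algorithm should scale
# roughly proportionally with n.
for (n in c(500, 1000, 2000)) {
  t <- system.time(
    res <- lapply(1:n, function(k) i.lda.stub(intersect.matrix, k))
  )["elapsed"]
  cat(sprintf("n = %4d: %.3f s\n", n, t))
}
```

If the elapsed time grows much faster than n even with a trivial stand-in, the overhead is in the loop pattern itself (e.g. objects growing inside the loop); if not, the cost is inside i.lda.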
Weiwei Shi
2006-Oct-25 17:04 UTC
[R] how to improve the efficiency of the following lapply codes
object.size(intersect.matrix) gives 41314204, but my machine has 4 GB of memory, so it should be OK. After 12 hours it has finished 16k out of 60k, but it is still slowing down nonlinearly.

I am thinking of chopping the 60k rows into multiple 5k data frames to run the program, but I am wondering whether there is a way around that.

    > version
                   _
    platform       i686-pc-linux-gnu
    arch           i686
    os             linux-gnu
    system         i686, linux-gnu
    status
    major          2
    minor          3.1
    year           2006
    month          06
    day            01
    svn rev        38247
    language       R
    version.string Version 2.3.1 (2006-06-01)

    [wshi at chopper ox]$ more /proc/meminfo
            total:      used:      free:    shared:  buffers:    cached:
    Mem:  4189724672 3035549696 1154174976         0 282836992 2057129984
    Swap: 4293586944  645042176 3648544768

    [wshi at chopper ox]$ more /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 15
    model           : 4
    model name      : Intel(R) Xeon(TM) CPU 3.60GHz
    stepping        : 3
    cpu MHz         : 3591.419
    cache size      : 2048 KB

thanks.

On 10/25/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> Hi,
> I have a series of lda analysis using the following lapply function:
> [...]

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
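[Editor's note: the chunking idea above can be sketched like this, again with a stand-in for i.lda. Processing the rows in fixed-size blocks, keeping each block's results in a preallocated list, and calling gc() between blocks keeps memory use flat across the run:]

```r
# Hypothetical stand-in for one i.lda call on row i.
i.lda.stub <- function(intersect.matrix, i) sum(intersect.matrix[i, ])

set.seed(1)
n <- 10000
intersect.matrix <- matrix(runif(n * 50), nrow = n)

chunk.size <- 2500
starts <- seq(1, n, by = chunk.size)

# Preallocate the outer list so nothing grows inside the loop.
results <- vector("list", length(starts))
for (j in seq_along(starts)) {
  rows <- starts[j]:min(starts[j] + chunk.size - 1, n)
  results[[j]] <- lapply(rows, function(k) i.lda.stub(intersect.matrix, k))
  # In a real run one could save(results[[j]], file = ...) here and
  # drop it from memory; gc() reclaims space between chunks.
  gc()
}

# Flatten the per-chunk lists back into one list of n results.
net1.lda <- unlist(results, recursive = FALSE)
```

This avoids building any 60k-row intermediate beyond the input matrix itself, at the cost of one extra loop level.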
Liaw, Andy
2006-Oct-26 12:25 UTC
[R] how to improve the efficiency of the following lapply codes [Broadcast]
Make good use of Rprof(): it has helped me a great deal in pinpointing bottlenecks where I would not have suspected them.

Cheers,
Andy

From: Weiwei Shi
> object.size(intersect.matrix)
> 41314204
>
> but my machine has 4 G memory, so it should be ok since after
> 12 hours, it finishes 16k out of 60k but still slow non-linearly.
>
> I am thinking to chop 60k into multiple 5k data.frames to run
> the program. but just wondering is there a way around it?
> [...]
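[Editor's note: the Rprof() suggestion can be sketched as follows, with a toy workload standing in for the real i.lda loop. summaryRprof() then reports which functions dominate the run time:]

```r
# Profile a toy lapply workload with R's sampling profiler.
prof.file <- tempfile(fileext = ".out")
Rprof(prof.file, interval = 0.01)

x <- matrix(rnorm(400 * 400), 400, 400)
res <- lapply(1:100, function(k) solve(x + diag(k, 400))[1, 1])

Rprof(NULL)  # stop profiling

# Break down elapsed time by function; the heaviest calls come first.
s <- summaryRprof(prof.file)
print(head(s$by.self))
```

In the original problem, profiling a 5k-row run the same way should show whether the time goes into i.lda itself, into row extraction from intersect.matrix, or into something growing with the result list.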