Hi William,
Thanks for the comments and explanation.
It is really good to know the details of rowMeans.
I did modify Peter's code, changing length(x[x=="02"]) to
sum(x=="02"), though it saved only a few seconds. :)
Best,
Mike
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Friday, May 15, 2009 10:09 AM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast
rowMeans(dataMatrix=="02") must
(a) make a logical matrix the dimensions of dataMatrix in which to put
the result of dataMatrix=="02" (4 bytes/logical element)
(b) make a double precision matrix (8 bytes/element) the size of that
logical matrix because rowMeans uses some C code that only works on
doubles
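As a rough check on (a) and (b), here is a back-of-envelope sketch using the 1000 x 1000000 dimensions from the original post. The variable names are made up for illustration, and whether both temporaries are alive at the same moment is an assumption rather than something measured here.

```r
# Number of elements in the matrix described in the original post.
n_elements <- 1000 * 1e6

# Size of each temporary in GiB, using the per-element sizes quoted above.
logical_gib <- n_elements * 4 / 2^30   # result of dataMatrix == "02"
double_gib  <- n_elements * 8 / 2^30   # double copy used by rowMeans

round(c(logical = logical_gib, double = double_gib), 1)

# Per-element cost measured on small vectors (vector header included),
# which should come out close to 4 and 8 bytes.
bytes_per_logical <- as.numeric(object.size(logical(1e6))) / 1e6
bytes_per_double  <- as.numeric(object.size(double(1e6))) / 1e6
```

That comes to roughly 3.7 GiB for the logical matrix and 7.5 GiB for the double one, on top of dataMatrix itself.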
apply(dataMatrix,1,function(x)length(x[x=="02"])/ncol(dataMatrix))
never has to make any copies of the entire matrix. It extracts a row
at a time and when it is done with the row, the memory used for
working on the row is available for other uses. Note that it would
probably be a tad faster if it were changed to
apply(dataMatrix,1,function(x)sum(x=="02")) / ncol(dataMatrix)
as sum(logicalVector) is the same as length(x[logicalVector]) and there
is no need to compute ncol(dataMatrix) more than once.
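
A small sketch to confirm that the three forms agree, using the toy rows from the original post (the matrix name m is made up for illustration). It assumes the data contain no NA values: with NAs, x[x=="02"] keeps an NA element for each missing entry, while sum(x=="02") returns NA unless na.rm=TRUE is given.

```r
# Toy data from the original post, one row per line.
m <- matrix(c("01", "02", "02", "00",
              "02", "02", "02", "01",
              "00", "02", "01", "01"),
            nrow = 3, byrow = TRUE)

# Whole-matrix comparison, then row means of the logical result.
via_rowmeans <- rowMeans(m == "02")

# Row-at-a-time versions: length of the subset vs. sum of the logical vector.
via_length <- apply(m, 1, function(x) length(x[x == "02"]) / ncol(m))
via_sum    <- apply(m, 1, function(x) sum(x == "02")) / ncol(m)

stopifnot(isTRUE(all.equal(via_rowmeans, via_sum)),
          isTRUE(all.equal(via_length, via_sum)))
via_sum
# [1] 0.50 0.75 0.25
```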
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
> -----Original Message-----
> From: Ping-Hsun Hsieh [mailto:hsiehp at ohsu.edu]
> Sent: Friday, May 15, 2009 9:58 AM
> To: Peter Alspach; William Dunlap; hadley wickham
> Cc: r-help at r-project.org
> Subject: RE: [R] memory usage grows too fast
>
> Thanks to Peter, William, and Hadley for the help.
> Your code is much more concise than mine. :P
>
> William's and Hadley's suggestions are the same. Here is their code.
>
> f <- function(dataMatrix) rowMeans(dataMatrix=="02")
>
> And Peter's code is the following.
>
> apply(yourMatrix, 1, function(x)
> length(x[x==yourPattern]))/ncol(yourMatrix)
>
>
> In terms of running time, the first one ran faster than
> the latter on my dataset (2.5 mins vs. 6.4 mins).
> The memory consumption of the first one, however, is much
> higher than that of the latter ( >8G vs. ~3G ).
>
> Any thoughts? My guess is that rowMeans created extra copies
> to perform its calculation, but I am not sure.
> I am also interested in understanding ways to handle
> memory issues. I hope someone can shed light on this for me. :)
>
> Best,
> Mike
>
> -----Original Message-----
> From: Peter Alspach [mailto:PAlspach at hortresearch.co.nz]
> Sent: Thursday, May 14, 2009 4:47 PM
> To: Ping-Hsun Hsieh
> Subject: RE: [R] memory usage grows too fast
>
> Tena koe Mike
>
> If I understand you correctly, you should be able to use
> something like:
>
> apply(yourMatrix, 1, function(x)
> length(x[x==yourPattern]))/ncol(yourMatrix)
>
> I see you've divided by nrow(yourMatrix) so perhaps I am missing
> something.
>
> HTH ...
>
> Peter Alspach
>
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Ping-Hsun Hsieh
> > Sent: Friday, 15 May 2009 11:22 a.m.
> > To: r-help at r-project.org
> > Subject: [R] memory usage grows too fast
> >
> > Hi All,
> >
> > I have a 1000x1000000 matrix.
> > The calculation I would like to do is actually very simple:
> > for each row, calculate the frequency of a given pattern. For
> > example, a toy dataset is as follows.
> >
> > Col1 Col2 Col3 Col4
> > 01 02 02 00 => Freq of "02" is 0.5
> > 02 02 02 01 => Freq of "02" is 0.75
> > 00 02 01 01 ...
> >
> > My code to find the pattern "02" is quite simple, as follows.
> >
> > OccurrenceRate_Fun<-function(dataMatrix)
> > {
> > tmp<-NULL
> > tmpMatrix<-apply(dataMatrix,1,match,"02")
> > for ( i in 1: ncol(tmpMatrix))
> > {
> > tmpRate<-table(tmpMatrix[,i])[[1]]/ nrow(tmpMatrix)
> > tmp<-c(tmp,tmpRate)
> > }
> > rm(tmpMatrix)
> > rm(tmpRate)
> > return(tmp)
> > gc()
> > }
> >
> > The problem is that memory usage grows very fast and is hard to
> > handle on machines with less RAM.
> > Could anyone please give me some comments on how to reduce
> > the space complexity in this calculation?
> >
> > Thanks,
> > Mike
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >