Dear all,

With great interest I followed the discussion
https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
since I currently have a similar problem:

In a new R session (using xterm) I import a simple table,
"Hu6800_ann.txt", which is only 754KB in size:

> ann <- read.delim("Hu6800_ann.txt")
> dim(ann)
[1] 7129   11

When I call "object.size(ann)", the estimated memory used to store "ann"
is already 2MB:

> object.size(ann)
2034784 bytes

Now I call "split()" and check the estimated memory used, which turns out
to be 3.3GB:

> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
> object.size(u2p)
3323768120 bytes

During the R session I am running "top" in another xterm and can see
that the memory usage of R increases to about 550MB RSIZE.

Now I do:

> object.size(unlist(u2p))
894056 bytes

This call takes about 3 minutes to complete, and the memory usage of R
increases to about 1.3GB RSIZE. Furthermore, while this call is being
evaluated the free RAM of my Mac drops to less than 8MB free PhysMem,
until it needs to swap memory. When finished, free PhysMem is 734MB, but
the size of R has increased to 577MB RSIZE.

Doing "split(ann[,"ProbesetID"], ann[,"UNIT_ID"], drop=TRUE)" did not
change the object.size; only the processing was faster and it used less
memory on my Mac.

Do you have any idea what the reason for this behavior is?
Why is the size of the list "u2p" so large?
Am I making a mistake somewhere?

Here is my sessionInfo on a MacBook Pro with 2GB RAM:

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Best regards,
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n  S.t.r.a.t.o.w.a
V.i.e.n.n.a        A.u.s.t.r.i.a
e.m.a.i.l:         cstrato at aon.at
_._._._._._._._._._._._._._._._._._
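The pattern above can be reproduced without the original annotation file.
The following is a minimal sketch using synthetic probeset IDs (the real
"Hu6800_ann.txt" is not available here, so the IDs and their count are
assumptions); it suggests why splitting a factor column, which is what
read.delim() produces by default, makes object.size() explode: each list
element is a factor that carries the complete levels attribute, and
object.size() counts those levels once per element, so the estimate grows
roughly with the square of the number of rows even though the levels are
shared in memory.

    ## synthetic stand-in for the ProbesetID column (assumption)
    n   <- 7129
    ids <- sprintf("probeset_%05d", seq_len(n))

    fac <- factor(ids)    # what read.delim() gives by default
    chr <- ids            # what stringsAsFactors = FALSE would give

    ## each element below is a 1-value factor carrying all n levels;
    ## the call may take a moment, since object.size() walks n*n strings
    object.size(split(fac, seq_len(n)))   # reported in the GB range
    object.size(split(chr, seq_len(n)))   # plain character vectors: ~1MB

This also matches the observation that top shows only ~550MB: the levels
vector is shared between the list elements in memory, but object.size()
does not detect that sharing and counts it for every element.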
On 07/12/2010 01:45 PM, cstrato wrote:
> Dear all,
>
> With great interest I followed the discussion
> https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
> since I currently have a similar problem:
>
> In a new R session (using xterm) I import a simple table,
> "Hu6800_ann.txt", which is only 754KB in size:
>
>> ann <- read.delim("Hu6800_ann.txt")
>> dim(ann)
> [1] 7129   11
>
> When I call "object.size(ann)", the estimated memory used to store "ann"
> is already 2MB:
>
>> object.size(ann)
> 2034784 bytes
>
> Now I call "split()" and check the estimated memory used, which turns out
> to be 3.3GB:
>
>> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
>> object.size(u2p)
> 3323768120 bytes

I guess things improve with stringsAsFactors=FALSE in read.delim?

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
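A short sketch of how Martin's suggestion would be applied (the file and
column names are taken from Christian's post; the file itself is not
available here, so the exact output is an assumption): with character
columns instead of factors, split() no longer attaches a levels vector to
every list element.

    ## read the value column as character rather than factor
    ann <- read.delim("Hu6800_ann.txt", stringsAsFactors = FALSE)
    sapply(ann, class)    # ProbesetID should now be "character", not "factor"

    ## alternatively, an already-loaded factor column can be converted:
    ## ann$ProbesetID <- as.character(ann$ProbesetID)

    u2p <- split(ann[, "ProbesetID"], ann[, "UNIT_ID"])
    object.size(u2p)      # no per-element levels attribute any more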
Dear Martin,

Thank you, you are right; now I get:

> ann <- read.delim("Hu6800_ann.txt", stringsAsFactors=FALSE)
> object.size(ann)
2035952 bytes

> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
> object.size(u2p)
1207368 bytes

> object.size(unlist(u2p))
865176 bytes

Nevertheless, a size of 1.2MB for a list representing 2 of the 11 columns
of a 754KB table still seems pretty large?

Best regards,
Christian

On 7/12/10 11:44 PM, Martin Morgan wrote:
> I guess things improve with stringsAsFactors=FALSE in read.delim?
>
> Martin
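As to why ~1.2MB still looks large for data that fits in a 754KB text
file: most of it is plausibly per-element overhead rather than the strings
themselves. Each of the ~7000 list elements returned by split() is a
separate character vector with its own object header, and the list also
stores its own pointer array plus a names attribute. A rough sketch with
synthetic IDs (an assumption, since the real column is not available;
sizes assume a 64-bit build, as in the sessionInfo above):

    x <- sprintf("probeset_%05d", 1:7129)       # stand-in for ProbesetID

    object.size(x)                              # one vector of 7129 strings
    object.size(as.list(x))                     # 7129 one-element vectors,
                                                # one header each
    object.size(setNames(as.list(x), 1:7129))   # plus the names attribute
                                                # that split() adds

The last value lands in the same ~1MB range as the reported 1207368 bytes,
which suggests the remaining size is simply the fixed cost of holding many
small vectors in a named list, not anything wrong with the data.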