james.holtman@convergys.com
2004-Dec-12 22:03 UTC
[R] 'object.size' takes a long time to return a value
I was using 'object.size' to see how much memory a list was taking up. After executing the command, I thought that my computer had locked up. After further testing, I determined that it was taking 241 seconds for object.size to return a value. I did notice in the release notes that 'object.size' takes longer when the list contains character vectors. Is the time 'object.size' takes to return a value to be expected for such a list? Much better results were obtained when the character vectors were converted to factors.

###### Results from the testing ######

> str(x.1)
List of 10
 $ : chr [1:227299] "sadc" "sar" "date" "ksh" ...
 $ : chr [1:227299] "aprperf" "aprperf" "aprperf" "aprperf" ...
 $ : num [1:227299] 23 23 0 23 23 0 0 0 0 23 ...
 $ : num [1:227299] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:227299] 3600 3600 0.01 3600 3600 0.01 0.01 0.01 0.01 3600 ...
 $ : num [1:227299] 0.01 0 0.01 0 0.01 0 0.01 0 0 0.01 ...
 $ : num [1:227299] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:227299] 0.01 0 0.01 0 0.01 0 0.01 0 0 0.01 ...
 $ : num [1:227299] 62608 67968 29 10208 13128 ...
 $ : num [1:227299] 0 1 0 0 1 0 0 0 0 0 ...
# takes a long time (241 seconds) to report the size
> gc(); system.time(print(object.size(x.1)))
          used (Mb) gc trigger  (Mb)
Ncells  711007 19.0    2235810  59.8
Vcells 5191294 39.7   14409257 110.0
[1] 34154972
[1] 241.07 0.00 241.08 NA NA

# trying list of 1000
> x.2 <- list.subset(x.1, 1:1000); gc(); system.time(print(object.size(x.2)))
          used (Mb) gc trigger  (Mb)
Ncells  711006 19.0    2235810  59.8
Vcells 4300288 32.9   14409257 110.0
[1] 145860
[1] 0.01 0.00 0.01 NA NA

# trying list of 10,000
> x.2 <- list.subset(x.1, 1:10000); gc(); system.time(print(object.size(x.2)))
          used (Mb) gc trigger  (Mb)
Ncells  711006 19.0    2235810  59.8
Vcells 4381288 33.5   14409257 110.0
[1] 1491948
[1] 0.28 0.00 0.28 NA NA

# list of 40,000
> x.2 <- list.subset(x.1, 1:40000); gc(); system.time(print(object.size(x.2)))
          used (Mb) gc trigger  (Mb)
Ncells  711006 19.0    2235810  59.8
Vcells 4651288 35.5   14409257 110.0
[1] 5988460
[1] 7.15 0.00 7.15 NA NA

# list of 60,000
> x.2 <- list.subset(x.1, 1:60000); gc(); system.time(print(object.size(x.2)))
          used (Mb) gc trigger  (Mb)
Ncells  711006 19.0    2235810  59.8
Vcells 4831288 36.9   14409257 110.0
[1] 9001556
[1] 17.33 0.00 17.32 NA NA

# list of 100,000
> x.2 <- list.subset(x.1, 1:100000); gc(); system.time(print(object.size(x.2)))
          used (Mb) gc trigger  (Mb)
Ncells  711006 19.0    2235810  59.8
Vcells 5191288 39.7   14409257 110.0
[1] 15044780
[1] 51.85 0.00 51.86 NA NA

# list structure of the last object
> str(x.2)
List of 10
 $ : chr [1:100000] "sadc" "sar" "date" "ksh" ...
 $ : chr [1:100000] "aprperf" "aprperf" "aprperf" "aprperf" ...
 $ : num [1:100000] 23 23 0 23 23 0 0 0 0 23 ...
 $ : num [1:100000] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:100000] 3600 3600 0.01 3600 3600 0.01 0.01 0.01 0.01 3600 ...
 $ : num [1:100000] 0.01 0 0.01 0 0.01 0 0.01 0 0 0.01 ...
 $ : num [1:100000] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:100000] 0.01 0 0.01 0 0.01 0 0.01 0 0 0.01 ...
 $ : num [1:100000] 62608 67968 29 10208 13128 ...
 $ : num [1:100000] 0 1 0 0 1 0 0 0 0 0 ...
# with the first two items on the list converted to factors,
# 'object.size' performs a lot better
> str(x.1)
List of 10
 $ : Factor w/ 175 levels "#bpbkar","#bpcd",..: 132 133 60 93 13 160 60 84 60 132 ...
 $ : Factor w/ 8 levels "apra3g","aprperf",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ : num [1:227299] 23 23 0 23 23 0 0 0 0 23 ...
 $ : num [1:227299] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:227299] 3600 3600 0.01 3600 3600 0.01 0.01 0.01 0.01 3600 ...
 $ : num [1:227299] 0.01 0 0.01 0 0.01 0 0.01 0 0 0.01 ...
 $ : num [1:227299] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:227299] 0.01 0 0.01 0 0.01 0 0.01 0 0 0.01 ...
 $ : num [1:227299] 62608 67968 29 10208 13128 ...
 $ : num [1:227299] 0 1 0 0 1 0 0 0 0 0 ...

> system.time(print(object.size(x.1)))  # now it is fast
[1] 16374176
[1] 0 0 0 NA NA

> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    0.1
year     2004
month    11
day      15
language R

--
James Holtman            "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Office of Technology, Convergys
james.holtman at convergys.com
+1 (513) 723-2929
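The `list.subset` call in the transcript above is not a base R function; it is presumably a small helper from James's own workspace. A minimal sketch, assuming it simply subsets every component of the list by the same index vector (the name and behaviour are inferred from the transcript, not confirmed by the source):

```r
## Hypothetical helper matching the 'list.subset' calls in the transcript:
## apply the same index vector to each component of a list.
list.subset <- function(x, idx) lapply(x, function(el) el[idx])

## e.g. keep the first 1000 entries of every component:
## x.2 <- list.subset(x.1, 1:1000)
```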
Martin Maechler
2004-Dec-13 11:20 UTC
[R] 'object.size' takes a long time to return a value
>>>>> "james" == james holtman <james.holtman at convergys.com>
>>>>>     on Sun, 12 Dec 2004 17:03:31 -0500 writes:

    james> I was using 'object.size' to see how much memory a
    james> list was taking up. After executing the command, I
    james> had thought that my computer had locked up. After
    james> further testing, I determined that it was taking 241
    james> seconds for object.size to return a value.

    james> I did notice in the release notes that 'object.size'
    james> did take longer when the list contained character
    james> vectors. Is the time that it is taking 'object.size'
    james> to return a value to be expected for such a list?

Yes, partly it's expected to take longer than for other types, but it actually takes longer than I would have expected, even after starting to think about it: every element of your character vector is a string, which is coded ``as a vector of bytes with a string terminator'' (a simplification). To find a string's length, i.e., what the R function nchar() also does, "one" has to read all characters up to the string terminator. That's much slower than just using the hard-coded fact that an integer is 4 bytes or a double is 8.

    james> Much better results were obtained when the character
    james> vectors were converted to factors.

Yes; since your factors only had a dozen or at most 175 levels, and only the levels are character, the factor *data* are integers. However, what I say above does not explain everything about the slowness of object.size( <character> ). We would have to go into the C code and the exact implementation of object.size() to see the reason -- and think about possible improvements.
BTW: Note that R saves memory when character elements are "shared"; e.g., for me (on 64-bit Linux, 2.0.1 patched):

> object.size(rep("abcedfghijklmn", 3))
[1] 152
> object.size(c("abcedfghijklmn", "ABCEDFGHIJKLMN", "ABCEDFGHijklmn"))
[1] 296

Here is some code to experiment further, which slowly constructs character vectors where (I think) no "sharing" takes place:

rChar <- function(n, m, ch.set = c(LETTERS, letters))
{
  ## Purpose: create random character vector
  ## ----------------------------------------------------------------------
  ## Arguments: n: length of vector
  ##            m: "average" string length
  ## ----------------------------------------------------------------------
  ## Author: Martin Maechler, Date: 13 Dec 2004, 11:35
  sapply(rpois(n, lambda = m),
         function(m) paste(sample(ch.set, size = m), collapse = ""))
}

lc <- rChar(1e5, 4)  # already takes several seconds on a fast machine

## This is on 64-bit [AMD Athlon(tm) 64 Processor 2800+] "lynne":
system.time(print(object.size(lc)))
## [1] 7240464
## [1] 2.11 0.00 2.14 0.00 0.00

system.time(print(sum(nchar(lc))))  # which is **MUCH** faster
## [1] 399461
## [1] 0.02 0.00 0.02 0.00 0.00

## but still quite slower
system.time(print(for(i in 1:10) sn <- sum(nchar(lc))))   ## 0.10
## than
lx <- rnorm(1e5)
system.time(print(for(i in 1:10) os <- object.size(lx)))  ## 0.01

##------------

Note that if we continue this topic, it should probably be moved to R-devel, since it's getting technical and about R internals (coded in C).

--
Martin Maechler, ETH Zurich
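The factor effect described in this thread can be reproduced on synthetic data. This is a sketch, not code from the thread: the variable names are illustrative, and absolute timings will vary by machine and R version (on recent R, object.size() on character data is far faster than the 2004 figures above).

```r
## A character vector with few distinct values stores a string per element;
## its factor form stores integer codes plus a short table of levels, so
## object.size() has far fewer strings to scan.
vals  <- c("sadc", "sar", "date", "ksh")
x.chr <- sample(vals, 1e5, replace = TRUE)
x.fac <- factor(x.chr)

system.time(os.chr <- object.size(x.chr))  # scans every element's string
system.time(os.fac <- object.size(x.fac))  # mostly fixed-size integer data
```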