On Tue, 2004-07-20 at 07:55, Christian Schulz wrote:
> Hi,
>
> Sometimes I have trivial recodings like this:
>
> > dim(tt)
> [1] 252382 98
>
> system.time(for(i in 2:length(tt)){
> tt[,i][is.na(tt[,i])] <- 0
> })
>
> ...and a Win2000 machine (XP2000+, 1 GB) does it in several minutes, but
> my Linux notebook (XP 2.6 GHz, 512 MB) hasn't finished after several hours.
>
> I notice that the CPU load is relatively low most of the time, but the hard
> disk is doing a lot of work.
>
> Is this a problem with --max-vsize and --max-nsize, and should I play with
> those settings? I can't believe that the difference in RAM is the reason.
>
> Does anybody have experience with what an "optimal" setting is for, e.g.,
> 512 MB RAM on Linux?
>
> Many thanks for any help and comments.
> Regards, Christian
Christian,
I am unclear as to the nature of your loop above.
Note that:
> length(tt)
[1] 24733436
which is 252382 * 98: length() on a matrix returns the total number of
elements, not the number of columns. Your looping approach is both
inefficient and incorrect.
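To see the difference, it may help to compare dim(), length() and ncol() on a
small matrix. A minimal sketch; the 4 x 3 matrix 'm' is a hypothetical
stand-in for 'tt':

```r
# Hypothetical small stand-in for 'tt': 4 rows, 3 columns
m <- matrix(1:12, nrow = 4, ncol = 3)

dim(m)     # 4 3
length(m)  # 12: total number of elements (4 * 3), not the number of columns
ncol(m)    # 3: the number of columns, which is what a column loop should use
```

(For a data frame, by contrast, length() returns the number of columns.)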
Note that when trying to run your loop 'as is', I get:
> system.time(for(i in 2:length(tt)){
+ tt[,i][is.na(tt[,i])] <- 0
+ })
Error: subscript out of bounds
Timing stopped at: 3.54 1.81 5.5 0 0
This is because 'i' eventually exceeds the number of columns (98) in
'tt', since you have 'i' going from 2 to 24733436.
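If a loop is wanted at all, it would need to run over the column indices
rather than over 2:length(tt). A minimal sketch for comparison only, assuming
(as the original loop's starting index of 2 suggests) that column 1 should be
left untouched; the small matrix 'x' is hypothetical test data, and the loop
body is the original one with only the index range changed:

```r
# Hypothetical test data: 5 x 3 matrix with some NA values
x <- matrix(c(1, NA, 3, 4, 5,
              NA, 2, NA, 4, 5,
              1, 2, 3, NA, NA), ncol = 3)

# Loop over columns 2..ncol(x); seq_len(...)[-1] drops column 1 and is
# empty (instead of counting backwards) if x has only one column
for (i in seq_len(ncol(x))[-1]) {
  x[, i][is.na(x[, i])] <- 0
}
```

This still touches each column separately, so it is shown only to illustrate
the indexing fix, not as the recommended approach.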
I am presuming that you simply want to set any 'NA' values in
'tt' to 0?
Consider a vectorized approach instead. First, create some test data
containing NAs:
> tt <- matrix(sample(c(1:10, NA), 252382 * 98, replace = TRUE),
+              ncol = 98)
> dim(tt)
[1] 252382 98
> table(is.na(tt))
   FALSE     TRUE
22484834  2248602
Now use:
> system.time(tt[is.na(tt)] <- 0)
[1] 1.56 0.73 2.42 0.00 0.00
> table(is.na(tt))
   FALSE
24733436
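The same idiom works on any matrix. A minimal sketch on hypothetical small
data 'y'; the second variant, which leaves column 1 untouched to mirror the
original loop starting at 2, is an assumption about what was intended:

```r
# Hypothetical test data with one NA in each column
y <- matrix(c(NA, 2, 3,
              4, NA, 6,
              7, 8, NA), ncol = 3)

# Variant 1: replace every NA in the matrix in one vectorized step
y1 <- y
y1[is.na(y1)] <- 0

# Variant 2: replace NAs only in columns 2..ncol(y), leaving column 1 as is
y2 <- y
y2[, -1][is.na(y2[, -1])] <- 0
```

Both variants index with a logical matrix from is.na(), so R performs the
replacement in a single internal pass instead of one interpreted step per
column.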
This is on a 3.2 GHz system with 2 GB of RAM.
However, this is not a memory issue; it is an inefficient use of loops.
HTH,
Marc Schwartz