Taking a look at your script: there are some potential optimizations
you can make:
# Fine
poi <- as.character(top.GSM396290) #5000 characters
x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables
# Pre-allocate the space
x <- vector("list", 485577) # x <- list()
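# (extending a list one element at a time inside a loop forces R to
# keep reallocating it; allocating the full length up front avoids that)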
# Do the "a" stuff once outside the loop so you aren't doing it
# 485577 times
a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
# Let's use an apply statement instead of a for loop;
# vapply is the fastest since we prespecify the return type.
x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]
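If you want to check the improvement, you can wrap that last line in
system.time() - "hits" below is just a name for the filtered result:

system.time(
  hits <- x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]
)
# the "elapsed" entry is the wall-clock time in seconds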
I think this will do what you wanted (and hopefully much faster).
Note that you could probably tune this further, but I think this
strikes a good balance between clarity and performance (for now).
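One further tweak if you need it: since the split pattern is a
literal ";", passing fixed = TRUE to strsplit skips the regex engine,
which usually helps over half a million rows:

a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";",
              fixed = TRUE)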
Hope this helps,
Michael
On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan
<kurinji.pandiyan at gmail.com> wrote:
>
> Thank you for the input.
>
> As it were, I realized that my script is utilizing a lot more memory than
> I claimed - it was initially using 3 GB, but has gone up to 20.24 GB
> active, with 29.63 GB assigned to the R session.
>
> The script ran overnight, and now I don't think it is active anymore,
> since I keep getting an error message that I am out of startup disk
> space for application memory.
>
> I am attaching screenshots of my RAM usage distribution (given that there
> is no fluctuation in usage by the R session, I believe it is not running
> anymore) and of my available HD.
>
>
> Here is my script -
>
> poi <- as.character(top.GSM396290) #5000 characters
> x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables
> head(x.data)
>
> x <- list()
>
> for(i in 1:485577){
>   a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
>   a <- unlist(strsplit(a, ";"))
>   if(any(poi %in% a) == TRUE) {x[[i]] <- x.data[i,]}
> }
>
> # this step completed in a few hours
>
> x <- do.call(rbind, x) # this step has been running overnight and is
> still stuck
>
> Thanks, I really appreciate the help.
> Kurinji
>
> On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>>
>> Well... what makes you think you are hitting memory constraints then?
>> If you have significantly less than 3GB of data, it shouldn't surprise
>> you if R never needs more than 3GB of memory.
>>
>> You could just be running your scripts inefficiently...it's an extreme
>> example, but all the memory and gigaflopping in the world can't speed
>> this up (by much):
>>
>> for(i in seq_len(1e6)) Sys.sleep(10)
>>
>> Perhaps you should look into profiling tools or parallel
>> computation...if you can post a representative example of your
>> scripts, we might be able to give performance pointers.
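>>
>> For instance, a minimal Rprof sketch (the file name is just a
>> placeholder):
>>
>> Rprof("profile.out")          # start R's sampling profiler
>> # ... run the slow part of your script here ...
>> Rprof(NULL)                   # stop profiling
>> summaryRprof("profile.out")   # see where the time actually goes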
>>
>> Michael
>>
>> On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan
>> <kurinji.pandiyan at gmail.com> wrote:
>> > Yes, I am.
>> >
>> > Thank you,
>> > Kurinji
>> >
>> > On Mar 22, 2012, at 10:27 PM, "R. Michael Weylandt"
>> > <michael.weylandt at gmail.com> wrote:
>> >
>> >> Use 64bit R?
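>> >> (You can check with .Machine$sizeof.pointer - it is 8 under a
>> >> 64-bit build of R and 4 under a 32-bit one.)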
>> >>
>> >> Michael
>> >>
>> >> On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan
>> >> <kurinji.pandiyan at gmail.com> wrote:
>> >>> Hello,
>> >>>
>> >>> I have a 32 GB RAM Mac Pro with a 2*2.4 GHz quad-core processor
>> >>> and 2 TB of storage. Despite having so much memory, I am not able
>> >>> to get R to utilize much more than 3 GB. Some of my scripts take
>> >>> hours to run, but I would think they would be much faster if more
>> >>> memory were utilized. How do I optimize R's memory usage on my
>> >>> Mac Pro?
>> >>>
>> >>> Thank you!
>> >>> Kurinji
>> >>>