Dear All,

I am experiencing some problems with a script of mine. It crashes with this message:

Error in grepl(fut_string, past_string) :
  invalid regular expression '12653a6#12653a6#12653a6#[... same unit repeated; output truncated ...]#12653a6#12
Calls: entropy_estimate_hash -> total_entropy_lz -> entropy_lz -> grepl
In addition: Warning message:
In grepl(fut_string, past_string) : regcomp error: 'Out of memory'
Execution halted

To make a long story short, I use some functions which eventually call grepl on very long strings to check whether a certain substring is part of a longer string.

The script technically works (it never crashes when I run it on a smaller dataset), and the problem does not seem to be RAM: I have several GB of RAM on my machine, and consumption never shoots up, so the machine never resorts to swap. So (though I am not an expert) it looks like the problem is some limitation of grepl or of R's memory management.

Any idea how I could tackle this problem, or how I can profile my code to fix it? It really seems to me that I have to find a way to allow R to process longer strings. Any suggestion is appreciated.

Cheers

Lorenzo
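Since the goal described above is only to test whether one string occurs inside another, a literal (non-regex) match may be enough. The sketch below assumes the strings are meant to be matched character for character; `contains_literal` and the sample data are hypothetical and are not taken from the original script. With `fixed = TRUE`, `grepl` treats the pattern as a plain string, so no regular expression has to be compiled.

```r
# Hypothetical helper: TRUE if `needle` occurs literally inside `haystack`.
# fixed = TRUE makes grepl treat the pattern as a plain string, not a regex.
contains_literal <- function(needle, haystack) {
  grepl(needle, haystack, fixed = TRUE)
}

# Illustrative data (made up): long strings built from the repeated unit
# that appears in the error message.
unit <- "12653a6#"
past_string <- paste(rep(unit, 5000), collapse = "")
fut_string  <- paste(rep(unit, 2000), collapse = "")

# The shorter string is contained in the longer one.
print(contains_literal(fut_string, past_string))

# Appending a character that never occurs in past_string breaks the match.
print(contains_literal(paste0(fut_string, "x"), past_string))
```

Whether this removes the error in the original script depends on how `fut_string` and `past_string` are built there, which the post does not show.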
These questions are OS-specific. Please provide sessionInfo() or other details as needed.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lorenzo Isella
Sent: Friday, October 08, 2010 1:12 PM
To: r-help
Subject: [R] Memory management in R

[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More specificity, please: how long is the string, and what is the pattern you are matching against? It sounds like you might have a complex pattern that, in trying to match the string, does a lot of backtracking. The O'Reilly book Mastering Regular Expressions might help you understand what is happening. If you can provide a better example than just the error message, it would be helpful.

On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
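The details requested in the replies (string lengths, the pattern, and platform information) could be gathered with something like the sketch below. The strings here are hypothetical stand-ins built from the unit visible in the error message; the real `fut_string` and `past_string` would replace them. The sizes are arbitrary, and the sketch makes no claim about whether the call fails at these sizes.

```r
# Hypothetical stand-ins for the real inputs; replace with the actual strings.
unit <- "12653a6#"
fut_string  <- paste(rep(unit, 200), collapse = "")
past_string <- paste(rep(unit, 1000), collapse = "")

# Report the sizes of the pattern and of the string being searched.
report <- c(pattern_chars = nchar(fut_string),
            subject_chars = nchar(past_string))
print(report)

# Run the same call as in the script, capturing the error message if it fails.
outcome <- tryCatch(
  grepl(fut_string, past_string),
  error = function(e) conditionMessage(e)
)
print(outcome)

# Platform and R version details, as requested in the thread.
print(sessionInfo())
```

Posting this output together with the smallest input that still triggers the error gives readers something they can reproduce.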