Leonard Mada
2022-May-19 00:08 UTC
[R] regexpr: R takes very long with non-existent pattern
Dear R Users,

I have run the following command in R:

# x = larger vector of strings (1200 Pubmed abstracts);
# patt = not defined;
npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found

The problem:

R becomes unresponsive and it takes 1-2 minutes to return the error. The operation completes almost instantaneously with a valid pattern.

Is there a reason for this behavior?

Tested with R 4.2.0 on MS Windows 10.

I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone wants to check:
- see file: Example_Abstracts_Title_Pubmed.csv;
  github.com/discoleo/R/tree/master/TextMining/Pubmed

The variable patt was not defined due to an error: but it took very long to exit the operation and report the error.

Many thanks,

Leonard
Andrew Simmons
2022-May-19 00:26 UTC
[R] regexpr: R takes very long with non-existent pattern
Hello,

I tried this myself, something like:

dat <- utils::read.csv(
    "raw.githubusercontent.com/discoleo/R/master/TextMining/Pubmed/Example_Abstracts_Title_Pubmed.csv",
    check.names = FALSE
)
regexpr(patt, dat$Abstract, perl = TRUE)
regexpr(patt, dat$Title, perl = TRUE)

and I can't reproduce your issue. Mine seems to raise the error within a second or less that object 'patt' does not exist. I'm using R 4.2.0 and Windows 11, though that shouldn't be making a difference: if you look at Sys.info(), it's still Windows 10 with a build version of 22000.

Don't really know what else to say, have you tried it again since?

Regards,
Andrew Simmons
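To compare the "1-2 minutes" and "a second or less" reports on the same footing, the time until the error is returned can be measured directly. The following is only a minimal sketch: the vector x is a made-up stand-in for the 1200 abstracts (not the real data), and it measures the evaluation of the call itself, not whatever the console or IDE does afterwards when it displays the error.

```r
# Hypothetical stand-in for the abstracts: 1200 long strings (not the real data).
x <- rep(paste(rep("lorem ipsum dolor sit amet", 50), collapse = " "), 1200)

# Make sure 'patt' really is undefined in this session.
if (exists("patt", inherits = FALSE)) rm(patt)

# try() captures the error so that system.time() can report the elapsed time.
timing <- system.time(
    res <- try(regexpr(patt, x, perl = TRUE), silent = TRUE)
)

inherits(res, "try-error")                 # did the call fail?
conditionMessage(attr(res, "condition"))   # the error message that was raised
timing[["elapsed"]]                        # seconds until the error was returned
```

If the elapsed time here is small while the interactive session still hangs, that would suggest (but not prove) that the delay arises outside regexpr() itself, for example in how the front end reports the error.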
Bert Gunter
2022-May-19 00:31 UTC
[R] regexpr: R takes very long with non-existent pattern
Doubt that I can help, but what does "not defined" mean? -- NA, "", " " ? Something else?

I would guess that if it's NA, you should get an immediate error. If it's "", that's a legitimate pattern and would result in matches of 0 length for everything, which might trigger an error in other parts of your code.

All a guess, though.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
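The candidate meanings of "not defined" raised above (an object that does not exist, NA, "", " ") can be told apart with a small probe. This is a sketch under assumptions: x is a made-up two-element vector and probe() is a hypothetical helper introduced only for illustration. It reports what regexpr() does in each case rather than assuming the outcome.

```r
x <- c("abc", "a b")   # hypothetical sample strings

# Hypothetical helper: run regexpr() and return either its result
# or, if it fails, the error message as a string.
probe <- function(pattern) {
    tryCatch(
        regexpr(pattern, x, perl = TRUE),
        error = function(e) paste("error:", conditionMessage(e))
    )
}

# Make sure 'patt' really is undefined in this session.
if (exists("patt", inherits = FALSE)) rm(patt)

probe(patt)  # object does not exist: the error arises when the argument is evaluated
probe(NA)    # missing value as pattern
probe("")    # empty string: a legitimate pattern, zero-length matches
probe(" ")   # single space: matches only where a space occurs
```

Only the first case corresponds to the error message in the original post (object 'patt' not found); the other three involve a pattern object that exists.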