Dan Abner
2017-Sep-28 20:25 UTC
[R] Searching for Enumerated Items using str_count() from the stringr package
Hi all,

I have a large number of text strings to search for enumerated items. However, I am receiving this error message even though I thought that I properly escaped the special character closed parenthesis:

> Count<-str_count(text3,keywords)
Error in stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
  Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)

===

Here is example code:

text1<-"This is a list:
1) Number 1
2) Etc
3) Etc"

text2<-"This is NOT a list:
Blah, blah, blah
Blah, blah, blah"

text3<-c(text1,text2)
text3

{keywords<-c(paste(0:9,"\\)"),paste(0:9,"\\)",sep=""),
paste(0:9,"."),paste(0:9,".",sep=""),"-","*")}

keywords

Count<-str_count(text3,keywords)

===

I am looking for Count<-c(3,0)

Any suggestions?

Thanks!

Dan
Tóth Dénes
2017-Sep-28 22:02 UTC
[R] Searching for Enumerated Items using str_count() from the stringr package
On 09/28/2017 10:25 PM, Dan Abner wrote:
> {keywords<-c(paste(0:9,"\\)"),paste(0:9,"\\)",sep=""),
> paste(0:9,"."),paste(0:9,".",sep=""),"-","*")}

You should read the documentation on regular expressions carefully; see ?regexp. You really do not want to pass a multi-element vector as 'keywords' in this case, but instead:

stri_count_regex(text3, "[0-9]+\\) ")

or:

stri_count_regex(text3, "[[:digit:]]+\\) ")

BTW, I do not understand why one would use the stringr package if it is just a wrapper around the stringi package.

Regards,
Denes

--
Dr. Tóth Dénes
Managing Director
Kogentum Kft.
Tel.: 06-30-2583723
Web: www.kogentum.hu
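A minimal sketch of the advice above, assuming the stringi package is installed. It rebuilds the example strings from the original post with explicit "\n" line breaks and counts matches with a single pattern instead of a vector of patterns. The tryCatch() step is an illustration added here, not something from the thread: a lone "*" is a quantifier with nothing to repeat, which appears to be what triggers the U_REGEX_RULE_SYNTAX error. The unescaped "." entries in the original vector would also match any character rather than a literal dot.

```r
library(stringi)

# Example strings from the original post, with explicit line breaks.
text1 <- "This is a list:\n1) Number 1\n2) Etc\n3) Etc"
text2 <- "This is NOT a list:\nBlah, blah, blah\nBlah, blah, blah"
text3 <- c(text1, text2)

# A bare "*" is not a valid regular expression on its own,
# so the regex engine raises an error instead of returning counts.
star_result <- tryCatch(
  stri_count_regex(text3, "*"),
  error = function(e) "syntax error"
)

# One pattern applied to every string: one or more digits, ")", then a space.
counts <- stri_count_regex(text3, "[0-9]+\\) ")
print(counts)
```

Here counts has one element per input string, which is the shape the original poster asked for (three enumerated items in the first string, none in the second).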
Tóth Dénes
2017-Sep-28 22:14 UTC
[R] Searching for Enumerated Items using str_count() from the stringr package
On 09/29/2017 12:02 AM, Tóth Dénes wrote:
> You really do not want to pass a multi-element vector as 'keywords' in
> this case, but instead:
>
> stri_count_regex(text3, "[0-9]+\\) ")
>
> or:
>
> stri_count_regex(text3, "[[:digit:]]+\\) ")

Ah, now I see what you were after: the enumerations are not in a standard format, so "1) " can also appear as "1)", "1." or "1 .". In this case:

text <- "1)Hello\n2.Hi\n3 .Cheers"
keywords <- "[0-9]+(\\)| *?\\.)"
stri_count_regex(text, keywords)

Note the '|' sign in the keyword definition. It means OR in this context. So the regular expression above can be read literally as: one or more digits, followed either by a closing parenthesis or by any number of spaces (even zero) and then a dot.

HTH,
Denes
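The alternation pattern above can be tried on both the mixed-format string and the original example, again assuming stringi is installed. The last part is a hypothetical refinement that is not from the thread: because the pattern matches anywhere in the text, a decimal such as "3.14" also counts, so the sketch prefixes "(?m)^" to restrict matches to the start of a line. The variable names (enum_pattern, prose, and so on) are made up for illustration.

```r
library(stringi)

# One or more digits, then either ")" or optional spaces followed by a literal dot.
enum_pattern <- "[0-9]+(\\)| *?\\.)"

mixed <- "1)Hello\n2.Hi\n3 .Cheers"
text3 <- c("This is a list:\n1) Number 1\n2) Etc\n3) Etc",
           "This is NOT a list:\nBlah, blah, blah\nBlah, blah, blah")

mixed_count <- stri_count_regex(mixed, enum_pattern)  # "1)", "2." and "3 ."
list_counts <- stri_count_regex(text3, enum_pattern)  # one count per string

# Hypothetical refinement: the unanchored pattern also matches "3." in "3.14".
prose <- "Pi is 3.14\n1. First item"
loose_count <- stri_count_regex(prose, enum_pattern)

# (?m) makes ^ match at the start of every line, not only the start of the string.
anchored_pattern <- paste0("(?m)^", enum_pattern)
strict_count <- stri_count_regex(prose, anchored_pattern)

print(list(mixed_count, list_counts, loose_count, strict_count))
```

Whether anchoring is appropriate depends on the data: it would miss enumerations that are indented or that appear in the middle of a line.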