Hi,

So I just took an intro to R programming class and one of the lectures was on Regular Expressions. I've been playing around with various R functions that use Regular Expressions. But this has me stumped. This was part of a quiz and I got it right through understanding the syntax. But when I try to run the thing it returns 'integer(0)'. Can you please tell me what I am doing wrong?

#I copied and pasted this:
going up and up and up
night night at 8
bye bye from up high
heading, heading by 9

#THEN
lines<-readLines("clipboard")
#This is what it looks like in R
lines
[1] "going up and up and up"
[2] "night night at 8"
[3] "bye bye from up high"
[4] "heading, heading by 9"

#THIS IS WHAT IS NOT WORKING THE WAY I THOUGHT. I was expecting it to return 2.
# "night night at 8" follows the pattern: Begins with a word then has at least one space then the same word then has at least one space then a word then a space then a single digit number.
grep("^([a-z]+) +\1 +[a-z]+ [0-9]",lines)
integer(0)

#But simple examples DO work
grep("[Hh]",lines)
[1] 2 3 4
grep('[0-9]',lines)
[1] 2 4
On Tue, Oct 29, 2013 at 1:13 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> grep("^([a-z]+) +\1 +[a-z]+ [0-9]",lines)

Your expression has a typo: the backslash in \1 has to be doubled inside an R string.

R> grep("^([a-z]+) +\\1 +[a-z]+ [0-9]",lines)
[1] 2

--
Sarah Goslee
http://www.functionaldiversity.org
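For anyone who wants to reproduce this without the clipboard, here is a minimal sketch that enters the same four lines by hand and runs both patterns side by side. The names `bad` and `good` are made up for this illustration and are not from the original post.

```r
# The four lines from the original post, typed in directly instead of read from the clipboard
lines <- c("going up and up and up",
           "night night at 8",
           "bye bye from up high",
           "heading, heading by 9")

# Single backslash: the R parser converts \1 to the character with code 1,
# so the regex engine never sees a back-reference
bad <- grep("^([a-z]+) +\1 +[a-z]+ [0-9]", lines)

# Doubled backslash: the string handed to the regex engine contains \1,
# a back-reference to the first parenthesised group
good <- grep("^([a-z]+) +\\1 +[a-z]+ [0-9]", lines)

print(bad)   # integer(0)
print(good)  # [1] 2
```

Only line 2 matches the corrected pattern: "heading, heading by 9" has a comma after the first word, and "bye bye from up high" has no digit after the third word.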
Please read and follow the Posting Guide, in particular re plain text email.

You need to keep in mind that the characters in literal strings in R source have to make it into RAM before the regex code can parse them. The regex engine needs a single backslash in front of the 1 to interpret it as a back reference, but the R parser also recognizes and removes backslashes in string literals as escape characters, so you need to escape the backslash with a backslash in your R string literal.

nchar tells you how many characters are in the string. print renders the string as it would need to be entered as R source code. cat sends the string directly to the output (console). Study the output of the following commands at the R prompt.

?Quotes
nchar("^([a-z]+) +\1 +[a-z]+ [0-9]")
print("^([a-z]+) +\1 +[a-z]+ [0-9]")
cat("^([a-z]+) +\1 +[a-z]+ [0-9]")

On most systems, a raw character code 1 is also known as Control-A, but the effect it has on the terminal used as the console may vary according to your setup, and its effect on my system is not clear to me.

nchar("^([a-z]+) +\\1 +[a-z]+ [0-9]")
print("^([a-z]+) +\\1 +[a-z]+ [0-9]")
cat("^([a-z]+) +\\1 +[a-z]+ [0-9]")
grep("^([a-z]+) +\\1 +[a-z]+ [0-9]",lines)

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
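The two-stage parsing described above (R parser first, regex engine second) can also be checked without relying on how the console displays control characters. This is only a sketch: the variable names are hypothetical, and it inspects the character codes stored in each string rather than printing them.

```r
# Hypothetical names, used only for this illustration
single  <- "^([a-z]+) +\1 +[a-z]+ [0-9]"
doubled <- "^([a-z]+) +\\1 +[a-z]+ [0-9]"

# Code points actually stored in each string after the R parser has processed it
single_codes  <- utf8ToInt(single)
doubled_codes <- utf8ToInt(doubled)

has_ctrl_a     <- 1L %in% single_codes    # TRUE: \1 became character code 1 (Control-A)
has_backslash  <- 92L %in% single_codes   # FALSE: no backslash (code 92) survived parsing
kept_backslash <- 92L %in% doubled_codes  # TRUE: \\ became one real backslash

# The doubled form is one character longer:
# a backslash plus "1" instead of a single control character
length_gap <- nchar(doubled) - nchar(single)

print(c(has_ctrl_a, has_backslash, kept_backslash))
print(length_gap)  # 1
```

Looking at the codes sidesteps the terminal-dependent behaviour of cat when the string contains Control-A.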
From ?regex:
"(do remember that backslashes need to be doubled when entering R character strings, e.g. from the keyboard)."

> lines[grep("^([a-z]+) +\\1 +[a-z]+ [0-9]",lines)]
[1] "night night at 8"

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lopez, Dan
Sent: Tuesday, October 29, 2013 12:13 PM
To: R help (r-help at r-project.org)
Subject: [R] Regular Expression returning unexpected results

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
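Two related variations, offered as a sketch rather than as anything proposed in the thread: grep with value = TRUE returns the matching strings directly, and raw string literals avoid the doubling altogether. The raw string syntax assumes R 4.0.0 or later; the names `pattern`, `raw_pattern`, `matched` and `same` are made up here.

```r
# Same four lines as in the original post
lines <- c("going up and up and up",
           "night night at 8",
           "bye bye from up high",
           "heading, heading by 9")

pattern <- "^([a-z]+) +\\1 +[a-z]+ [0-9]"

# value = TRUE returns the matching elements instead of their indices,
# equivalent to lines[grep(pattern, lines)]
matched <- grep(pattern, lines, value = TRUE)

# Raw string literal (assumes R >= 4.0.0): backslashes are taken literally,
# so the back-reference is written with a single backslash
raw_pattern <- r"(^([a-z]+) +\1 +[a-z]+ [0-9])"

# Both literals produce the same string in memory
same <- identical(pattern, raw_pattern)

print(matched)
print(same)
```

On older R versions the raw string line is a syntax error, so the doubled-backslash form remains the portable choice.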