Hi,

I am stuck on an apparently simple regular-expression problem: collecting the first letters of the words in the Perl motto, "There is more than one way to do it", into the form TIMTOWTDI.
I tried the following code:

##### A regex problem with the Perl motto
astr <- "There is more than one way to do it"
b1 <- grep("\\<", astr, value = TRUE)
## This just retrieves the whole string
## Next trial with gregexpr
b2 <- gregexpr("\\<", astr)
## This gives:
> b2
[[1]]
[1]  1  7 10 15 20 24 28 31 34
attr(,"match.length")
[1] 0 0 0 0 0 0 0 0 0

gregexpr gives me a vector of indices corresponding to the first letters all right, but the next step is not so clear. I cannot figure out how to use this information to pick the letters out of the original string: I don't know how to treat the string as a vector and pluck out the letters.

There may be many ways to do it, but I have not succeeded in coming up with even one! I will appreciate any tips I can get.

Thanking you,
Ravi
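The start positions Ravi found can be used directly: substring() is vectorised over its start and stop arguments, so passing the same vector for both returns one character per position. This is a base-R sketch under one stated assumption: it swaps the zero-width "\\<" for "[[:alpha:]]+", whose match start positions are the same word starts, to avoid depending on how gregexpr() treats zero-length matches. The names starts, first_letters and acronym are illustrative.

```r
astr <- "There is more than one way to do it"

# Start position of each run of letters, i.e. of each word
starts <- gregexpr("[[:alpha:]]+", astr)[[1]]

# substring() is vectorised: one single-character substring per start position
first_letters <- substring(astr, starts, starts)
print(first_letters)

# Collapse into one string and upper-case it to get the requested form
acronym <- toupper(paste(first_letters, collapse = ""))
print(acronym)
```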
Try this:

> library(gsubfn)
> strapply(astr, "\\w+", ~ substr(x, 1, 1), simplify = c)
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"

On Tue, Aug 4, 2009 at 1:28 PM, ravi <rv15i at yahoo.se> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
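Where installing gsubfn is not an option, a similar extraction can be sketched in base R: regmatches() pulls out the substrings that gregexpr() located, and substr() keeps the first character of each word. This is an analogue, not how strapply() works internally; regmatches() only appeared in R 2.14.0, after this thread, and the names words and initials are illustrative.

```r
astr <- "There is more than one way to do it"

# Every whole-word match, as a character vector
words <- regmatches(astr, gregexpr("\\w+", astr))[[1]]

# First character of each word (substr is vectorised over its first argument)
initials <- substr(words, 1, 1)
print(initials)

# The form Ravi asked for
print(toupper(paste(initials, collapse = "")))
```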
And here is a second way:

> strapply(astr, "(\\w)\\w+", c, simplify = c)
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"

On Tue, Aug 4, 2009 at 1:42 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
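The capture-the-first-character idea also works with base R's gsub(): replace each word, together with any non-word characters after it, by its captured first character. One difference worth noting is that "(\\w)\\w+" needs at least two word characters, so a one-letter word such as "a" would be skipped; the sketch below uses \\w* instead. The pattern and the name compact are illustrative choices, not taken from the thread.

```r
astr <- "There is more than one way to do it"

# (\\w) captures a word's first character, \\w* consumes the rest of the word,
# \\W* consumes the following blanks; the whole match becomes the capture
compact <- gsub("(\\w)\\w*\\W*", "\\1", astr)
print(compact)
print(toupper(compact))

# A one-letter word is kept, unlike with "(\\w)\\w+"
print(gsub("(\\w)\\w*\\W*", "\\1", "make a wish"))
```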
Ravi,

Here is a third way to do this, but it doesn't make use of regular expressions per se:

> avec <- unlist(strsplit(astr, ""))  # First convert astr to a vector
> avec[c(1, 1 + grep(" ", avec))]
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"

This latter expression subscripts avec by concatenating the first position with 1 + the position of each blank in the character vector.

Here is yet a fourth way that does use a regular expression:

> avec[unlist(gregexpr("\\<[[:alpha:]]", astr))]  # avec from above
[1] "T" "i" "m" "t" "o" "w" "t" "d" "i"

The components of this regular expression can be broken down as follows:

"\\<"          The empty string at the beginning of a word. Inside an R string literal the backslash must be doubled.
"[[:alpha:]]"  Any alphabetic character, upper or lower case.

gregexpr() returns a list; unlist() converts the list to a vector, each element of which points to the first character of a word in astr. That result can be used to subscript avec.

Best regards,
Chuck Taylor
TIBCO Spotfire
Seattle, WA, USA

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of ravi
Sent: Tuesday, August 04, 2009 10:28 AM
To: r-help at r-project.org
Subject: [R] regex question
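Chuck's 1 + grep(" ", avec) indexing assumes exactly one blank between words: with two blanks in a row, the second blank is picked out as a "first letter". A logical-vector variant keeps a character only if it is not a blank and it either comes first or follows a blank. This is a sketch, and first_letters_of is a hypothetical helper name.

```r
# Hypothetical helper: first letter of each blank-separated word
first_letters_of <- function(s) {
  avec <- strsplit(s, "")[[1]]
  # TRUE at position 1 and wherever the previous character is a blank
  after_blank <- c(TRUE, head(avec, -1) == " ")
  # Keep non-blank characters that start a word
  avec[avec != " " & after_blank]
}

astr <- "There is more than one way to do it"
print(first_letters_of(astr))
print(toupper(paste(first_letters_of(astr), collapse = "")))

# Runs of blanks no longer produce blank "letters"
print(first_letters_of("There  is   more"))
```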
Here's what I came up with:

> gsub("(\\w)[^ ]+[\\b ]", "\\1", astr)
[1] "Timtowtdit"

You might be interested in Regular Expressions Cookbook from O'Reilly (publisher, not author) or http://www.regular-expressions.info/

I usually bumble along knowing there are better ways to do whatever I am doing.

Michael
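The result above ends in "it" rather than "i" because the pattern requires a delimiter after every word, and the last word has none. Inside a bracket expression \b is not a word-boundary assertion, so in this string it is the blank in "[\\b ]" that does the matching. A minimal adjustment, sketched here rather than taken from the thread, makes the trailing blank optional with " ?" and uses [^ ]* so one-letter words also match; the name fixed is illustrative.

```r
astr <- "There is more than one way to do it"

# [^ ]* allows one-letter words; " ?" lets the final word match without a blank
fixed <- gsub("(\\w)[^ ]* ?", "\\1", astr)
print(fixed)
print(toupper(fixed))
```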