Hi All: I have a regular expression problem. If a character string ends with "rhofixed" or "norhofixed", I want that part of the string to be removed. If it doesn't end with either of those two endings, then the result should be the same as the original. The code below doesn't work for the second case. I know why but not how to fix it. I looked at Friedl's book and I bet it's in there somewhere but I can't find it. Thanks.

s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test")

result <- sub("^(.*)([n.*|r.*].*)$","\\1",s)

print(result)
[1] "lngimbint" "lngimbnointno" "test"

[[alternative HTML version deleted]]
No HTML please. It makes me itchy! <grin/>

> s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test")
> sub('(no)?rhofixed$','',s)
[1] "lngimbint" "lngimbnoint" "test"

On Mon, Jan 12, 2015 at 1:37 PM, Mark Leeds <markleeds2 at gmail.com> wrote:
> Hi All: I have a regular expression problem. [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
While a transcendent vocabulary is laudable, one must be eternally careful so that the calculated objective of communication does not become ensconced in obscurity. In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown
Hi Mark,

Mark Leeds <markleeds2 at gmail.com> writes:

> [...]
> s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test")
>
> result <- sub("^(.*)([n.*|r.*].*)$","\\1",s)
>
> print(result)
> [1] "lngimbint" "lngimbnointno" "test"

The matching of the initial .* is greedy by default, so it will match everything before the last 'n' or 'r'. As you always have an 'r' in 'rho', your 'no' gets eaten by the first pattern.

You can make a pattern non-greedy by appending '?' to the quantifier. I would do

> s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test")
> result <- sub("^(.*?)((no)?rhofixed)$","\\1",s)
> result
[1] "lngimbint" "lngimbnoint" "test"

Cheers,

Loris

-- 
This signature is currently under construction.
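To see what the greedy and non-greedy versions actually capture, a small sketch along these lines may help. It uses only base R (regexec and regmatches); the helper name show_groups is made up for illustration and is not part of any package. Note also that [n.*|r.*] in the original pattern is a bracket expression, so it matches one single character out of n, ., *, | and r rather than the alternation it resembles.

```r
# Hypothetical helper (name invented for this example): returns the full
# match followed by the capture groups for a single string.
show_groups <- function(pattern, x) {
  regmatches(x, regexec(pattern, x))[[1]]
}

x <- "lngimbnointnorhofixed"

# Original pattern: the greedy (.*) runs up to the last 'n' or 'r',
# so the "no" ends up inside group 1.
greedy <- show_groups("^(.*)([n.*|r.*].*)$", x)

# Non-greedy (.*?) stops as early as the rest of the pattern allows.
lazy <- show_groups("^(.*?)((no)?rhofixed)$", x)

print(greedy[2])  # "lngimbnointno"
print(lazy[2])    # "lngimbnoint"
```

Element 1 of each result is the whole match, so element 2 is the first capture group, which is what "\\1" refers to in sub().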
I know you already have a couple of solutions, but I would like to mention that it can be done in two steps with very simple regular expressions. I would have done:

s <- c("lngimbintrhofixed","lngimbnointnorhofixed","test",
       'rhofixedtest','norhofixedtest')
res <- gsub('norhofixed$', '',s)
res <- gsub('rhofixed$', '',res)
res
[1] "lngimbint" "lngimbnoint" "test" "rhofixedtest" "norhofixedtest"

(this is for those of us who don't understand regular expressions very well!)

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 1/12/15, 11:37 AM, "Mark Leeds" <markleeds2 at gmail.com> wrote:
>Hi All: I have a regular expression problem. [...]
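One detail of the two-step approach that is easy to miss: the order of the two steps matters. The sketch below reuses the test vector from the post above and compares the two orders; the variable names long_first and short_first are only for illustration. It uses sub() rather than gsub(), which should be equivalent here because the `$` anchor allows at most one match per string.

```r
s <- c("lngimbintrhofixed", "lngimbnointnorhofixed", "test",
       "rhofixedtest", "norhofixedtest")

# Longer suffix first, as in the post above.
long_first <- sub("rhofixed$", "", sub("norhofixed$", "", s))

# Shorter suffix first: "norhofixed" loses only "rhofixed",
# and the second step no longer finds anything to remove.
short_first <- sub("norhofixed$", "", sub("rhofixed$", "", s))

print(long_first)
print(short_first)
```

With the shorter suffix removed first, the second element comes out as "lngimbnointno", the same wrong answer as in the original question.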
On Wed, Jan 14, 2015 at 10:03 AM, MacQueen, Don <macqueen1 at llnl.gov> wrote:
> I know you already have a couple of solutions, but I would like to mention
> that it can be done in two steps with very simple regular expressions. I
> would have done:
> [...]
> res <- gsub('norhofixed$', '',s)
> res <- gsub('rhofixed$', '',res)
> [...]

There is one possible problem with your solution. Consider the string "arhofixednorhofixed". It ends with "norhofixed" and, according to the original specification, should result in "arhofixed". (I will admit this is a contrived case which is very unlikely to occur in reality.) But since you apply TWO regular expressions, the first removes the trailing "norhofixed", giving "arhofixed" (the correct answer), and the second then reduces that to simply "a".

The other regular expressions remove either "norhofixed" or "rhofixed", but never both, because they make a single match at the very end of the string: either an alternation of "norhofixed" and "rhofixed", or an optional "no" in front of the "rhofixed" (my example). To be even more explicit: the regexp "(norhofixed|rhofixed)$" will work properly. Listing the longer alternative first is the safe habit. With the end-of-string anchor, "(rhofixed|norhofixed)$" happens to give the same result, because the search takes the leftmost match and "norhofixed" starts two characters before "rhofixed"; in other patterns a shorter alternative listed first can shadow a longer one.

Yes, regular expressions can be complicated. Although I have a liking for them due to their expressiveness and power, using them is like a person handling raw nitroglycerin instead of dynamite. Dangerous.

John McKown
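The contrived string can be checked directly. This is a minimal sketch in base R; the variable names are chosen for illustration only. It compares the two-step removal with two single-pass patterns. As far as I can tell, with the `$` anchor in place the order of the alternatives does not change the result for this input, since sub() takes the leftmost match; putting the longer alternative first is still the more defensive habit.

```r
x <- "arhofixednorhofixed"

# Two passes: the second pass strips the "rhofixed" exposed by the first.
two_step <- sub("rhofixed$", "", sub("norhofixed$", "", x))

# One pass with an optional "no": only the final suffix is removed.
one_step <- sub("(no)?rhofixed$", "", x)

# One pass with an alternation, in both orders.
alt_long_first  <- sub("(norhofixed|rhofixed)$", "", x)
alt_short_first <- sub("(rhofixed|norhofixed)$", "", x)

print(c(two_step, one_step, alt_long_first, alt_short_first))
```

Only the two-step version reduces the string to "a"; the single-pass patterns leave "arhofixed".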