Hello,

I am trying to extract text using regex, turning

"* < <* this is my text > > "

into

"this is my text"

Here is what I did:

varReg <- "* < <* this is my text > > "

## either this pattern
patReg <- "(^[ <*]+)"
## or the pattern below
patReg <- "([ > ]+$)"

sub(patReg, '', varReg)

Depending on which pattern I use, I can strip only the leading portion or only the trailing portion of the unwanted characters. How do I strip both ends and keep my text "this is my text"?

I have tried gsub, as below:

patReg <- "([ >* ]+)"
gsub(patReg, '', varReg)

but it returned "thisismytext".

Any ideas are appreciated.

Thanks,

Ferry
If patReg1 and patReg2 are your two regexes, then:

gsub(paste(patReg1, patReg2, sep = "|"), "", varReg)

On Mon, Nov 3, 2008 at 8:37 PM, Ferry wrote:
> how to extract both ends and keep my text "this is my text" ?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
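A note on why gsub rather than sub matters for the combined pattern: with the two anchored patterns joined by "|", sub stops after the first match (the leading run), while gsub goes on to replace the trailing run as well. Because both alternatives are anchored, the spaces between the words are left alone. Below is a minimal sketch using the patterns from the original post; the names combined, first_only and both_ends are made up here for illustration.

```r
varReg  <- "* < <* this is my text > > "
patReg1 <- "(^[ <*]+)"   # leading run of spaces, "<" and "*"
patReg2 <- "([ > ]+$)"   # trailing run of spaces and ">"

# Join the two anchored patterns into one alternation
combined <- paste(patReg1, patReg2, sep = "|")

# sub() replaces only the first match, so the trailing run survives
first_only <- sub(combined, "", varReg)

# gsub() replaces every match, so both ends are stripped
both_ends <- gsub(combined, "", varReg)

print(first_only)
print(both_ends)
```

The same reasoning suggests why an unanchored character class passed to gsub also removes the blanks inside the text: nothing ties the match to either end of the string.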
Dear Ferry,

You're almost all the way there. Just apply each substitution in turn:

varReg <- "* < <* this is my text > > "
left <- "(^[ <*]+)"
right <- "([ > ]+$)"
sub(right, "", sub(left, "", varReg))
[1] "this is my text"

I hope this helps,
John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ferry
> Sent: November-03-08 8:38 PM
> To: r-help at r-project.org
> Subject: [R] regex question
>
> depending on which pattern I use, I could only strip the first portion
> or the last portion of the unwanted characters.
Hi:

Gabor's solution does do it in a single line; he just used paste to build the pattern. See below. John's is also more or less a single line, but it calls sub twice. I doubt that it's possible to make it shorter than those solutions.

# Gabor's solution spelled out.
varReg <- "* < <* this is my text > > "   # input string from the original post
patReg1 <- "(^[ <*]+)"
patReg2 <- "([ > ]+$)"
temp <- paste(patReg1, patReg2, sep = "|")
print(temp)
gsub(temp, "", varReg)

On Tue, Nov 4, 2008 at 12:10 AM, Ferry wrote:
> Dear John, Gabor ...
>
> Thank you for your fast responses.
> In terms of efficiency, is my code efficient? I mean, I thought there
> was a way to combine both patterns into a single line.
>
> Also, I tried to substitute the pattern ([ <*]+) with ([[:punct:]]),
> as in the R regex docs:
> patReg1 <- "(^[[:punct:]]+)"
>
> but it doesn't work.
>
> Or possibly it is just my stupidity?
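On the [[:punct:]] question quoted above: a likely reason it "doesn't work" is that the leading junk is not punctuation alone. There are spaces between the symbols, so "^[[:punct:]]+" matches only the first "*" and stops at the space that follows. One possible fix, offered as a sketch rather than a definitive answer, is to put [:space:] into the same bracket expression. The names punct_only, ends and cleaned are invented here for illustration.

```r
varReg <- "* < <* this is my text > > "

# Punctuation only: the match ends at the first space,
# so just the leading "*" is removed
punct_only <- sub("^[[:punct:]]+", "", varReg)

# Punctuation or whitespace, anchored at either end of the string
ends <- "^[[:punct:][:space:]]+|[[:punct:][:space:]]+$"
cleaned <- gsub(ends, "", varReg)

print(punct_only)
print(cleaned)
```

Note that this version would also strip legitimate punctuation at either end of the text (a final full stop, say), so the explicit class [ <*] from the original post may still be the safer choice when the set of unwanted characters is known.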