Good afternoon list,

I have the following character strings: one with spaces between the maths operators and variable names, and one without.

form <- c('~ Sentence + LEGAL + Intro + Intro / Intro1 + Intro * LEGAL + benefit + benefit / benefit1 + product + action * mean + CTA + help + mean * product')
form <- c('~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action*mean+CTA+help+mean*product')

I would like to remove the following target strings, either:

1. '+ Intro * LEGAL', which is '+ space name space * space name'
2. '+Intro*LEGAL', which is '+ nospace name nospace * nospace name'

Having delved into a variety of sites (e.g. http://www.zytrax.com/tech/web/regex.htm#search) investigating regular expressions, I now have a basic grasp, but I am having difficulty removing ALL of the instances of 1. or 2.

The code below removes just a SINGLE instance of the target string, but I was expecting it to remove all instances as I have \\*.[[:alnum:]]. I did try \\*.[[:alnum:]]*, but this did not work.

form <- sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form)

I am obviously still not understanding something. If the list could offer some guidance I would be most grateful.

Regards

Mike Griffiths

--
Michael Griffiths, Ph.D
Statistician
Upstream Systems
Hi Michael,

You need to take another look at the examples you were given, and at the help for ?sub:

  The two '*sub' functions differ only in that 'sub' replaces only the first occurrence of a 'pattern' whereas 'gsub' replaces all occurrences. If 'replacement' contains backreferences which are not defined in 'pattern' the result is undefined (but most often the backreference is taken to be '""').

Sarah

On Tue, Nov 15, 2011 at 12:18 PM, Michael Griffiths <griffiths at upstreamsystems.com> wrote:
> The code below removes just a SINGLE instance of the target string, but I
> was expecting it to remove all instances as I have \\*.[[:alnum:]]. I did
> try \\*.[[:alnum:]]*, but this did not work.
>
> form<-sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form)

--
Sarah Goslee
http://www.functionaldiversity.org
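To make the distinction quoted from the help page concrete, here is a minimal sketch on a shortened version of the unspaced string from the original post. The names `x`, `pat`, `first_only` and `all_hits` are illustrative, and the pattern (with `+` quantifiers) is an assumption chosen for the demonstration, not something posted in the thread.

```r
# Shortened version of the unspaced string from the original post.
x <- "product+action*mean+CTA+help+mean*product"

# Illustrative pattern: a literal '+', a name, a literal '*', a name.
pat <- "\\+[[:alnum:]]+\\*[[:alnum:]]+"

first_only <- sub(pat, "", x)   # replaces only the first match
all_hits   <- gsub(pat, "", x)  # replaces every match

print(first_only)  # "product+CTA+help+mean*product"
print(all_hits)    # "product+CTA+help"
```

The only difference between the two calls is the function name; the pattern and replacement are identical.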
Hi Michael,

Your strings were long, so I made a smaller example. Sarah made one good point: you want to be using gsub(), not sub(). But when I use your code, I do not think it works precisely even for one instance. Try this on for size; you were 99% there:

## simplified cases
form1 <- c('product + action * mean + CTA + help + mean * product')
form2 <- c('product+action*mean+CTA+help+mean*product')

## what I believe your desired output is
'product + CTA + help'
'product+CTA+help'

gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)

## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)

######## Running on r57586 Windows x64 ########
> gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
[1] "product + CTA + help"
> gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)
[1] "product+CTA+help"
>
> ## your code (using gsub() instead of sub())
> gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)
[1] "product ean + CTA + help roduct"

Hope this helps,

Josh

On Tue, Nov 15, 2011 at 9:18 AM, Michael Griffiths <griffiths at upstreamsystems.com> wrote:
> form<-sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form)

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
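A note on why the original pattern mangles the string, with a sketch of one pattern that handles both spacings. In the original, `\\+*` means "zero or more plus signs", so the `+` is optional, and the trailing `.[[:alnum:]]` matches any one character followed by exactly one alphanumeric. The match therefore stops after the first letter of the second name, leaving "ean" and "roduct" behind. The pattern below is an assumption rather than anything posted in the thread: it makes the whitespace optional with `\\s*` and requires at least one character per name with `+`. It also assumes names contain only letters and digits, since `[[:alnum:]]` does not match `_` or `.`.

```r
# Hypothetical combined pattern: optional whitespace around a required '+',
# a name, a required '*', and a second name.
# Assumes names are letters and digits only ([[:alnum:]] excludes '_' and '.').
pat <- "\\s*\\+\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+"

# Simplified cases from the reply above.
form1 <- "product + action * mean + CTA + help + mean * product"
form2 <- "product+action*mean+CTA+help+mean*product"

out1 <- gsub(pat, "", form1)
out2 <- gsub(pat, "", form2)

print(out1)  # "product + CTA + help"
print(out2)  # "product+CTA+help"
```

With a single pattern there is no need to know in advance which spacing a given formula string uses.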