thr3ads.net - R help - [R] Regular expressions in R [Nov 2011]

Michael Griffiths

2011-Nov-15 17:18 UTC

[R] Regular expressions in R

Good afternoon list,

I have the following character strings; one with spaces between the maths
operators and variable names, and one without said spaces.

form<-c('~ Sentence + LEGAL + Intro + Intro / Intro1 + Intro * LEGAL + benefit + benefit / benefit1 + product + action * mean + CTA + help + mean * product')
form<-c('~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action*mean+CTA+help+mean*product')

I would like to remove the following target strings, either:

1. '+ Intro * LEGAL' which is '+ space name space * space name'
2. '+Intro*LEGAL' which is '+ nospace name nospace * nospace name'

Having delved into a variety of sites (e.g.
http://www.zytrax.com/tech/web/regex.htm#search) investigating regular
expressions I now have a basic grasp, but I am having difficulties removing
ALL of the instances of 1. or 2.

The code below removes just a SINGLE instance of the target string, but I
was expecting it to remove all instances as I have \\*.[[:alnum:]]. I did
try \\*.[[:alnum:]]*, but this did not work.

form<-sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form)

I am obviously still not understanding something. If the list could offer
some guidance I would be most grateful.

Regards

Mike Griffiths




Sarah Goslee

2011-Nov-15 17:30 UTC

[R] Regular expressions in R

Hi Michael,

You need to take another look at the examples you were given, and at
the help for ?sub():

     The two '*sub' functions differ only in that 'sub' replaces only
     the first occurrence of a 'pattern' whereas 'gsub' replaces all
     occurrences.  If 'replacement' contains backreferences which are
     not defined in 'pattern' the result is undefined (but most often
     the backreference is taken to be '""').
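
To make the quoted help text concrete, here is a minimal sketch on a
shortened string. The names x, pat, first_only and all_of_them are made up
for illustration and are not from the thread. It only shows that sub()
stops after the first match while gsub() replaces every match:

```r
# Hypothetical shortened input; 'x' is an illustrative name, not from the thread.
x <- "a+b*c+d+e*f"

# Pattern: a literal '+', a name, a literal '*', a name.
pat <- "\\+[[:alnum:]]+\\*[[:alnum:]]+"

first_only  <- sub(pat, "", x)   # replaces only the first match
all_of_them <- gsub(pat, "", x)  # replaces every match

print(first_only)   # "a+d+e*f"
print(all_of_them)  # "a+d"
```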

Sarah

Sarah Goslee
http://www.functionaldiversity.org

Joshua Wiley

2011-Nov-15 17:47 UTC

[R] Regular expressions in R

Hi Michael,

Your strings were long, so I made a smaller example.  Sarah made
a good point: you want to be using gsub(), not sub().  But when I run
your code, I do not think it works precisely even for one instance.
Try this on for size, you were 99% there:

## simplified cases
form1 <- c('product + action * mean + CTA + help + mean * product')
form2 <- c('product+action*mean+CTA+help+mean*product')

## what I believe your desired output is
'product + CTA + help'
'product+CTA+help'

gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)

## your code (using gsub() instead of sub())
gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)


######## Running on r57586 Windows x64 ########
> gsub("\\s\\+\\s[[:alnum:]]*\\s\\*\\s[[:alnum:]]*", "", form1)
[1] "product + CTA + help"
> gsub("\\+[[:alnum:]]*\\*[[:alnum:]]*", "", form2)
[1] "product+CTA+help"
>
> ## your code (using gsub() instead of sub())
> gsub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]", "", form1)
[1] "product ean + CTA + help roduct"
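
The leftover "ean" and "roduct" in the last result appear to come from the
tail of the original pattern: '.' matches one character (here the space
after the '*') and [[:alnum:]] without a quantifier matches exactly one
more character, so only the first letter of the second name is consumed.
As a sketch rather than a definitive answer, a single pattern with optional
whitespace (\\s*) and '+' quantifiers on the names should handle both the
spaced and unspaced forms. The names spaced, unspaced and pat are made up
here, and the pattern assumes variable names contain only letters and
digits:

```r
# Simplified inputs, same content as form1 and form2 above.
spaced   <- "product + action * mean + CTA + help + mean * product"
unspaced <- "product+action*mean+CTA+help+mean*product"

# Optional whitespace, literal '+', optional whitespace, a name,
# optional whitespace, literal '*', optional whitespace, a name.
# Assumption: names consist only of alphanumeric characters.
pat <- "\\s*\\+\\s*[[:alnum:]]+\\s*\\*\\s*[[:alnum:]]+"

res_spaced   <- gsub(pat, "", spaced)
res_unspaced <- gsub(pat, "", unspaced)

print(res_spaced)    # "product + CTA + help"
print(res_unspaced)  # "product+CTA+help"
```

Using [[:alnum:]]+ instead of [[:alnum:]]* requires at least one character
in each name, so a bare '+ *' with no names around it would not be matched.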

Hope this helps,

Josh



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
