thr3ads.net - R help - [R] Need some help with regular expression [Nov 2016]


Steven Nagy

2016-Nov-20 04:06 UTC

[R] Need some help with regular expression

I tried out a regular expression on this website:

http://regexr.com/3en1m

 

So the input text is:

"Name.MEMBER_TYPE:  -> STU"

 

The regular expression is: ((?:\w+|\s) -> STU|STU -> (?:\w+|\s))

And it returns:

"  -> STU"

 

but when I use it in R, it doesn't return the same result:

strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c,
         backref = -1, perl = TRUE)

returns:
"Name.MEMBER_TYPE: -> STU"

 

 

Here is what I was trying to do:

 

I need to extract some values from a log table, and I created a regular
expression that helps me with that.

The log table has cells with values like:

a = "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY:  -> 1 ; CITY: MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->  ; MEMBER_STATUS:  -> N"

or
b = "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->" 

I need to extract the values that the member type changes from and to
whenever STU is involved: NMA, STU in the 1st case and STU, REG in the 2nd.

I came up with this expression, which worked in both cases:

strapply(strapply(a, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl = TRUE),
         "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)

 

But I had a 3rd case when the source member type was blank:

c = "Name.MEMBER_TYPE: -> STU"

and in that case it returned an error:

strapply(strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl = TRUE),
         "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)

Error: is.character(x) is not TRUE

 

I found that the error is because this returns NULL:

strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl = TRUE)
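
For readers following along without gsubfn, the no-match case can be reproduced in base R. The sketch below is an illustration under stated assumptions, not the poster's code: it uses regmatches() and regexpr() instead of strapply(), the helper name extract_change is hypothetical, and the log strings are shortened. A string with no match yields a zero-length result, which any second extraction step has to be prepared for.

```r
# Hypothetical helper: first "X -> STU" or "STU -> X" fragment, base R only.
extract_change <- function(s) {
  m <- regexpr("(\\w+ -> STU|STU -> \\w+)", s, perl = TRUE)
  regmatches(s, m)  # character(0) when the pattern does not match
}

a  <- "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY:  -> 1"   # shortened log cell
b  <- "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->"
c3 <- "Name.MEMBER_TYPE: -> STU"                         # blank source member type

extract_change(a)           # "NMA -> STU"
extract_change(b)           # "STU -> REG"
length(extract_change(c3))  # 0: nothing to hand to a second step
```

Checking the length of the first result before running the second extraction is one way to avoid an error on rows like the third one.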

 

 

So I tried to modify the regular expression to match any word or blank
space:

strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c,
         backref = -1, perl = TRUE)

 

but this returned the whole value of "c":

"Name.MEMBER_TYPE:  -> STU"

and I only needed "  -> STU", as shown on the regexr.com website.

 

Is the result wrong on the regexr.com website, or does strapply return the wrong
result?

 

Thanks,

Steven
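
One way to check this independently of gsubfn is to run the same pattern through base R's PCRE engine. This is only a sketch: regmatches() and regexpr() stand in for strapply(), and the variable names txt, pat and hit are illustrative. The input uses two spaces after the colon, as in the regexr.com example. The second half shows a plain R fact that may be relevant here: once a character variable named c exists, writing c as an argument passes that string, not the function c().

```r
txt <- "Name.MEMBER_TYPE:  -> STU"   # two spaces after the colon
pat <- "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))"

# First match of the pattern, using the Perl-compatible engine.
hit <- regmatches(txt, regexpr(pat, txt, perl = TRUE))
hit  # "  -> STU"

# A variable named `c` is passed as a value, not as the function c():
c <- txt
shadowed <- is.function(c)   # FALSE: in argument position `c` is the string
still_ok <- c("x", "y")      # a call still finds the function c()
rm(c)
```

If base R returns the same fragment as regexr.com, the pattern itself is probably not the problem. Whether the shadowed c explains the strapply output is an assumption; the thread does not settle it. Naming the data something other than c avoids the ambiguity.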



Bert Gunter

2016-Nov-20 19:15 UTC

If I understand you correctly, I think you are making it more complex
than necessary. Using your example (thanks!!), the following should
get you started:

> x <- c("Name.MEMBER_TYPE: NMA -> STU ; CATEGORY:  -> 1 ; CITY: MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->  ; MEMBER_STATUS:  -> N",
+        "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->",
+        "Name.MEMBER_TYPE: -> STU")
>
> x
[1] "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY:  -> 1 ; CITY: MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->  ; MEMBER_STATUS:  -> N"
[2] "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->"
[3] "Name.MEMBER_TYPE: -> STU"
>
> sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*", "\\1", x)
[1] "NMA -> STU" "STU -> REG" "-> STU"


I am sure that you can get things to the form you desire in one go
with some fiddling of the above, but it was easier for me to write the
regex to pick out the pieces you wanted and leave the rest to you.
Others may have slicker ways to do it, of course.
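
As one possible continuation of the approach above, the extracted fragments can be split into their "from" and "to" values in base R. This is a sketch, not part of the original reply: the names changes, parts, from and to are illustrative, and the first log string is shortened.

```r
x <- c("Name.MEMBER_TYPE: NMA -> STU ; CATEGORY:  -> 1",   # shortened
       "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->",
       "Name.MEMBER_TYPE: -> STU")

# Same pattern as in the reply: keep only the fragment that involves STU.
changes <- sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*", "\\1", x)

# Split each fragment on the arrow; a blank source value becomes "".
parts <- strsplit(changes, " *-> *")
from  <- vapply(parts, `[`, character(1), 1)
to    <- vapply(parts, `[`, character(1), 2)

data.frame(from, to)
```

The blank source member type in the third row comes out as an empty string instead of causing an error, which may or may not be the representation wanted downstream.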

HTH

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Steven Nagy

2016-Nov-21 02:40 UTC

Thanks a lot, Bert. That's amazing. I am very new to both R and regular
expressions, and I don't really understand the regular expression that you
used below.
It also looks like I don't need any special library, such as "gsubfn" for
the strapply function.
I tried to use the regexr.com website to analyze your regular expression,
but it doesn't seem to match any text there.
Can you explain the regular expression that you used?
".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*"
So the dot at the front means any character, and the star after it means
that it can repeat 0 or more times, right?
Then comes a colon character ":" and a space, and what is the next star
after that? Does it mean that the preceding sequence can again repeat 0 or
more times?
And what are the double square brackets?
Is ":alnum:" specific to R? I don't think regexr.com understands it. Or
maybe that site is for regular expressions in JavaScript, and the syntax is
different in R?

Thank you,
Steven
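
The questions above can be explored piece by piece in R. The snippet below is an illustrative sketch, not an answer from the thread. It relies on two pieces of standard regex behaviour: a star applies only to the single item just before it (here, one space), and [:alnum:] is a POSIX character class that is valid only inside a bracket expression, hence the double brackets. POSIX classes are not specific to R, but JavaScript's regex syntax does not support them, so the last lines try an ASCII-only rewrite using [A-Za-z0-9]. That a JavaScript-based tester would accept this rewrite is an assumption, and the names posix and ascii are illustrative.

```r
# ".*" : any character, zero or more times.
# " *" : the star applies to the space only, so zero or more spaces.
spaces_ok <- grepl("^a: *b$", c("a:b", "a: b", "a:   b"))
spaces_ok  # TRUE TRUE TRUE

# "[[:alnum:]]" : the outer [] is a bracket expression, the inner [:alnum:]
# is the POSIX class for letters and digits (underscore is not included).
alnum_ok <- grepl("^[[:alnum:]]*$", c("STU", "L5N1H9", "A_B"))
alnum_ok   # TRUE TRUE FALSE

# ASCII-only rewrite of the pattern, without POSIX classes.
x <- c("Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->",
       "Name.MEMBER_TYPE: -> STU")
posix <- ".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*"
ascii <- ".*: *([A-Za-z0-9]* *-> *STU|STU *-> *[A-Za-z0-9]*).*"
same <- identical(sub(posix, "\\1", x), sub(ascii, "\\1", x))
same       # TRUE
```

Note that sub() matches the whole string and keeps only the first capture group, so in an online tester the interesting part would be the contents of group 1, not the full match.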

