Dear R Helpers,

My regex skills are beginner to intermediate and banging around the web has not resulted in a solution to the problem below, so I hope that one of you who has mad skills can help me out.

I want to extract the stock ticker--AMT--out of the string

American Tower Corporation (REIT) (AMT)

The presence of the other parenthetical text (REIT) makes this difficult. Please note that the string may or may not have an interfering set of characters such as the (REIT), so the solution needs to be generalizable to the last set of characters that are contained in parentheses in the larger string. So an example of a string without the interfering (REIT) would be

Aetna Inc. (AET)

Your assistance would be very much appreciated.

--John Sparks
The following gets the last parenthesized sequence of non-parenthesis characters (the parentheses are kept, and strings with no match are returned unchanged):
> sub(".*(\\([^()]+\\))([^()]*)$", "\\1",
c("Aetna(AET)",
"American Tower Corp(REIT)(ATC)",
"No Parens",
"Qwerty Corp (ASD)(ZXC)(123) extra stuff"))
[1] "(AET)"     "(ATC)"     "No Parens" "(123)"
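
Since the original question asks for the symbol without its parentheses, one possible variation is to move the capture group inside the literal parentheses. This is only a sketch: last_paren is a hypothetical helper name, and returning NA for strings without any parenthesized text is an assumption, not part of the reply above.

```r
# Hypothetical helper: return the contents of the last (...) group in each
# string, without the parentheses. Strings with no match give NA (an assumption).
last_paren <- function(x) {
  # Greedy .* pushes the match to the last "(" ; the capture group sits
  # inside the literal parentheses so they are dropped from the result.
  pattern <- ".*\\(([^()]+)\\)[^()]*$"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

x <- c("Aetna Inc. (AET)",
       "American Tower Corporation (REIT) (AMT)",
       "No Parens")
last_paren(x)
# [1] "AET" "AMT" NA
```

The grepl() guard is there because sub() returns its input unchanged when nothing matches, which would otherwise be hard to tell apart from a real symbol.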
Bill Dunlap
TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sparks, John James
> Sent: Tuesday, April 08, 2014 11:29 AM
> To: r-help at r-project.org
> Subject: [R] Pull Stock Symbol Out of String
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
You could try:
# Use ?regexec and ?regmatches to return a list of grouped matches.
# Use \\( and \\) to match literal parentheses.
# Use ... to match exactly three characters (this assumes a
# three-character symbol).
# Use $ to match at end of string.
s1 <- "American Tower Corporation (REIT) (AMT)"
s2 <- "Aetna Inc. (AET)"
getSym <- function(s) {
  regmatches(s, regexec("\\((...)\\)$", s))[[1]][2]
}
getSym(s1) # [1] "AMT"
getSym(s2) # [1] "AET"
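
Ticker symbols are not always three characters long, and the function above handles one string at a time. A possible generalization, offered only as a sketch: get_sym is a hypothetical name, the pattern accepts any run of non-parenthesis characters, and NA for unmatched strings is an assumption.

```r
# Hypothetical vectorized variant of getSym(): accepts a character vector and
# symbols of any length. Unmatched strings give NA (an assumption).
get_sym <- function(s) {
  # \\(([^()]+)\\) captures the contents of a parenthesized group;
  # [^()]*$ allows trailing text but no further parentheses, so only the
  # last group in the string can match.
  m <- regmatches(s, regexec("\\(([^()]+)\\)[^()]*$", s))
  vapply(m,
         function(g) if (length(g) >= 2) g[2] else NA_character_,
         character(1))
}

get_sym(c("American Tower Corporation (REIT) (AMT)",
          "Aetna Inc. (AET)",
          "Ford Motor Company (F)",
          "No Parens"))
# [1] "AMT" "AET" "F"   NA
```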
Cheers,
B.
Hi,
You may try:
library(qdap)
str1 <- c("American Tower Corporation (REIT) (AMT)",
          "Aetna Inc. (AET)")
# Extract every round-bracketed piece, then keep the last one per string.
unlist(lapply(bracketXtract(str1, "round"), tail, 1), use.names = FALSE)
#[1] "AMT" "AET"
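
If installing qdap is not an option, the same extract-everything-then-take-the-last idea can be approximated in base R with gregexpr() and regmatches(). This is a rough sketch rather than a drop-in replacement for bracketXtract(); the names all_parens and last_group are illustrative only.

```r
str1 <- c("American Tower Corporation (REIT) (AMT)",
          "Aetna Inc. (AET)")

# All parenthesized groups per string, as a list of character vectors.
all_parens <- regmatches(str1, gregexpr("\\([^()]+\\)", str1))

# Keep the last group of each string; NA if a string has none (an assumption).
last_group <- vapply(all_parens,
                     function(p) if (length(p) > 0) p[length(p)] else NA_character_,
                     character(1))

# Strip the surrounding parentheses.
gsub("[()]", "", last_group)
# [1] "AMT" "AET"
```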
A.K.
On Tuesday, April 8, 2014 7:48 PM, "Sparks, John James" <jspark4 at uic.edu> wrote: