Dear R Helpers,

My regex skills are beginner to intermediate, and banging around the web has not turned up a solution to the problem below, so I hope that one of you who has mad skills can help me out.

I want to extract the stock ticker, AMT, out of the string

American Tower Corporation (REIT) (AMT)

The presence of the other parenthetical text, (REIT), makes this difficult. Please note that the string may or may not contain an interfering set of characters such as (REIT), so the solution needs to generalize to the last set of characters enclosed in parentheses in the larger string. An example of a string without the interfering (REIT) would be

Aetna Inc. (AET)

Your assistance would be very much appreciated.

--John Sparks
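To see the difficulty concretely, here is a minimal sketch (not from the thread; it assumes the "??" characters in the archived message are ordinary whitespace). regexpr() reports only the first match in each string, so a plain "anything in parentheses" pattern returns (REIT) rather than (AMT) for the first example:

```r
# The two example strings from the question, with the archive's "??"
# debris assumed to be ordinary spaces.
x <- c("American Tower Corporation (REIT) (AMT)", "Aetna Inc. (AET)")

# regexpr() keeps only the FIRST match per string, so this naive pattern
# grabs the interfering "(REIT)" instead of the ticker "(AMT)".
first_paren <- regmatches(x, regexpr("\\([^()]+\\)", x))
print(first_paren)
# first_paren is c("(REIT)", "(AET)")
```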
The following gets the last parenthesized sequence of non-parentheses:

> sub(".*(\\([^()]+\\))([^()]*)$", "\\1",
+     c("Aetna(AET)", "American Tower Corp(REIT)(ATC)", "No Parens",
+       "Qwerty Corp (ASD)(ZXC)(123) extra stuff"))
[1] "(AET)" "(ATC)" "No Parens" "(123)"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: Sparks, John James
Sent: Tuesday, April 08, 2014 11:29 AM
To: r-help at r-project.org
Subject: [R] Pull Stock Symbol Out of String
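A small variation on the pattern above, sketched here as an assumption rather than as part of the reply: moving the capture group inside the parentheses returns only the ticker, and a grepl() check turns strings with no parentheses into NA instead of returning them unchanged. The helper name last_paren_contents() is hypothetical.

```r
# Hypothetical helper (not from the thread): return the text inside the
# last pair of parentheses, or NA when a string contains none.
last_paren_contents <- function(x) {
  # Greedy .* pushes the match to the final "(...)"; [^()]* allows
  # trailing text that contains no parentheses.
  pattern <- ".*\\(([^()]+)\\)[^()]*$"
  out <- sub(pattern, "\\1", x)
  # sub() returns non-matching strings unchanged; mark those as NA instead.
  out[!grepl(pattern, x)] <- NA_character_
  out
}

tickers <- last_paren_contents(c("Aetna(AET)",
                                 "American Tower Corp(REIT)(ATC)",
                                 "No Parens",
                                 "Qwerty Corp (ASD)(ZXC)(123) extra stuff"))
print(tickers)
# tickers is c("AET", "ATC", NA, "123")
```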
You could try:

# Use ?regexec and ?regmatches to return a list of grouped matches.
# Use \\( and \\) to match literal parentheses.
# Use ... to match exactly three characters (of any kind).
# Use $ to match at the end of the string.
s1 <- "American Tower Corporation (REIT) (AMT)"
s2 <- "Aetna Inc. (AET)"
getSym <- function(s) {
  regmatches(s, regexec("\\((...)\\)$", s))[[1]][2]
}
getSym(s1)
# [1] "AMT"
getSym(s2)
# [1] "AET"

Cheers,
B.
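getSym() handles one string at a time (because of the [[1]]) and only three-character tickers (because of ...). A hedged generalisation, with the hypothetical name getSyms() and a made-up "Example Corp (XY)" test string, swaps ... for [^()]+, tolerates trailing whitespace, and uses vapply() to return NA where nothing matches:

```r
# Hypothetical generalisation of getSym(): works on a whole vector and
# accepts tickers of any length, not just three characters.
getSyms <- function(s) {
  # Capture the contents of the final parenthesised group at the end of
  # the string, allowing optional trailing whitespace.
  m <- regmatches(s, regexec("\\(([^()]+)\\)[[:space:]]*$", s))
  # Each element of m is character(0) when there was no match, otherwise
  # c(full_match, captured_group); take the group or fall back to NA.
  vapply(m, function(g) if (length(g) == 2) g[2] else NA_character_,
         character(1))
}

syms <- getSyms(c("American Tower Corporation (REIT) (AMT)",
                  "Aetna Inc. (AET)",
                  "Example Corp (XY)",   # made-up two-letter ticker
                  "No ticker here"))
print(syms)
# syms is c("AMT", "AET", "XY", NA)
```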
Hi,

You may try:

library(qdap)
str1 <- c("American Tower Corporation (REIT) (AMT)", "Aetna Inc. (AET)")
unlist(lapply(bracketXtract(str1, "round"), tail, 1), use.names = FALSE)
#[1] "AMT" "AET"

A.K.
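If installing qdap is not an option, the same "extract every bracketed piece, keep the last one" idea can be sketched in base R with gregexpr() and regmatches(). This is a substitute technique, not a description of what bracketXtract() does internally:

```r
# Base-R sketch of the "all bracketed pieces, keep the last" idea,
# without the qdap dependency.
str1 <- c("American Tower Corporation (REIT) (AMT)", "Aetna Inc. (AET)")

# gregexpr() finds every "(...)" group; regmatches() returns them as a list,
# one character vector per input string.
groups <- regmatches(str1, gregexpr("\\([^()]*\\)", str1))

# Keep the last group of each string (NA if there is none) ...
last <- vapply(groups,
               function(g) if (length(g)) g[length(g)] else NA_character_,
               character(1))
# ... and strip the surrounding parentheses.
tickers <- gsub("[()]", "", last)
print(tickers)
# tickers is c("AMT", "AET")
```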