All -

I have a column of SiteNames:

SiteName
OYS-PIA2-FL-1
OYS-PIA2-LA-1
OYS-PI-LA-BB-1
OYS-PIA2-LA-10
...
[truncated]

and I want to include only the last few digits into a new column.

I tried

substr(data$SiteName, 13, 20)

but because some SiteName values are of a different length, the final hyphen (i.e., "-") was included:

"1"
"1"
"-1"
"10"
...

so I use

strsplit(data$SiteName, split = "-")

and get

"OYS" "PIA2" "FL" "1"
"OYS" "PIA2" "LA" "1"
"OYS" "PI" "LA" "BB" "1"
"OYS" "PIA2" "LA" "10"
...

which is great. Unfortunately, I'm stuck. I don't know how to retrieve the final grouping of information from the strsplit() statement I called into a new column.

Can you help?

Thanks -

SR
Steven H. Ranney
try this:

> x
[1] "OYS-PIA2-FL-1"  "OYS-PIA2-LA-1"  "OYS-PI-LA-BB-1" "OYS-PIA2-LA-10"
> sub("^.*?([0-9]+)$", "\\1", x)
[1] "1"  "1"  "1"  "10"

On Tue, Dec 11, 2012 at 12:46 PM, Steven Ranney <steven.ranney at gmail.com> wrote:
> OYS-PIA2-FL-1
> OYS-PIA2-LA-1
> OYS-PI-LA-BB-1
> OYS-PIA2-LA-10

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Dec 11, 2012, at 10:10 AM, jim holtman wrote:

> try this:
>
>> x
> [1] "OYS-PIA2-FL-1"  "OYS-PIA2-LA-1"  "OYS-PI-LA-BB-1" "OYS-PIA2-LA-10"
>> sub("^.*?([0-9]+)$", "\\1", x)
> [1] "1"  "1"  "1"  "10"

Steve;

jim holtman is one of the jewels of the rhelp world. I generally assume that his answers are going to be the most succinct and efficient ones possible and avoid adding noise, but here I thought I would try to improve. Thinking there might be a string-splitting approach, I first tried (and discovered) a not-so-great solution:

x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1", "OYS-PIA2-LA-10")
sapply( strsplit(x, "-") , "[", 4)
[1] "1"  "1"  "BB" "10"

So then I asked myself if we could just "blank out" everything before the last dash, finding what seemed to be a fairly economical solution and one that does not require back-references:

sub( "^.+-" , "", x)
[1] "1"  "1"  "1"  "10"

If there were no digits after the last dash, these approaches give different results:

x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1", "OYS-PIA2-LA-")
sub( "^.+-" , "", x)
[1] "1" "1" "1" ""
sub("^.*?([0-9]+)$", "\\1", x)
[1] "1"            "1"            "1"            "OYS-PIA2-LA-"

When a pattern does not match, sub and gsub return the input string unchanged.

--
David.

David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
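The string-splitting attempt above fails only because it asks for a fixed position (the 4th piece). A minimal sketch of how the last piece could be taken instead, whatever the number of pieces; the names `parts`, `last_piece` and `last_piece_tail` are made up for illustration and do not appear in the thread:

```r
# Site names from the original post
x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1", "OYS-PIA2-LA-10")

# Split on a literal hyphen; the result is a list with one
# character vector per site name
parts <- strsplit(x, "-", fixed = TRUE)

# Index each vector by its own length, so 4-piece and 5-piece
# names are both handled
last_piece <- vapply(parts, function(p) p[length(p)], character(1))
print(last_piece)

# The same idea written with tail()
last_piece_tail <- sapply(parts, tail, 1)
print(last_piece_tail)
```

One caveat worth checking on real data: strsplit() drops a trailing empty string, so a name that ends in a hyphen, such as "OYS-PIA2-LA-", would yield "LA" here rather than "".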
HI,

You could also use:

x <- c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1", "OYS-PIA2-LA-10")
gsub(".*\\-(\\d+)$","\\1",x)
#[1] "1"  "1"  "1"  "10"
#or
gsub("[A-Z2-]","",x) #in this case
#[1] "1"  "1"  "1"  "10"

----- Original Message -----
From: Steven Ranney <steven.ranney at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Tuesday, December 11, 2012 12:46 PM
Subject: [R] Retain last grouping after a strsplit()
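The "#in this case" comment matters: the character-class version deletes every capital letter, hyphen and digit 2 wherever it occurs, so it depends on 2 being the only digit in the prefix and never appearing in the trailing number. A small sketch of the difference, assuming two made-up site names (`x_new`) that are not in the original data; the helper names `strip_chars` and `capture_end` are likewise hypothetical:

```r
# Hypothetical site names, NOT from the original post:
# one has a 3 in the prefix, the other a 2 in the trailing number
x_new <- c("OYS-PIA3-FL-1", "OYS-PIA2-LA-12")

# Character-class approach: removes A-Z, "2" and "-" everywhere
strip_chars <- function(s) gsub("[A-Z2-]", "", s)

# Anchored approach: keeps only the digits after the last hyphen
capture_end <- function(s) gsub(".*\\-(\\d+)$", "\\1", s)

res_strip   <- strip_chars(x_new)
res_capture <- capture_end(x_new)

print(res_strip)    # "31" "1"  : prefix digit kept, trailing 2 lost
print(res_capture)  # "1"  "12"
```

For the four names in the post both calls agree, so the choice only matters if the naming scheme is likely to change.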
On Tue, Dec 11, 2012 at 12:46 PM, Steven Ranney <steven.ranney at gmail.com> wrote:
> I tried
>
> substr(data$SiteName, 13, 20)
>
> but because some SiteName values are of a different length, the final
> hyphen (i.e., "-") was included:

Replace everything up to the last dash with the empty string like this:

sub(".*-", "", data$SiteName)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
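Since the original question asked for the result in a new column, here is a minimal sketch of how the sub() call above might be assigned back to the data frame. The data frame is rebuilt from the four names shown in the post, and the column name `SiteNum` is an assumption chosen for illustration:

```r
# Small stand-in for the poster's data frame (only the four names shown in the post)
data <- data.frame(
  SiteName = c("OYS-PIA2-FL-1", "OYS-PIA2-LA-1", "OYS-PI-LA-BB-1", "OYS-PIA2-LA-10"),
  stringsAsFactors = FALSE
)

# Greedy ".*-" matches up to the LAST hyphen; replacing it with ""
# leaves whatever follows that hyphen
data$SiteNum <- sub(".*-", "", data$SiteName)

# Optional: store the result as integers rather than character strings
data$SiteNum <- as.integer(data$SiteNum)

print(data)
```

The as.integer() step would produce NA (with a warning) for any name that does not end in digits, which can be a convenient way to spot malformed site names.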
Hi Steven, Not sure if you want to understand regular expressions in general or just the solution to your particular problem. If it's the former and you'd be willing to read a book on the subject, I'd recommend "Mastering Regular Expressions" by Jeffrey Friedl. I'm about halfway through now, and think the book is excellent. I'm developing an understanding that I feel is much harder to obtain solely from the documentation that is available online. Thanks, Paul