Please consider the following string:

MyString <- "ABCFR34564IJVEOJC3434"

There are 4 groups in this string: the 1st and 3rd groups are English letters, and the 2nd and 4th are digits. Given such a string, how can I separate out those 4 groups?

Thanks for your time.
On Sun, Feb 13, 2011 at 10:27 AM, Megh Dal <megh700004 at gmail.com> wrote:
> Please consider following string:
>
> MyString <- "ABCFR34564IJVEOJC3434"
>
> Here you see that, there are 4 groups in above string. 1st and 3rd groups
> are for english letters and 2nd and 4th for numeric. Given a string, how can
> I separate out those 4 groups?

Try this. "\\D+" and "\\d+" match runs of non-digits and digits respectively. The portions within parentheses are captures, which are passed to the c function. Like R's split, strapply returns a list with one component per element of MyString; MyString has only one element, so we extract its contents using [[1]].

> library(gsubfn)
> strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)", c)[[1]]
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"

Alternately, we could convert the relevant portions to numbers at the same time. ~ list(...) is interpreted as a function whose body is the right-hand side of the ~ and whose arguments are the free variables, i.e. s1, s2, s3 and s4.

strapply(MyString, "(\\D+)(\\d+)(\\D+)(\\d+)",
    ~ list(s1, as.numeric(s2), s3, as.numeric(s4)))[[1]]

See http://gsubfn.googlecode.com for more.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
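For comparison with the strapply call in the reply above, the same four captures can be sketched in base R with regexec and regmatches. This is only an illustration, not part of the gsubfn approach: the names pattern, m, groups and numbers are made up here, and the bracket classes [^0-9] and [0-9] are substituted for \\D and \\d as an assumption-free spelling of "non-digit" and "digit".

```r
MyString <- "ABCFR34564IJVEOJC3434"

# Same four capture groups as in the reply, written with bracket classes.
pattern <- "([^0-9]+)([0-9]+)([^0-9]+)([0-9]+)"

# regexec() records the positions of the full match and of each capture;
# regmatches() extracts the corresponding substrings as a list with one
# component per element of MyString.
m <- regmatches(MyString, regexec(pattern, MyString))

# The first entry is the full match, so drop it to keep only the captures.
groups <- m[[1]][-1]
print(groups)

# Convert the 2nd and 4th groups to numbers, as in the second strapply call.
numbers <- as.numeric(groups[c(2, 4)])
print(numbers)
```

As with strapply, the pattern is fixed at exactly four groups, so a string with a different number of letter and digit runs would not be split correctly by this sketch.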
If you have an indeterminate number of the patterns in the string, try the following:

> MyString <- "ABCFR34564IJVEOJC3434"
> # translate to the pattern sequences
> x <- chartr('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
+     , '000000000000000000000000001111111111'
+     , MyString
+     )
> x.rle <- rle(strsplit(x, '')[[1]])  # determine the runs
> # create extraction matrix
> x.ext <- cbind(cumsum(c(1, head(x.rle$lengths, -1)))
+     , cumsum(x.rle$lengths)
+     )
> substring(MyString, x.ext[,1], x.ext[,2])
[1] "ABCFR"   "34564"   "IJVEOJC" "3434"

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
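The run-length idea above can be wrapped in a small helper so it is reusable for any number of alternating groups. This is a minimal sketch under stated assumptions: split_runs is a hypothetical name, and instead of chartr it classifies each character as "digit" or "not a digit", so lowercase letters and other non-digit characters are treated as part of the letter runs (a deliberate departure from the uppercase-only translation table above).

```r
# Hypothetical helper: split one non-empty string into alternating runs
# of digits and non-digits.
split_runs <- function(s) {
  chars <- strsplit(s, "")[[1]]              # individual characters
  is_digit <- chars %in% as.character(0:9)   # TRUE for "0".."9"
  runs <- rle(is_digit)                      # lengths of consecutive runs
  ends <- cumsum(runs$lengths)               # last position of each run
  starts <- c(1, head(ends, -1) + 1)         # first position of each run
  substring(s, starts, ends)
}

res1 <- split_runs("ABCFR34564IJVEOJC3434")
res2 <- split_runs("ab12cd34ef56")
print(res1)
print(res2)
```

The first call reproduces the four groups from the original question; the second shows the same function handling six groups without any change to the code.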