Hi All,

I have strings made up of an unknown number of letters, digits, and spaces. Strings always start with one or two letters and always end with one or two digits. A set of letters (one or two letters) is always followed by a set of digits (one or two digits), possibly with one or more spaces between the sets of letters and digits. A set of letters always belongs to the following set of digits, and I want to parse the strings into these groups. As an example, the strings and the desired parsing results could look like this:

A10B10, desired parsing result: A10 and B10
A10 B5, desired parsing result: A10 and B5
AB 10 CD 12, desired parsing result: AB10 and CD12
A10CD2EF3, desired parsing result: A10, CD2, and EF3

I assume that it is possible to search a string for letters and digits and then break the string where letters are followed by digits; however, I am a bit clueless about how I could use, e.g., the 'charmatch' or 'parse' commands to achieve this.

Thanks a lot in advance for your help.

Best, Michael

Michael Drescher
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
1235 Queen St East
Sault Ste Marie, ON, P6A 2E3
Tel: (705) 946-7406
Fax: (705) 946-2030

strapply in package gsubfn can do that. The following matches the
indicated regular expression against x and applies the function
given in formula notation (which removes spaces) to each match,
outputting the result as a list:
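
library(gsubfn)
# the four example strings from the question
x <- c("A10B10", "A10 B5", "AB 10 CD 12", "A10CD2EF3")
# one or more letters, optional spaces, one or more digits;
# the formula strips the spaces from each match
strapply(x, "[[:alpha:]]+ *[[:digit:]]+", ~ gsub(" ", "", x))
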
For more info, see the gsubfn home page at:
http://code.google.com/p/gsubfn/
and the various links there.
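
If installing gsubfn is not an option, roughly the same result can be sketched in base R. This is only an illustration, not part of gsubfn: it assumes an R version that provides regmatches() (R 2.14.0 or later), and the variable names pattern and groups are made up for the example.

```r
# Base-R sketch of the same idea (regmatches() needs R >= 2.14.0).
x <- c("A10B10", "A10 B5", "AB 10 CD 12", "A10CD2EF3")

# One or more letters, optional spaces, one or more digits.
pattern <- "[[:alpha:]]+ *[[:digit:]]+"

# gregexpr() locates every match in each string; regmatches() extracts
# them as a list with one character vector per input string.
groups <- regmatches(x, gregexpr(pattern, x))

# Drop the spaces inside each extracted group.
groups <- lapply(groups, function(g) gsub(" ", "", g))
groups
```

As with strapply, the result is a list with one element per input string, so the groups stay associated with the string they came from.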
On 7/9/07, Drescher, Michael (MNR) <michael.drescher at ontario.ca> wrote:
> Hi All,
Is this what you want:

> x <- "A10B10A10 B5AB 10 CD 12A10CD2EF3"
> x <- gsub(" ", "", x)  # remove blanks
> y <- gregexpr("[A-Z]+\\s*[0-9]+", x)[[1]]
> substring(x, y, y + attr(y, 'match.length') - 1)
[1] "A10"  "B10"  "A10"  "B5"   "AB10" "CD12" "A10"  "CD2"  "EF3"
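
The example above runs on a single concatenated string, so the result no longer shows which input each group came from. A minimal sketch of the same gregexpr/substring approach applied to each string separately might look like this; split_groups is a hypothetical helper name, and it assumes upper-case letters only, as in the examples.

```r
# Hypothetical helper: returns the letter-digit groups of one string.
split_groups <- function(s) {
  s <- gsub(" ", "", s)                   # remove blanks
  m <- gregexpr("[A-Z]+[0-9]+", s)[[1]]   # start positions of all matches
  if (m[1] == -1) return(character(0))    # no group found
  substring(s, m, m + attr(m, "match.length") - 1)
}

x <- c("A10B10", "A10 B5", "AB 10 CD 12", "A10CD2EF3")

# Apply the helper to each string and label the results by their input.
result <- lapply(x, split_groups)
names(result) <- x
result
```

Naming the list elements by the original strings is only a convenience for reading the output; drop that line if the names are not wanted.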
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?