Hi All,

I have strings made up of an unknown number of letters, digits, and spaces. Strings always start with one or two letters and always end with one or two digits. A set of letters (one or two letters) is always followed by a set of digits (one or two digits), possibly with one or more spaces between the letters and the digits. A set of letters always belongs to the following set of digits, and I want to parse the strings into these groups. For example, the strings and the desired parsing results could look like this:

A10B10, desired parsing result: A10 and B10
A10 B5, desired parsing result: A10 and B5
AB 10 CD 12, desired parsing result: AB10 and CD12
A10CD2EF3, desired parsing result: A10, CD2, and EF3

I assume that it is possible to search a string for letters and digits and then break the string wherever a set of digits is followed by a set of letters; however, I am a bit clueless about how I could use, e.g., the 'charmatch' or 'parse' functions to achieve this.

Thanks a lot in advance for your help.

Best,
Michael

Michael Drescher
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
1235 Queen St East
Sault Ste Marie, ON, P6A 2E3
Tel: (705) 946-7406
Fax: (705) 946-2030
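The idea in the question (find the boundaries, then break the string there) can be done directly in base R. This is a minimal sketch, assuming the spaces can simply be deleted first and that every group boundary is a digit immediately followed by a letter. It swaps in strsplit() with a Perl-style zero-width lookaround (perl = TRUE), because charmatch() does partial matching against a table of values and parse() parses R code, so neither of them splits strings.

```r
x <- c("A10B10", "A10 B5", "AB 10 CD 12", "A10CD2EF3")

# Drop the optional spaces so each group is letters immediately followed by digits.
compact <- gsub(" ", "", x)

# Split at the empty position between a digit and a following letter.
# (?<=...) and (?=...) are lookarounds, so no characters are consumed by the split.
groups <- strsplit(compact, "(?<=[[:digit:]])(?=[[:alpha:]])", perl = TRUE)
groups
```

Because only digit-to-letter boundaries are used, the one-or-two-character limits are not checked: "ABC123" would come back as the single piece "ABC123".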
strapply in package gsubfn can do that. The following matches the indicated regular expression against x and applies the function given in formula notation (which removes spaces) to each match, returning the results as a list:

library(gsubfn)
x <- c("A10B10", "A10 B5", "AB 10 CD 12", "A10CD2EF3")
strapply(x, "[[:alpha:]]+ *[[:digit:]]+", ~ gsub(" ", "", x))

For more info, see the gsubfn home page at code.google.com/p/gsubfn and the various links there.

On 7/9/07, Drescher, Michael (MNR) <michael.drescher at ontario.ca> wrote:
> I have strings made up of an unknown number of letters, digits, and
> spaces. [...]
______________________________________________
R-help at stat.math.ethz.ch mailing list
stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
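For readers without the gsubfn package, a roughly equivalent base-R sketch (an alternative technique, not the strapply code itself) combines gregexpr() with regmatches(). It assumes R 2.14.0 or later, since regmatches() was added after this thread was written. The "+" after [[:alpha:]] lets two-letter groups such as "AB 10" match whole.

```r
x <- c("A10B10", "A10 B5", "AB 10 CD 12", "A10CD2EF3")

# Locate every run of letters, optional spaces, then digits in each string.
m <- gregexpr("[[:alpha:]]+ *[[:digit:]]+", x)

# Extract the matched substrings (one character vector per input string)
# and strip the spaces inside each match.
groups <- lapply(regmatches(x, m), function(s) gsub(" ", "", s))
groups
```

The result is a list with one character vector of groups per input string.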
Is this what you want?

> x <- "A10B10A10 B5AB 10 CD 12A10CD2EF3"
> x <- gsub(" ", "", x) # remove blanks
> y <- gregexpr("[A-Z]+\\s*[0-9]+", x)[[1]]
>
> substring(x, y, y + attr(y, 'match.length') - 1)
[1] "A10" "B10" "A10" "B5" "AB10" "CD12" "A10" "CD2" "EF3"

On 7/9/07, Drescher, Michael (MNR) <michael.drescher at ontario.ca> wrote:
> I have strings made up of an unknown number of letters, digits, and
> spaces. [...]
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
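The gregexpr()/substring() reply runs all four examples together as one string. A minimal sketch of the same technique applied to each string separately, assuming the inputs arrive as a character vector, keeps the groups of each string apart. The "\\s*" is dropped from the pattern because the blanks are removed first.

```r
x <- c("A10B10", "A10 B5", "AB 10 CD 12", "A10CD2EF3")
x <- gsub(" ", "", x)  # remove blanks first

# gregexpr() on a vector returns one match object per string.
pos <- gregexpr("[A-Z]+[0-9]+", x)

# For each string, cut out the matches using their start positions and
# the "match.length" attribute, as in the single-string version.
groups <- mapply(function(s, p) substring(s, p, p + attr(p, "match.length") - 1),
                 x, pos, SIMPLIFY = FALSE, USE.NAMES = FALSE)
groups
```

If a string contains no match, gregexpr() reports -1 for it and this code returns a single empty string "" for that element.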