Hi all,

From a list of strings, I would like to remove the following:

1. Digits at the beginning of the string
2. The characters "SPE" following the digits (if present)
3. The hyphen and any characters following it

The following produces the desired result, but I would like to know whether this can be done more efficiently. Any suggestions would be much appreciated.

    dat <- c("2148 SPE MAR - CCC", "9843 SPE ANN - BBB", "56748 LIF - AA",
             "3489 SPE GEN - CC", "4752473 MAR - AA", "980843 SPE PEN - CC")
    > dat
    [1] "2148 SPE MAR - CCC" "9843 SPE ANN - BBB" "56748 LIF - AA" "3489 SPE GEN - CC" "4752473 MAR - AA" "980843 SPE PEN - CC"

    dd <- sub(pattern = "^[0-9]+[[:blank:]]", "", dat)
    dd <- sub(pattern = "SPE ", "", dd)
    dd <- substr(x = dd, start = 1, stop = regexpr("-", dd) - 2)
    > dd
    [1] "MAR" "ANN" "LIF" "GEN" "MAR" "PEN"

--
Steven
    gsub(pattern = "^[0-9]+ (SPE )*(\\w+) - .*$", "\\2", dat)

-----
A R learner.
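For readers following along, here is a minimal sketch that annotates the pieces of the single-pattern approach and compares it with the three-step version from the question. The names `one_step` and `three_step` are illustrative only, and `sub` is used in place of `gsub` on the assumption that a pattern anchored with `^` can match at most once per string.

```r
dat <- c("2148 SPE MAR - CCC", "9843 SPE ANN - BBB", "56748 LIF - AA",
         "3489 SPE GEN - CC", "4752473 MAR - AA", "980843 SPE PEN - CC")

# Pattern, piece by piece:
#   ^[0-9]+     leading digits, then a single space
#   (SPE )*     optional "SPE " prefix (group 1)
#   (\\w+)      the code to keep (group 2)
#    - .*$      the hyphen and everything after it
one_step <- sub("^[0-9]+ (SPE )*(\\w+) - .*$", "\\2", dat)

# The three-step version from the original question, for comparison
three_step <- sub("^[0-9]+[[:blank:]]", "", dat)
three_step <- sub("SPE ", "", three_step)
three_step <- substr(three_step, 1, regexpr("-", three_step) - 2)

# Both approaches should agree on this input
print(identical(one_step, three_step))
```

Note that a string that does not contain " - " does not match this pattern at all, so `sub` returns it unchanged.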
Was going to suggest

    gsub("^[0-9]+ (SPE )?([^ -])( -.*)?", "\\3", s)

but I see Wu Gong beat me to the punch with a nicer one :)

On 9 August 2010 12:13, Wu Gong <wg2f at mtmail.mtsu.edu> wrote:
>
> gsub(pattern = "^[0-9]+ (SPE )*(\\w+) - .*$", "\\2", dat)
>
And my \\3 should have been a \\2 anyway!

On 9 August 2010 12:23, Michael Bedward <michael.bedward at gmail.com> wrote:
> Was going to suggest gsub("^[0-9]+ (SPE )?([^ -])( -.*)?", "\\3", s)
> but I see Wu Gong beat me to the punch with a nicer one :)
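As a side note, even with the \\2 correction the alternative pattern as posted captures only a single character in its second group, because `[^ -]` has no quantifier. A hedged sketch of what a working variant might look like, assuming a `+` was intended, follows. The variable names `fixed` and `no_hyphen`, and the sample string "123 SPE ABC", are hypothetical and not part of the thread.

```r
dat <- c("2148 SPE MAR - CCC", "9843 SPE ANN - BBB", "56748 LIF - AA",
         "3489 SPE GEN - CC", "4752473 MAR - AA", "980843 SPE PEN - CC")

# Assumed fix: add "+" so group 2 captures the whole code,
# and use "\\2" as noted in the follow-up message.
#   (SPE )?     optional "SPE " prefix (group 1)
#   ([^ -]+)    one or more characters that are neither space nor hyphen (group 2)
#   ( -.*)?     optional " -" and everything after it (group 3)
fixed <- sub("^[0-9]+ (SPE )?([^ -]+)( -.*)?", "\\2", dat)
print(fixed)

# Because the trailing group is optional, a string with no hyphen part
# (hypothetical input) is still reduced to its code.
no_hyphen <- sub("^[0-9]+ (SPE )?([^ -]+)( -.*)?", "\\2", "123 SPE ABC")
print(no_hyphen)
```

The optional trailing group is the main design difference from the other suggestion, which requires " - " to be present for any replacement to happen.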