I have a data column of text entries:

26M_AN_C.bmp
22M_AN_C.bmp
20M_HA_O.bmp
20M_AN_C.bmp
26M_HA_O.bmp
22M_HA_O.bmp
31M_AN_C.bmp
38M_HA_O.bmp
.
.
.

And I would like to sort by the middle tag: AN, HA, etc.
Is there a way to parse text data in R?

In Excel, I would have used the "left" and "right" functions to cut just the middle two letters out and put them into another column to sort by.

Thanks!

--
View this message in context: http://www.nabble.com/Text-data-tp21714334p21714334.html
Sent from the R help mailing list archive at Nabble.com.
This will sort on those characters:

> x <- readLines(textConnection("26M_AN_C.bmp
+ 22M_AN_C.bmp
+ 20M_HA_O.bmp
+ 20M_AN_C.bmp
+ 26M_HA_O.bmp
+ 22M_HA_O.bmp
+ 31M_AN_C.bmp
+ 38M_HA_O.bmp"))
> closeAllConnections()
> # pick off characters between "_"
> sortKey <- sub(".*_(.+)_.*", "\\1", x)
> sortKey
[1] "AN" "AN" "HA" "AN" "HA" "HA" "AN" "HA"
> # output sorted list
> x[order(sortKey)]
[1] "26M_AN_C.bmp" "22M_AN_C.bmp" "20M_AN_C.bmp" "31M_AN_C.bmp" "20M_HA_O.bmp"
[6] "26M_HA_O.bmp" "22M_HA_O.bmp" "38M_HA_O.bmp"

On Wed, Jan 28, 2009 at 3:37 PM, Alice Lin <alice.ly at gmail.com> wrote:
> And I would like to sort by the middle tag: AN, HA, etc.
> Is there a way to parse text data in R?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
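Since the original question asked about putting the middle tag into another column and sorting by it, here is a minimal sketch of that step using the same kind of regular expression. The data frame `df` and the column names `file` and `tag` are made up for illustration; they are not from the original post.

```r
# Hypothetical data frame; `file` stands in for the asker's text column.
df <- data.frame(
  file = c("26M_AN_C.bmp", "22M_AN_C.bmp", "20M_HA_O.bmp", "20M_AN_C.bmp",
           "26M_HA_O.bmp", "22M_HA_O.bmp", "31M_AN_C.bmp", "38M_HA_O.bmp"),
  stringsAsFactors = FALSE
)

# Extract the text between the first and second underscore into a new column.
df$tag <- sub("^[^_]*_([^_]+)_.*$", "\\1", df$file)

# Reorder the rows by that column; ties keep their original relative order.
df_sorted <- df[order(df$tag), ]
df_sorted
```

If every file name has the tag in exactly the same position, as in the sample data, `substr(df$file, 5, 6)` would be the closest analogue of cutting characters out by position in Excel. The regular expression is the safer choice when the leading part can vary in length.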
Jim's solution is more elegant than the following (and probably more efficient), but you could also try this. It lets you sort by AN/HA, and then by the number at the start of the filename:

> text <- c("26M_AN_C.bmp", "22M_AN_C.bmp", "20M_HA_O.bmp", "20M_AN_C.bmp",
+           "26M_HA_O.bmp", "22M_HA_O.bmp", "31M_AN_C.bmp", "38M_HA_O.bmp")
> split <- do.call("rbind", strsplit(text, "_"))
> o <- order(split[,2], split[,1], split[,3])
> text[o]
[1] "20M_AN_C.bmp" "22M_AN_C.bmp" "26M_AN_C.bmp" "31M_AN_C.bmp" "20M_HA_O.bmp"
[6] "22M_HA_O.bmp" "26M_HA_O.bmp" "38M_HA_O.bmp"

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Alice Lin
Sent: Wednesday, January 28, 2009 3:38 PM
To: r-help at r-project.org
Subject: [R] Text data

And I would like to sort by the middle tag: AN, HA, etc.
Is there a way to parse text data in R?
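One caveat on the strsplit approach: the first piece ("20M", "26M", ...) is compared as text, which happens to work for the sample because every number has two digits. A rough sketch of a numeric variant follows, assuming the first piece always starts with digits. The file names "9M_AN_C.bmp" and "100M_AN_C.bmp" are invented here to show the difference and are not from the original data.

```r
# Hypothetical input with numbers of different lengths.
files <- c("26M_AN_C.bmp", "9M_AN_C.bmp", "20M_HA_O.bmp", "100M_AN_C.bmp")

# Split on "_" and stack the pieces into a 3-column character matrix.
parts <- do.call(rbind, strsplit(files, "_", fixed = TRUE))

# Keep only the leading digits of the first piece and convert to integer.
num <- as.integer(sub("^([0-9]+).*$", "\\1", parts[, 1]))

# Sort by the middle tag first, then by the number as a number.
sorted <- files[order(parts[, 2], num)]
sorted
```

With the plain character ordering, "100M" would sort ahead of "26M" and "9M" within the AN group, because the comparison goes character by character.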