Dear R People:

I have the following set of data

> Block[1:5]
[1] "5600-5699" "6100-6199" "9700-9799" "9400-9499" "8300-8399"

and I want to split at the -

> strsplit(Block[1:5],"-")
[[1]]
[1] "5600" "5699"

[[2]]
[1] "6100" "6199"

[[3]]
[1] "9700" "9799"

[[4]]
[1] "9400" "9499"

[[5]]
[1] "8300" "8399"

What is the best way to extract the pieces that are to the left of the dash, please?

Thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodgess at gmail.com

Hi Erin,

this is one way:

Block <- c("5600-5699","6100-6199","9700-9799","9400-9499","8300-8399")
splBlock <- strsplit(Block,"-")
sapply(splBlock, "[", 1)

greetings,
Remko

--
View this message in context: http://r.789695.n4.nabble.com/strsplit-question-tp3896847p3896850.html
Sent from the R help mailing list archive at Nabble.com.

unlist(strsplit(Block[1:5], "-.+$"))

If you are going to want the other pieces later, the most efficient way depends on the assumptions you can make about your data. If there are always two elements from the split:

matrix(unlist(strsplit(Block[1:5], "-")), ncol = 2, byrow = TRUE)
## or
do.call("rbind", strsplit(Block[1:5], "-"))

The first option, dropping everything after the -, is marginally more efficient, followed by the matrix technique. A series of clunkier options (in my view) would be:

unlist(strsplit(Block[1:5], "-"))[seq(from = 1, to = 2 * length(Block[1:5]), by = 2)]

or, very flexible in terms of extracting the first element (regardless of how many there are), but computationally less efficient:

sapply(strsplit(Block[1:5], "-"), `[[`, 1)

but this is only slightly less efficient, and when tested on a simple character vector of length 10^8 it still completed in less than 1 second on a 1.66 GHz dual core on R devel r57214, Windows x64.

Cheers,

Josh

On Tue, Oct 11, 2011 at 10:20 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
> What is the best way to extract the pieces that are to the left of the
> dash, please?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

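The alternatives listed above can be checked against one another. The following is a minimal sketch, not taken from the thread: it rebuilds the five example values from the original post under the name Block and confirms that three of the approaches return the same left-hand pieces. It assumes every element contains exactly one dash, and it swaps sapply for vapply (a base R function that additionally enforces the result type) in the last variant.

```r
# Example data copied from the original post
Block <- c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399")

# 1. Split on the dash plus everything after it, leaving only the left piece
left_regex <- unlist(strsplit(Block, "-.+$"))

# 2. Bind the split pieces into a two-column matrix and take column 1;
#    assumes exactly two pieces per element
parts <- do.call("rbind", strsplit(Block, "-"))
left_matrix <- parts[, 1]

# 3. Take the first piece of each split, however many pieces there are;
#    vapply checks that each result is a single character string
left_first <- vapply(strsplit(Block, "-"), `[[`, character(1), 1)

# All three should agree
stopifnot(identical(left_regex, left_matrix),
          identical(left_matrix, left_first))
left_first
```

If some elements could contain more than one dash, only the third variant keeps working unchanged, which is the flexibility the message refers to.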
sapply(strsplit(Block[1:5],"-"), function (x) {x[1]})

comes to mind...
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Erin Hodgess <erinm.hodgess@gmail.com> wrote:
> What is the best way to extract the pieces that are to the left of the
> dash, please?

On Oct 12, 2011, at 1:20 AM, Erin Hodgess wrote:
> What is the best way to extract the pieces that are to the left of the
> dash, please?

> sub("\\-.*$", "", c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399") )
[1] "5600" "6100" "9700" "9400" "8300"

--
David Winsemius, MD
West Hartford, CT

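To see how the pattern in the sub() call behaves at the edges, here is a small sketch that is not from the thread. The vector x is a hypothetical test input containing one element without a dash and one with two dashes. It also checks that the backslashes before the dash make no difference here, since a dash is only special inside a character class.

```r
# Hypothetical test input: one dash, no dash, two dashes
x <- c("5600-5699", "5600", "1-2-3")

# sub() replaces only the first match. The match starts at the first dash
# and the greedy ".*" runs to the end of the string, so everything from the
# first dash onward is removed. Strings without a dash are left unchanged.
left <- sub("-.*$", "", x)
left

# The escaped form used in the message gives the same result
stopifnot(identical(left, sub("\\-.*$", "", x)))
```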
On Wed, Oct 12, 2011 at 1:20 AM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
> What is the best way to extract the pieces that are to the left of the
> dash, please?

Try this:

> x <- c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399")
> sub("-.*", "", x) # before dash
[1] "5600" "6100" "9700" "9400" "8300"
> sub(".*-", "", x) # after dash
[1] "5699" "6199" "9799" "9499" "8399"

and here is another approach:

> library(gsubfn)
> m <- strapply(x, "\\d+", c, simplify = TRUE)
> m
     [,1]   [,2]   [,3]   [,4]   [,5]
[1,] "5600" "6100" "9700" "9400" "8300"
[2,] "5699" "6199" "9799" "9499" "8399"

Now m[1, ] and m[2, ] are the vectors of digits before and after the dash. Note that c in the strapply call can be replaced with as.numeric if you want a numeric matrix instead.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
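For readers without the gsubfn package, a similar numeric matrix can probably be built in base R. This is a sketch under stated assumptions rather than the method from the message above: it assumes every string splits into exactly two numeric pieces, and it uses strsplit with fixed = TRUE plus vapply in place of strapply.

```r
x <- c("5600-5699", "6100-6199", "9700-9799", "9400-9499", "8300-8399")

# Split on a literal dash, convert each pair to numeric, and collect the
# pairs as columns. vapply stops with an error if any element does not
# yield exactly two numbers.
m <- vapply(strsplit(x, "-", fixed = TRUE), as.numeric, numeric(2))

m[1, ]  # numbers before the dash
m[2, ]  # numbers after the dash
```

As in the strapply result, each column corresponds to one input string, so row 1 holds the lower bounds and row 2 the upper bounds.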