Mark Heckmann
2011-Oct-24 13:46 UTC
[R] splitting a string into words preserving blanks (using regex)
I would like to split a string into words at its blanks, but also preserve all the blanks.

Example:
    c(" some words to split ")
should become
    c(" ", "some", " ", "words", " ", "to", " ", "split", " ")

I was not able to achieve this with strsplit(), but I am not familiar with regular expressions.
Is there an easy way to do this using e.g. regex and strsplit?

Thanks
Mark

Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
Gabor Grothendieck
2011-Oct-24 14:07 UTC
[R] splitting a string into words preserving blanks (using regex)
On Mon, Oct 24, 2011 at 9:46 AM, Mark Heckmann <mark.heckmann at gmx.de> wrote:
> I would like to split a string into words at its blanks but also to preserve all blanks.
>
> Example:
>     c(" some words to split ")
> should become
>     c(" ", "some", " ", "words", " ", "to", " ", "split", " ")
>
> I was not able to achieve this via strsplit().
> But I am not familiar with regular expressions.
> Is there an easy way to do that using e.g. regex and strsplit?

Try this. The pattern captures optional whitespace, a word, and optional whitespace; empty captures are then dropped:

    > library(gsubfn)
    > x <- " some words to split "
    > v <- strapply(x, "(\\s*)(\\S+)(\\s*)", c)[[1]]
    > v[nchar(v) > 0]
    [1] " " "some" " " "words" " " "to" " " "split" " "

If you don't need the trailing space it can be further simplified:

    > strapply(x, "(\\s*)(\\S+)", c)[[1]]
    [1] " " "some" " " "words" " " "to" " " "split"

or if you don't need the leading space it can be simplified like this:

    > strapply(x, "(\\S+)(\\s*)", c)[[1]]
    [1] "some" " " "words" " " "to" " " "split" " "

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
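For readers who would rather avoid the gsubfn dependency, the same idea can be sketched in base R. This is not from the reply above; it is a minimal sketch that assumes the goal is to return every run of whitespace and every run of non-whitespace in order, using regmatches() and gregexpr() with the alternation "\\s+|\\S+".

```r
# Base-R sketch (an assumption, not the method posted in this thread):
# match alternating runs of whitespace (\s+) and non-whitespace (\S+).
x <- " some words to split "
tokens <- regmatches(x, gregexpr("\\s+|\\S+", x, perl = TRUE))[[1]]
print(tokens)

# Because every character belongs to exactly one run, pasting the
# tokens back together reproduces the input string.
roundtrip_ok <- identical(paste(tokens, collapse = ""), x)
print(roundtrip_ok)
```

Since each match is a whole run, several consecutive blanks come back as one element of the same length, and no empty strings need to be filtered out afterwards.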
Eik Vettorazzi
2011-Oct-24 14:18 UTC
[R] splitting a string into words preserving blanks (using regex)
Hi Mark,

here is a way using gsub to insert a "split marker" around each word, followed by strsplit on that marker:

    strsplit(gsub("([[:alnum:]]+)", "|\\1|", " some words to split "), "|", fixed = TRUE)[[1]]

cheers

On 24.10.2011 15:46, Mark Heckmann wrote:
> I would like to split a string into words at its blanks but also to preserve all blanks.
>
> Example:
>     c(" some words to split ")
> should become
>     c(" ", "some", " ", "words", " ", "to", " ", "split", " ")
>
> I was not able to achieve this via strsplit().
> But I am not familiar with regular expressions.
> Is there an easy way to do that using e.g. regex and strsplit?
>
> Thanks
> Mark

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
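The marker approach can be wrapped in a small helper. The sketch below is an illustration only: the function name split_keep_blanks and the default marker "\001" are hypothetical choices, not part of the reply above. It assumes a single input string, treats any run of non-whitespace as a word (rather than only alphanumerics), and drops the empty string that strsplit() returns when the input starts with a word.

```r
# Hypothetical helper illustrating the "split marker" idea.
# Assumptions: x is a single string, and the marker character does not
# occur in x (checked below); "\001" is an arbitrary control character.
split_keep_blanks <- function(x, marker = "\001") {
  stopifnot(length(x) == 1L, !grepl(marker, x, fixed = TRUE))
  # Surround every run of non-whitespace with the marker.
  marked <- gsub("(\\S+)", paste0(marker, "\\1", marker), x, perl = TRUE)
  # Split on the literal marker; fixed = TRUE avoids regex interpretation.
  parts <- strsplit(marked, marker, fixed = TRUE)[[1]]
  # strsplit() yields a leading "" when the string begins with a marker.
  parts[nzchar(parts)]
}

res <- split_keep_blanks(" some words to split ")
print(res)
```

A marker that cannot plausibly appear in the text is the main design choice here; with "|" as the marker, any input that already contains "|" would be split in the wrong places.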