Mark Heckmann
2011-Oct-24 13:46 UTC
[R] splitting a string into words preserving blanks (using regex)
I would like to split a string into words at its blanks but also to preserve all
blanks.
Example:
c(" some words to split ")
should become
c(" ", "some", " ", "words", " ", "to", " ", "split", " ")
I was not able to achieve this via strsplit(), but I am not familiar with regular expressions.
Is there an easy way to do this, e.g. using a regex with strsplit()?
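For reference, a minimal sketch of what a plain split on a single blank does with the example string. The variable names are illustrative and not from the original post. The separators are consumed by the split, so the blanks are lost:

```r
x <- " some words to split "

# strsplit() removes the separator, so the blanks do not survive the split.
# The match at the start of the string yields an empty first element.
parts <- strsplit(x, " ", fixed = TRUE)[[1]]
print(parts)

# Pasting the pieces back together does not reproduce the input.
rebuilt <- paste(parts, collapse = "")
print(identical(rebuilt, x))
```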
Thanks
Mark
Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
Gabor Grothendieck
2011-Oct-24 14:07 UTC
[R] splitting a string into words preserving blanks (using regex)
On Mon, Oct 24, 2011 at 9:46 AM, Mark Heckmann <mark.heckmann at gmx.de> wrote:
> I would like to split a string into words at its blanks but also to preserve all blanks.
>
> Example:
> c(" some words to split ")
> should become
> c(" ", "some", " ", "words", " ", "to", " ", "split", " ")
>
> I was not able to achieve this via strsplit() .
> But I am not familiar with regular expressions.
> Is there an easy way to do that using e.g. regex and strsplit?

Try this:

> library(gsubfn)
> x <- " some words to split "
> v <- strapply(x, "(\\s*)(\\S+)(\\s*)", c)[[1]]
> v[nchar(v) > 0]
[1] " " "some" " " "words" " " "to" " " "split" " "

If you don't need the trailing space it can be further simplified:

> strapply(x, "(\\s*)(\\S+)", c)[[1]]
[1] " " "some" " " "words" " " "to" " " "split"

or if you don't need the leading space it can be simplified like this:

> strapply(x, "(\\S+)(\\s*)", c)[[1]]
[1] "some" " " "words" " " "to" " " "split" " "

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
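As a side note, the same result can probably be had without an extra package. The following is a minimal sketch in base R, not part of the reply above; the helper name split_runs is hypothetical. It extracts every maximal run of whitespace or non-whitespace instead of splitting, so nothing is discarded:

```r
# Hypothetical helper (not from the thread): return the alternating runs of
# whitespace and non-whitespace characters of a single string.
split_runs <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  # gregexpr() locates every match of "a run of blanks OR a run of non-blanks";
  # regmatches() extracts the matched substrings in order.
  regmatches(x, gregexpr("\\s+|\\S+", x, perl = TRUE))[[1]]
}

x <- " some words to split "
print(split_runs(x))
```

Because every character belongs to exactly one run, pasting the pieces back together with paste(..., collapse = "") should reproduce the input, including runs of several blanks.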
Eik Vettorazzi
2011-Oct-24 14:18 UTC
[R] splitting a string into words preserving blanks (using regex)
Hi Mark,
here is a way using gsub() to insert a "split marker" and then strsplit() to split on it:
strsplit(gsub("([[:alnum:]]+)", "|\\1|", c(" some words to split ")), "|", fixed = TRUE)[[1]]
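The marker idea can be wrapped in a small function. This is only a sketch under a few assumptions: the function name split_keep_blanks and its marker argument are hypothetical, the marker must be a literal string that does not occur in the input and contains no backslash, and words are taken to be runs of non-whitespace rather than [[:alnum:]] so that punctuation stays attached to its word:

```r
# Hypothetical wrapper around the "split marker" approach.
split_keep_blanks <- function(x, marker = "|") {
  stopifnot(is.character(x), length(x) == 1)
  # The marker must not already occur in the text, or the split would be wrong.
  stopifnot(!grepl(marker, x, fixed = TRUE))
  # Surround every run of non-whitespace characters with the marker.
  marked <- gsub("([^[:space:]]+)", paste0(marker, "\\1", marker), x)
  # Split on the literal marker; fixed = TRUE keeps "|" from acting as regex "or".
  parts <- strsplit(marked, marker, fixed = TRUE)[[1]]
  # A word at the very start leaves an empty first piece; drop empty pieces.
  parts[nzchar(parts)]
}

print(split_keep_blanks(" some words to split "))
print(split_keep_blanks("don't stop"))
```

With [[:alnum:]] as the word pattern, a string such as "don't stop" would be broken at the apostrophe; which behaviour is wanted depends on the data.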
cheers
On 24.10.2011 15:46, Mark Heckmann wrote:
> I would like to split a string into words at its blanks but also to preserve all blanks.
>
> Example:
> c(" some words to split ")
> should become
> c(" ", "some", " ", "words", " ", "to", " ", "split", " ")
>
> I was not able to achieve this via strsplit() .
> But I am not familiar with regular expressions.
> Is there an easy way to do that using e.g. regex and strsplit?
>
> Thanks
> Mark
> Mark Heckmann
> Blog: www.markheckmann.de
> R-Blog: http://ryouready.wordpress.com
--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790