thr3ads.net - R help - [R] Q about strsplit and regexp [Oct 2004]

Liaw, Andy

2004-Oct-20 12:15 UTC

[R] Q about strsplit and regexp

Dear R-help,

This one is probably a piece of cake for regexp masters.  I'd like to split
a character vector (for simplicity, say of length one for now) that contains
fields delimited by an arbitrary number of white spaces (e.g., "  a b    c ").
How do I get a character vector that contains the fields?  In the example I
gave, I've tried:
> strsplit("  a b    c ", " +")[[1]]
[1] ""  "a" "b" "c"

I do not want that empty character in the beginning, but couldn't figure out
how to strip the starting white spaces, other than something ugly like:
> strsplit(sub("^ +", "", "  a b    c "), " +")[[1]]
[1] "a" "b" "c"

Can some kind soul point me to a simpler way?  TIA!!

Best,
Andy

Andy Liaw, PhD
Biometrics Research      PO Box 2000, RY33-300     
Merck Research Labs           Rahway, NJ 07065
andy_liaw <at> merck.com          732-594-0820

Jean-Pierre Muller

2004-Oct-20 12:49 UTC

[R] Q about strsplit and regexp

Hello,

In the function ttda.segmentation of ttda
<http://wwwpeople.unil.ch/jean-pierre.mueller/>, I use:

     #compute occurences
     occurences <- unlist(strsplit(textlines[1:length(textlines)],
         grep.sep, TRUE))
     #delete empty lines
     occurences <- occurences[nchar(occurences) > 0]
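
A minimal sketch of this split-then-filter approach on a small vector of lines. The values of `textlines` and `grep.sep` below are hypothetical stand-ins (the real ones live inside ttda), and the positional `TRUE` from the call above is left out because the argument it binds to depends on the R version.

```r
# Hypothetical inputs; in ttda these come from the package itself.
textlines <- c("  a b    c ", " d  e   f")
grep.sep <- " +"   # assumed separator: one or more spaces

# Split every line, then flatten the resulting list into one vector.
occurences <- unlist(strsplit(textlines, grep.sep))
# A leading separator yields "" as the first piece of each line; drop those.
occurences <- occurences[nchar(occurences) > 0]
print(occurences)
```

With these inputs the empty strings produced by the leading blanks are removed and only the six one-letter fields remain.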

HTH.


-- 
Jean-Pierre Müller
SSP / BFSH2 / UNIL / CH - 1015 Lausanne
Voice:+41 21 692 3116 / Fax:+41 21 692 3115

Please avoid sending me Word or PowerPoint attachments.
  See http://www.fsf.org/philosophy/no-word-attachments.html

Dimitris Rizopoulos

2004-Oct-20 13:03 UTC

[R] Q about strsplit and regexp

Hi Andy,

Maybe something like:

x <- "  a b     c "
##########
nx <- nchar(x)
x. <- substring(x, 1:nx, 1:nx)
x.[x.!=" "]

could be helpful.
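
A short sketch of how the character-by-character approach above behaves when fields are longer than one character. The input string with two-letter fields is an illustrative assumption, not taken from the thread; the second call is the strip-then-split command from the original question.

```r
# Assumed input with multi-character fields.
x <- "  ab cd "

# Character-by-character approach: one element per non-blank character.
nx <- nchar(x)
chars <- substring(x, 1:nx, 1:nx)
by_char <- chars[chars != " "]

# Strip leading blanks, then split on runs of blanks: one element per field.
by_field <- strsplit(sub("^ +", "", x), " +")[[1]]

print(by_char)    # individual letters, field boundaries are lost
print(by_field)   # whole fields
```

For single-character fields, as in the original example, the two results coincide.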

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm



John Fox

2004-Oct-20 13:07 UTC

[R] Q about strsplit and regexp

Dear Andy,

This is something that I sometimes want to do, so I have a little utility
that trims blanks and tabs from the beginnings and ends of strings:

trim.ws <- function(text) gsub("^[ \t]+", "", gsub("[ \t]+$", "", text))

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

Stephen Upton

2004-Oct-20 13:32 UTC

[R] Q about strsplit and regexp

Hi Andy,

A slight variation on Jean-Pierre's:

x <- unlist(strsplit("  a b    c ","[[:space:]]"))
x <- x[nchar(x) > 0]

HTH
steve

Liaw, Andy

2004-Oct-20 13:43 UTC

[R] Q about strsplit and regexp

Thanks to Barry Rowlingson, Peter Dalgaard, Jean-Pierre Muller, Dimitris
Rizopoulos, John Fox, and Stephen Upton for comments and suggestions.  Looks
like there's no easier way than to strip the spaces before splitting the
fields.  Several people suggested deleting the empty strings afterwards.  In
my particular application, there are typically thousands of fields, and I'd
think stripping leading (and maybe trailing) spaces in the original string
should be more efficient than computing nchar() on all fields afterwards.
(Although in reality it hardly makes any difference for me:  I'm only doing
this once, not a gazillion times...)
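
A rough way to check that efficiency guess is sketched below. The test string, the function names `strip_first` and `filter_after`, and the repetition count are all made up for illustration; timings will vary by machine and R version, so no result is claimed here.

```r
# Hypothetical test input: 10,000 one-letter fields separated by 1 to 4
# spaces, with two leading blanks (and trailing blanks after the last field).
set.seed(1)
n <- 10000
seps <- strrep(" ", sample(1:4, n, replace = TRUE))
big <- paste0("  ", paste0(sample(letters, n, replace = TRUE), seps,
                           collapse = ""))

# Approach 1: strip leading blanks once, then split.
strip_first <- function(s) strsplit(sub("^ +", "", s), " +")[[1]]

# Approach 2: split first, then drop empty strings with nchar().
filter_after <- function(s) {
  f <- strsplit(s, " +")[[1]]
  f[nchar(f) > 0]
}

# Both approaches should return the same fields.
stopifnot(identical(strip_first(big), filter_after(big)))

# Time 100 repetitions of each.
print(system.time(for (i in 1:100) strip_first(big)))
print(system.time(for (i in 1:100) filter_after(big)))
```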

So, in summary, I'm sticking with what I had originally.  Prof. Fox's
function for nuking leading and trailing white spaces will come in handy,
though.

Thanks again to all!

Best,
Andy

Arne Henningsen

2004-Oct-20 13:52 UTC

[R] Q about strsplit and regexp

Dear Andy,

I also don't know a regular expression that does what you want. However, if 
you have to do this several times, you can avoid the 'ugly' command by:

> mystrsplit <- function( str ) strsplit(sub("^ +", "", str), " +")
> mystrsplit( "  a b    c ")[[1]]
[1] "a" "b" "c"
> mystrsplit( " d  e   f")[[1]]
[1] "d" "e" "f"

All the best,
Arne

-- 
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
ahenningsen at agric-econ.uni-kiel.de
http://www.uni-kiel.de/agrarpol/ahenningsen/

Gabor Grothendieck

2004-Oct-21 05:55 UTC

[R] Q about strsplit and regexp

John Fox <jfox <at> mcmaster.ca> writes:

: This is something that I sometimes want to do, so I have a little utility
: that trims blanks and tabs from the beginnings and ends of strings:
: 
: trim.ws <- function(text) gsub("^[ \t]+", "", gsub("[ \t]+$", "", text))
: 

This can be reduced to a single gsub like this:

  gsub("^[[:space:]]+|[[:space:]]+$", "", text)
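
Putting the single-gsub trim together with the split gives a small helper, sketched here under the assumption that fields are separated by any run of whitespace. The names `trim_ws` and `split_fields` are illustrative only. Later versions of base R also provide `trimws()`, which covers the trimming step.

```r
# Trim leading and trailing whitespace with one regular expression.
trim_ws <- function(text) gsub("^[[:space:]]+|[[:space:]]+$", "", text)

# Trim first, then split on runs of whitespace; returns a list with one
# character vector per input string, as strsplit() does.
split_fields <- function(text) strsplit(trim_ws(text), "[[:space:]]+")

print(split_fields("  a b    c ")[[1]])
print(split_fields(c(" d  e   f", "\tg h\n")))

# Base R's trimws() should agree with trim_ws() on blanks and tabs.
print(identical(trim_ws(" \t a b "), trimws(" \t a b ")))
```

Because both `gsub()` and `strsplit()` are vectorized, the helper also works on character vectors longer than one.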
