> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan Greenberg
> Sent: Thursday, October 22, 2009 7:35 PM
> To: r-help
> Subject: [R] splitting a vector of strings...
>
> Quick question -- if I have a vector of strings that I'd like to split
> into two new vectors based on a substring that is inside of each string,
> what is the most efficient way to do this? The substring that I want to
> split on is multiple characters, if that matters, and it is contained in
> every element of the character vector.
strsplit and sub can both be used for this. If you know each
string will be split into exactly 2 parts, then 2 calls to sub
with slightly different patterns will do it. strsplit requires
less fiddling with the pattern and is handier when the number
of parts is variable or large, but its output (a list with one
character vector per input string) often needs to be rearranged
for convenient use.
E.g., I made 100,000 strings with a 'qaz' in their middles with
x<-paste("X",sample(1e5),sep="")
y<-sub("X","Y",x)
xy<-paste(x,y,sep="qaz")
and split them by the 'qaz' in two ways:
system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
# user system elapsed
# 0.22 0.00 0.21
system.time({
  tmp<-strsplit(xy,"qaz")
  ret2<-list(x=unlist(lapply(tmp,`[`,1)),y=unlist(lapply(tmp,`[`,2)))
})
# user system elapsed
# 2.42 0.00 2.20
identical(ret1,ret2)
#[1] TRUE
identical(ret1$x,x) && identical(ret1$y,y)
#[1] TRUE
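Both calls above treat 'qaz' as a regular expression. If the
separator could contain regex metacharacters, a literal match is
safer. The following is only a sketch of that idea, not something
I have timed: the separator "a.b" and the toy vector s are made up
for illustration, and it assumes the separator occurs exactly once
in each string.

```r
# Hypothetical toy input; "a.b" stands in for a separator that
# contains a regex metacharacter (the dot).
sep <- "a.b"
s <- c("X1a.bY1", "X22a.bY22", "X333a.bY333")

# Approach A: find the separator literally, then cut with substr().
pos <- as.integer(regexpr(sep, s, fixed = TRUE))
ret_a <- list(x = substr(s, 1, pos - 1),
              y = substr(s, pos + nchar(sep), nchar(s)))

# Approach B: strsplit() with fixed = TRUE; vapply() pulls out the
# 1st and 2nd piece and checks that each result is a single string.
tmp <- strsplit(s, sep, fixed = TRUE)
ret_b <- list(x = vapply(tmp, `[`, character(1), 1),
              y = vapply(tmp, `[`, character(1), 2))

# The two approaches should agree on this input.
identical(ret_a, ret_b)
```

If the separator can appear more than once per string, the two
approaches no longer do the same thing: A cuts at the first
occurrence only, while B produces more than two pieces.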
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> --j
>
> --
>
> Jonathan A. Greenberg, PhD
> Postdoctoral Scholar
> Center for Spatial Technologies and Remote Sensing (CSTARS)
> University of California, Davis
> One Shields Avenue
> The Barn, Room 250N
> Davis, CA 95616
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307