Adrian Dusa
2009-Apr-10 13:48 UTC
[R] split a character variable into several character variable by a character
Dear Mao Jianfeng,

"r-help-owner" is not the place for help, but:
r-help at r-project.org
(CC-ed here)

In any case, strsplit() does the job, i.e.:
> unlist(strsplit("BCPy01-01", "-"))
[1] "BCPy01" "01"

You can work with the whole variable, like:
splitpop <- strsplit(df1$popcode, "-")

then access the first part with
> unlist(lapply(splitpop, "[", 1))
[1] "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01"
[9] "BCPy01" "BCPy01"

and the second part with
> unlist(lapply(splitpop, "[", 2))
[1] "01" "01" "01" "02" "02" "02" "02" "02" "02" "03"

hth,
Adrian

On Friday 10 April 2009, Mao Jianfeng wrote:
> Dear, R-lister,
>
> I have a dataframe like the followed. And, I want to split a character
> variable ("popcode", or "codetot") into several new variables. For example,
> split "BCPy01-01" (popcode[1]) into "BCPy01" and "01". I need to know how
> to do that. I have tried strsplit() and substring() functions. But, I still
> can not perform the spliting.
>
> Any advice? Thanks in advance.
>
> df1:
> popcode codetot p3need
> BCPy01-01 BCPy01-01-1 100.0000
> BCPy01-01 BCPy01-01-2 100.0000
> BCPy01-01 BCPy01-01-3 100.0000
> BCPy01-02 BCPy01-02-1 92.5926
> BCPy01-02 BCPy01-02-1 100.0000
> BCPy01-02 BCPy01-02-2 92.5926
> BCPy01-02 BCPy01-02-2 100.0000
> BCPy01-02 BCPy01-02-3 92.5926
> BCPy01-02 BCPy01-02-3 100.0000
> BCPy01-03 BCPy01-03-1 100.0000
>
> Regards,
>
> Mao Jian-feng

--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391
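As a follow-up to the strsplit() approach above, the pieces can be collected into new columns in one pass. This is only a minimal sketch, not code from the thread: the small df1 is rebuilt here from three of the posted rows, the column names pop and sub are hypothetical, and as.character() is applied in case popcode was read in as a factor.

```r
# Rebuild a few rows of the posted data (illustration only).
df1 <- data.frame(
  popcode = c("BCPy01-01", "BCPy01-02", "BCPy01-03"),
  p3need  = c(100, 92.5926, 100),
  stringsAsFactors = FALSE
)

# Split every popcode on a literal "-"; as.character() guards against factors.
splitpop <- strsplit(as.character(df1$popcode), "-", fixed = TRUE)

# Take the first and second piece of each element as character vectors.
# 'pop' and 'sub' are hypothetical column names chosen for this sketch.
df1$pop <- vapply(splitpop, `[`, character(1), 1)
df1$sub <- vapply(splitpop, `[`, character(1), 2)

df1
```

vapply() is used instead of unlist(lapply(...)) so that the result is guaranteed to have one character value per row; an entry with no second piece would come back as NA rather than silently shortening the vector.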
Mao Jianfeng
2009-Apr-10 14:55 UTC
[R] split a character variable into several character variable by a character
Dear R-listers,
I have a data frame like the one below, and I want to split a character
variable ("popcode" or "codetot") into several new variables. For example,
split "BCPy01-01" (in popcode) into "BCPy01" and "01". I need to know how to
do that. I have tried the strsplit() and substring() functions, but I still
cannot perform the splitting.
Any advice? Thanks in advance.
df1:
popcode codetot p3need
BCPy01-01 BCPy01-01-1 100.0000
BCPy01-01 BCPy01-01-2 100.0000
BCPy01-01 BCPy01-01-3 100.0000
BCPy01-02 BCPy01-02-1 92.5926
BCPy01-02 BCPy01-02-1 100.0000
BCPy01-02 BCPy01-02-2 92.5926
BCPy01-02 BCPy01-02-2 100.0000
BCPy01-02 BCPy01-02-3 92.5926
BCPy01-02 BCPy01-02-3 100.0000
BCPy01-03 BCPy01-03-1 100.0000
Regards,
Mao Jian-feng
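For anyone who wants to reproduce the example, the posted table can be read straight from text. This is a minimal sketch using base R's read.table() with its text argument; stringsAsFactors = FALSE is set explicitly (an assumption about what is wanted here) so that the two code columns come in as character vectors regardless of R version.

```r
# Recreate df1 from the table posted in the message.
df1 <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text = "popcode   codetot     p3need
BCPy01-01 BCPy01-01-1 100.0000
BCPy01-01 BCPy01-01-2 100.0000
BCPy01-01 BCPy01-01-3 100.0000
BCPy01-02 BCPy01-02-1  92.5926
BCPy01-02 BCPy01-02-1 100.0000
BCPy01-02 BCPy01-02-2  92.5926
BCPy01-02 BCPy01-02-2 100.0000
BCPy01-02 BCPy01-02-3  92.5926
BCPy01-02 BCPy01-02-3 100.0000
BCPy01-03 BCPy01-03-1 100.0000"
)

# Check the column types before trying to split anything.
str(df1)
```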
William Dunlap
2009-Apr-10 14:56 UTC
[R] split a character variable into several character variable by a character
strsplit() is the way to do it, but if your putative
character strings come from a data.frame you need to make
sure they are really character strings and not factors
(at least in R 2.8.1).
> d<-data.frame(name=c("Bill Dunlap", "First Last"), num=1:2)
> d
         name num
1 Bill Dunlap   1
2  First Last   2
> sapply(d,class)
     name       num
 "factor" "integer"
> strsplit(d$name, " ")
Error in strsplit(d$name, " ") : non-character argument
> strsplit(as.character(d$name), " ")
[[1]]
[1] "Bill" "Dunlap"
[[2]]
[1] "First" "Last"
> d1<-data.frame(stringsAsFactors=FALSE,name=c("Bill Dunlap", "First Last"), num=1:2)
> sapply(d1,class)
       name         num
"character"   "integer"
> strsplit(d1$name, " ")
[[1]]
[1] "Bill" "Dunlap"
[[2]]
[1] "First" "Last"
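The same point can be handled defensively in code. The following is a minimal sketch in base R; the helper name split_column is hypothetical. Note that since R 4.0.0 data.frame() defaults to stringsAsFactors = FALSE, so the error above shows up mainly in older versions or with columns that were explicitly created as factors; converting with as.character() works either way.

```r
# Hypothetical helper: split a data-frame column on a fixed separator,
# converting factors to character first so strsplit() does not fail.
split_column <- function(x, sep) {
  if (is.factor(x)) {
    x <- as.character(x)
  }
  strsplit(x, sep, fixed = TRUE)
}

# A factor column, as data.frame() produced by default before R 4.0.0.
d <- data.frame(name = factor(c("Bill Dunlap", "First Last")), num = 1:2)

parts <- split_column(d$name, " ")
parts[[1]]
parts[[2]]
```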
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
On Friday 10 April 2009, Mao Jianfeng wrote:
> Dear, R-lister,
>
> I have a dataframe like the followed. And, I want to split a character
> variable ("popcode", or "codetot") into several new variables. For
> example, split "BCPy01-01" (popcode[1]) into "BCPy01" and "01". I need
> to know how to do that. I have tried strsplit() and substring()
> functions. But, I still can not perform the spliting.
It always helps to see exactly what you tried
and a description of how the results differ from
what you wanted to get.
Adrian Dusa
2009-Apr-10 15:05 UTC
[R] split a character variable into several character variable by a character
Good observation, Bill!
Adrian
Francisco J. Zagmutt
2009-Apr-10 19:15 UTC
[R] split a character variable into several character variable by a character
Hello Mao,
If the popcode variable has a fixed number of characters (i.e., each entry
has 9 characters), you can use a simple call to substr():
dat <- read.table("clipboard", header=TRUE)  # read the table copied from your email
varleft <- substr(dat$popcode, 1, 6)   # characters 1-6, e.g. "BCPy01"
varright <- substr(dat$popcode, 8, 9)  # characters 8-9, e.g. "01"
datnew <- data.frame(dat, varleft, varright)
> datnew
popcode codetot p3need varleft varright
1 BCPy01-01 BCPy01-01-1 100.0000 BCPy01 01
2 BCPy01-01 BCPy01-01-2 100.0000 BCPy01 01
3 BCPy01-01 BCPy01-01-3 100.0000 BCPy01 01
4 BCPy01-02 BCPy01-02-1 92.5926 BCPy01 02
5 BCPy01-02 BCPy01-02-1 100.0000 BCPy01 02
6 BCPy01-02 BCPy01-02-2 92.5926 BCPy01 02
7 BCPy01-02 BCPy01-02-2 100.0000 BCPy01 02
8 BCPy01-02 BCPy01-02-3 92.5926 BCPy01 02
9 BCPy01-02 BCPy01-02-3 100.0000 BCPy01 02
10 BCPy01-03 BCPy01-03-1 100.0000 BCPy01 03
You can use a similar construction for codetot.
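For codetot, such a construction might look like the following. This is a sketch under the same fixed-width assumption (every codetot has 11 characters, as in the posted data); the small dat data frame and the column names pop, sub and indiv are made up for illustration.

```r
# A few rows in the shape of the posted data (illustration only).
dat <- data.frame(
  codetot = c("BCPy01-01-1", "BCPy01-02-2", "BCPy01-03-1"),
  stringsAsFactors = FALSE
)

# Fixed-width extraction: characters 1-6, 8-9 and 11.
dat$pop   <- substr(dat$codetot, 1, 6)
dat$sub   <- substr(dat$codetot, 8, 9)
dat$indiv <- substr(dat$codetot, 11, 11)

# If the widths ever vary, splitting on "-" does not depend on positions.
# Each row of 'pieces' holds the three parts of one codetot.
pieces <- do.call(rbind, strsplit(dat$codetot, "-", fixed = TRUE))

dat
```

The substr() version is the simplest when the layout is fixed; the strsplit() version is the one to prefer as soon as any part of the code can change length.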
I hope this helps,
Francisco
Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
francisco at voseconsulting.com
www.voseconsulting.com