Adrian Dusa
2009-Apr-10 13:48 UTC
[R] split a character variable into several character variable by a character
Dear Mao Jianfeng,

"r-help-owner" is not the place for help, but:
r-help at r-project.org
(CC-ed here)

In any case, strsplit() does the job, i.e.:

> unlist(strsplit("BCPy01-01", "-"))
[1] "BCPy01" "01"

You can work with the whole variable, like:

splitpop <- strsplit(df1$popcode, "-")

then access the first part with

> unlist(lapply(splitpop, "[", 1))
[1] "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01" "BCPy01"
[9] "BCPy01" "BCPy01"

and the second part with

> unlist(lapply(splitpop, "[", 2))
[1] "01" "01" "01" "02" "02" "02" "02" "02" "02" "03"

hth,
Adrian

--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
     +40 21 3120210 / int.101
Fax: +40 21 3158391
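The strsplit() approach above can be carried one step further by storing the pieces as new columns. What follows is only a minimal sketch: the data frame is rebuilt by hand from the sample in the question, the column names pop_id and sub_id are hypothetical, and the as.character() call is an added safeguard in case popcode was read in as a factor.

```r
# Rebuild the sample data from the question by hand
df1 <- data.frame(
  popcode = c("BCPy01-01", "BCPy01-01", "BCPy01-01", "BCPy01-02", "BCPy01-02",
              "BCPy01-02", "BCPy01-02", "BCPy01-02", "BCPy01-02", "BCPy01-03"),
  codetot = c("BCPy01-01-1", "BCPy01-01-2", "BCPy01-01-3", "BCPy01-02-1",
              "BCPy01-02-1", "BCPy01-02-2", "BCPy01-02-2", "BCPy01-02-3",
              "BCPy01-02-3", "BCPy01-03-1"),
  p3need = c(100, 100, 100, 92.5926, 100, 92.5926, 100, 92.5926, 100, 100),
  stringsAsFactors = FALSE
)

# Split every popcode on the literal hyphen; the result is a list of
# character vectors, one per row
splitpop <- strsplit(as.character(df1$popcode), "-", fixed = TRUE)

# Extract the first and second piece of each list element into new columns
# (pop_id and sub_id are hypothetical names, not from the thread)
df1$pop_id <- vapply(splitpop, `[`, character(1), 1)
df1$sub_id <- vapply(splitpop, `[`, character(1), 2)

head(df1, 3)
```

vapply() is used here instead of unlist(lapply(...)) only because it fails loudly if an element does not yield exactly one string; either form gives the same result on this data.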
Mao Jianfeng
2009-Apr-10 14:55 UTC
[R] split a character variable into several character variable by a character
Dear R-listers,

I have a data frame like the following, and I want to split a character variable ("popcode" or "codetot") into several new variables. For example, split "BCPy01-01" (in popcode) into "BCPy01" and "01". I need to know how to do that. I have tried the strsplit() and substring() functions, but I still cannot perform the splitting.

Any advice? Thanks in advance.

df1:
  popcode    codetot      p3need
  BCPy01-01  BCPy01-01-1  100.0000
  BCPy01-01  BCPy01-01-2  100.0000
  BCPy01-01  BCPy01-01-3  100.0000
  BCPy01-02  BCPy01-02-1   92.5926
  BCPy01-02  BCPy01-02-1  100.0000
  BCPy01-02  BCPy01-02-2   92.5926
  BCPy01-02  BCPy01-02-2  100.0000
  BCPy01-02  BCPy01-02-3   92.5926
  BCPy01-02  BCPy01-02-3  100.0000
  BCPy01-03  BCPy01-03-1  100.0000

Regards,

Mao Jian-feng
William Dunlap
2009-Apr-10 14:56 UTC
[R] split a character variable into several character variable by a character
strsplit() is the way to do it, but if your putative character strings come from a data.frame you need to make sure they are really character strings and not factors (at least in R 2.8.1).

> d<-data.frame(name=c("Bill Dunlap", "First Last"), num=1:2)
> d
         name num
1 Bill Dunlap   1
2  First Last   2
> sapply(d,class)
     name       num
 "factor" "integer"
> strsplit(d$name, " ")
Error in strsplit(d$name, " ") : non-character argument
> strsplit(as.character(d$name), " ")
[[1]]
[1] "Bill"   "Dunlap"

[[2]]
[1] "First" "Last"

> d1<-data.frame(stringsAsFactors=FALSE,name=c("Bill Dunlap", "First Last"), num=1:2)
> sapply(d1,class)
       name         num
"character"   "integer"
> strsplit(d1$name, " ")
[[1]]
[1] "Bill"   "Dunlap"

[[2]]
[1] "First" "Last"

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

On Friday 10 April 2009, Mao Jianfeng wrote:
> I have tried strsplit() and substring() functions. But, I still
> can not perform the splitting.

It always helps to see exactly what you tried and a description of how the results differ from what you wanted to get.
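The factor pitfall described above can be guarded against with an explicit coercion. This is a minimal sketch under stated assumptions: stringsAsFactors = TRUE is set explicitly to reproduce the older default (since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE), and the helper name split_column is hypothetical.

```r
# Force a factor column, as data.frame() did by default before R 4.0.0
d <- data.frame(name = c("Bill Dunlap", "First Last"), num = 1:2,
                stringsAsFactors = TRUE)

# Hypothetical helper: coerce to character before splitting, so the same
# call works for both factor and character input
split_column <- function(x, sep) {
  strsplit(as.character(x), sep, fixed = TRUE)
}

parts <- split_column(d$name, " ")
parts[[1]]   # "Bill" "Dunlap"
```

Checking the column classes with sapply(d, class) before splitting, as in the message above, remains the quickest way to see whether the coercion is needed at all.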
Adrian Dusa
2009-Apr-10 15:05 UTC
[R] split a character variable into several character variable by a character
Good observation, Bill!

Adrian
Francisco J. Zagmutt
2009-Apr-10 19:15 UTC
[R] split a character variable into several character variable by a character
Hello Mao,

If the popcode variable has a fixed number of characters (i.e. each entry has 9 characters), you can use a simple call to substr:

dat <- read.table("clipboard", header=TRUE)  # Read from your email
varleft <- substr(dat$popcode, 1, 6)         # characters 1 to 6
varright <- substr(dat$popcode, 8, 9)        # characters 8 to 9
datnew <- data.frame(dat, varleft, varright)

> datnew
     popcode     codetot   p3need varleft varright
1  BCPy01-01 BCPy01-01-1 100.0000  BCPy01       01
2  BCPy01-01 BCPy01-01-2 100.0000  BCPy01       01
3  BCPy01-01 BCPy01-01-3 100.0000  BCPy01       01
4  BCPy01-02 BCPy01-02-1  92.5926  BCPy01       02
5  BCPy01-02 BCPy01-02-1 100.0000  BCPy01       02
6  BCPy01-02 BCPy01-02-2  92.5926  BCPy01       02
7  BCPy01-02 BCPy01-02-2 100.0000  BCPy01       02
8  BCPy01-02 BCPy01-02-3  92.5926  BCPy01       02
9  BCPy01-02 BCPy01-02-3 100.0000  BCPy01       02
10 BCPy01-03 BCPy01-03-1 100.0000  BCPy01       03

You can use a similar construction for codetot.

I hope this helps,

Francisco

Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
francisco at voseconsulting.com
www.voseconsulting.com
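A similar construction for codetot might look like the sketch below. It assumes every codetot value has exactly 11 characters in the form shown in the sample data, and the column names left, middle and right are hypothetical. The nchar() check stops early if the fixed-width assumption does not hold, in which case strsplit() on "-" is the safer route.

```r
# A few codetot values copied from the sample data in the question
codetot <- c("BCPy01-01-1", "BCPy01-01-2", "BCPy01-02-3", "BCPy01-03-1")

# substr() relies on fixed positions, so verify the width first
stopifnot(all(nchar(codetot) == 11))

# Cut each value at fixed positions, skipping the hyphens at 7 and 10
# (left, middle and right are hypothetical column names)
parts <- data.frame(
  codetot = codetot,
  left    = substr(codetot, 1, 6),
  middle  = substr(codetot, 8, 9),
  right   = substr(codetot, 11, 11),
  stringsAsFactors = FALSE
)
parts
```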