Daisy Englert Duursma
2010-Oct-26 03:56 UTC
[R] divide column in a dataframe based on a character
Hello,

If I have a dataframe:

example(data.frame)
zz <- c("aa_bb","bb_cc","cc_dd","dd_ee","ee_ff","ff_gg","gg_hh","ii_jj","jj_kk","kk_ll")
ddd <- cbind(dd, group = zz)

and I want to divide the column named group by the "_", how would I do this?

So instead of the first row being

x y fac char group
1 1 C   a    aa_bb

it should be:

x y fac char group_a group_b
1 1 C   a    aa      bb

I know for a vector I can:

x1 <- c("a_b","b_c","c_d")
do.call("rbind",strsplit(x1, "_"))

but I am not sure how this relates to my data.frame.

Thanks,
Daisy

--
Daisy Englert Duursma
Room E8C156
Dept. Biological Sciences
Macquarie University NSW 2109
Australia
Bill.Venables at csiro.au
2010-Oct-26 04:23 UTC
[R] divide column in a dataframe based on a character
You are nearly there.

example(data.frame)
zz <- c("aa_bb","bb_cc","cc_dd","dd_ee","ee_ff",
        "ff_gg","gg_hh","ii_jj","jj_kk","kk_ll")
ddd <- cbind(dd, group = zz)

ddd <- within(ddd, {
    group <- as.character(group)
    tmp <- do.call(rbind, strsplit(group, "_"))
    # within() appends new variables in reverse order of assignment,
    # so group_b is assigned first to end up after group_a
    group_b <- tmp[,2]
    group_a <- tmp[,1]
    rm(tmp, group)
})

> ddd
   x  y fac char group_a group_b
1  1  1   B    a      aa      bb
2  1  2   B    b      bb      cc
3  1  3   A    c      cc      dd
4  1  4   A    d      dd      ee
5  1  5   A    e      ee      ff
6  1  6   C    f      ff      gg
7  1  7   A    g      gg      hh
8  1  8   B    h      ii      jj
9  1  9   C    i      jj      kk
10 1 10   B    j      kk      ll
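For readers who want to try the idea without example(data.frame) (whose fac column is sampled at random), here is a minimal sketch of the same do.call(rbind, strsplit(...)) technique using plain column assignment instead of within(). The three-row data frame is a made-up stand-in for the thread's dd, not the real object.

```r
# Hypothetical stand-in for the thread's `dd`; the real one comes from example(data.frame)
dd <- data.frame(x = 1, y = 1:3, char = letters[1:3], stringsAsFactors = FALSE)
zz <- c("aa_bb", "bb_cc", "cc_dd")
ddd <- cbind(dd, group = zz)

# strsplit() returns one character vector per row; rbind-ing them gives a
# character matrix with one column per piece
tmp <- do.call(rbind, strsplit(as.character(ddd$group), "_", fixed = TRUE))

ddd$group_a <- tmp[, 1]   # text before the underscore
ddd$group_b <- tmp[, 2]   # text after the underscore
ddd$group   <- NULL       # drop the original combined column

print(ddd)
```

The as.character() call is kept so the sketch behaves the same whether group arrives as a factor or as a character vector. Note that rbind() recycles shorter pieces, so this assumes every value contains exactly one "_".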
David Winsemius
2010-Oct-26 04:33 UTC
[R] divide column in a dataframe based on a character
On Oct 25, 2010, at 8:56 PM, Daisy Englert Duursma wrote:

> I know for a vector I can:
> x1 <- c("a_b","b_c","c_d")
> do.call("rbind",strsplit(x1, "_"))
>
> but I am not sure how this relates to my data.frame

The group column is a factor, which is the default structure for non-numeric (character) arguments to data.frame() and cbind.data.frame(). If you want to split the values you must first convert to character:

> ddd$group_a <- lapply(strsplit(as.character(ddd$group), "_"), "[", 1)
> ddd$group_b <- lapply(strsplit(as.character(ddd$group), "_"), "[", 2)
> ddd
   x  y fac char group group_a group_b
1  1  1   C    a aa_bb      aa      bb
2  1  2   B    b bb_cc      bb      cc
3  1  3   C    c cc_dd      cc      dd
4  1  4   C    d dd_ee      dd      ee
5  1  5   B    e ee_ff      ee      ff
6  1  6   A    f ff_gg      ff      gg
7  1  7   C    g gg_hh      gg      hh
8  1  8   A    h ii_jj      ii      jj
9  1  9   B    i jj_kk      jj      kk
10 1 10   B    j kk_ll      kk      ll

--
David.
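One detail worth checking with the lapply() form: lapply() returns a list, so the new columns are stored as list columns even though they print like ordinary ones. The sketch below is a variant, not the poster's code, that uses vapply() to get plain character vectors. It assumes a small made-up data frame in place of the thread's ddd. Since R 4.0.0 data.frame() no longer converts strings to factors by default, but as.character() is harmless in either case.

```r
# Hypothetical stand-in for the thread's `ddd`
ddd <- data.frame(x = 1, y = 1:3, group = c("aa_bb", "bb_cc", "cc_dd"))

# Split once and reuse the result for both new columns
parts <- strsplit(as.character(ddd$group), "_", fixed = TRUE)

# vapply() checks that each extracted piece is a single string and
# returns an ordinary character vector rather than a list
ddd$group_a <- vapply(parts, `[`, character(1), 1)
ddd$group_b <- vapply(parts, `[`, character(1), 2)

str(ddd)
```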
Daisy Englert Duursma
2010-Oct-26 04:47 UTC
[R] divide column in a dataframe based on a character
Thanks for the help. Easy as..
On 10/25/2010 8:56 PM, Daisy Englert Duursma wrote:

> and I want to divide the column named group by the "_", how would I do this?

Daisy,

You've already gotten a couple of responses, but I thought I'd point out a different approach (the let-someone-else-deal-with-the-details approach). The reshape package has a colsplit function which is designed just for this sort of thing.

library("reshape")
ddd <- cbind(ddd, colsplit(ddd$group, split="_", names=c("group_a", "group_b")))

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
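If installing a package is not an option, a rough base-R equivalent of the colsplit() call can be sketched as a small helper. The function name split_col and its arguments are invented for illustration and are not part of reshape or any other package. As far as I know, the newer reshape2 package also provides colsplit(), but with a pattern argument in place of split.

```r
# Hypothetical helper: split one column on a fixed separator and append the pieces
split_col <- function(df, col, sep, names) {
  pieces <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  mat <- do.call(rbind, pieces)             # one row per input row
  stopifnot(ncol(mat) == length(names))     # one name per resulting piece
  new_cols <- as.data.frame(mat, stringsAsFactors = FALSE)
  colnames(new_cols) <- names
  cbind(df, new_cols)                       # original column is kept, as in the thread
}

# Made-up example data standing in for the thread's `ddd`
ddd <- data.frame(x = 1, y = 1:3, group = c("aa_bb", "bb_cc", "cc_dd"))
out <- split_col(ddd, "group", "_", c("group_a", "group_b"))
print(out)
```

Like the cbind() call above, this keeps the original group column alongside the two new ones. It assumes every value splits into the same number of pieces.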