Daisy Englert Duursma
2010-Oct-26 03:56 UTC
[R] divide column in a dataframe based on a character
Hello,
If I have a dataframe:
example(data.frame)
zz<-c("aa_bb","bb_cc","cc_dd","dd_ee","ee_ff","ff_gg","gg_hh","ii_jj","jj_kk","kk_ll")
ddd <- cbind(dd, group = zz)
and I want to divide the column named group by the "_", how would I do
this?
so instead of the first row being
x y fac char group
1 1 C a aa_bb
it should be:
x y fac char group_a group_b
1 1 C a aa bb
I know for a vector I can:
x1 <- c("a_b","b_c","c_d")
do.call("rbind",strsplit(x1, "_"))
but I am not sure how this relates to my data.frame
Thanks,
Daisy
--
Daisy Englert Duursma
Room E8C156
Dept. Biological Sciences
Macquarie University NSW 2109
Australia
Bill.Venables at csiro.au
2010-Oct-26 04:23 UTC
[R] divide column in a dataframe based on a character
You are nearly there.
____
example(data.frame)
zz <-
c("aa_bb","bb_cc","cc_dd","dd_ee","ee_ff",
"ff_gg","gg_hh","ii_jj","jj_kk","kk_ll")
ddd <- cbind(dd, group = zz)
ddd <- within(ddd, {
    group <- as.character(group)
    tmp <- do.call(rbind, strsplit(group, "_"))
    group_b <- tmp[,2]
    group_a <- tmp[,1]
    rm(tmp, group)
})
____
> ddd
   x  y fac char group_a group_b
1  1  1   B    a      aa      bb
2  1  2   B    b      bb      cc
3  1  3   A    c      cc      dd
4  1  4   A    d      dd      ee
5  1  5   A    e      ee      ff
6  1  6   C    f      ff      gg
7  1  7   A    g      gg      hh
8  1  8   B    h      ii      jj
9  1  9   C    i      jj      kk
10 1 10   B    j      kk      ll
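For readers who want to try the same idea without running example(data.frame), here is a minimal sketch on a small made-up data frame. The names df, id and parts are hypothetical and not from the thread; the technique is the same strsplit() plus do.call(rbind, ...) combination used above, written with plain assignments instead of within().

```r
# Hypothetical stand-in for the thread's ddd; the values are illustrative only.
df <- data.frame(
  id    = 1:3,
  group = factor(c("aa_bb", "bb_cc", "cc_dd"))
)

# strsplit() needs character input, so convert the factor first.
# fixed = TRUE treats "_" as a literal string rather than a regular expression.
parts <- do.call(rbind, strsplit(as.character(df$group), "_", fixed = TRUE))

# parts is a 3 x 2 character matrix; attach its columns under new names.
df$group_a <- parts[, 1]
df$group_b <- parts[, 2]
df$group   <- NULL  # drop the original combined column

print(df)
```

This assumes every value contains exactly one "_". If the number of pieces varies between rows, rbind() recycles the shorter vectors with a warning, so the result should be checked before use.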
David Winsemius
2010-Oct-26 04:33 UTC
[R] divide column in a dataframe based on a character
On Oct 25, 2010, at 8:56 PM, Daisy Englert Duursma wrote:
> and I want to divide the column named group by the "_", how would I
> do this?

The group column is a factor, as is the default structure for non-numeric
character arguments to data.frame() and cbind.data.frame(). If you want to
split the values you must first convert to character:

> ddd$group_a <- lapply(strsplit(as.character(ddd$group), "_"), "[", 1)
> ddd$group_b <- lapply(strsplit(as.character(ddd$group), "_"), "[", 2)
> ddd
   x  y fac char group group_a group_b
1  1  1   C    a aa_bb      aa      bb
2  1  2   B    b bb_cc      bb      cc
3  1  3   C    c cc_dd      cc      dd
4  1  4   C    d dd_ee      dd      ee
5  1  5   B    e ee_ff      ee      ff
6  1  6   A    f ff_gg      ff      gg
7  1  7   C    g gg_hh      gg      hh
8  1  8   A    h ii_jj      ii      jj
9  1  9   B    i jj_kk      jj      kk
10 1 10   B    j kk_ll      kk      ll

--
David.
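One detail worth knowing about the lapply() calls above: lapply() returns a list, so group_a and group_b end up as list columns even though they print like ordinary character columns. The sketch below, on a small made-up data frame (df, pieces and list_a are hypothetical names, not from the thread), shows one way to get plain character vectors instead by using vapply() with a character(1) template.

```r
# Hypothetical example data; the factor mimics the column type described above.
df <- data.frame(group = c("aa_bb", "bb_cc", "cc_dd"), stringsAsFactors = TRUE)

# Split once and reuse the result for both new columns.
pieces <- strsplit(as.character(df$group), "_", fixed = TRUE)

# lapply() returns a list, so this becomes a list column.
df$list_a <- lapply(pieces, "[", 1)

# vapply() with a character(1) template returns a plain character vector.
df$group_a <- vapply(pieces, "[", character(1), 1)
df$group_b <- vapply(pieces, "[", character(1), 2)

str(df)
```

Plain character columns tend to behave more predictably in later steps such as merge() or write.csv(), which is the main reason to prefer them here.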
Daisy Englert Duursma
2010-Oct-26 04:47 UTC
[R] divide column in a dataframe based on a character
Thanks for the help. Easy as..
On 10/25/2010 8:56 PM, Daisy Englert Duursma wrote:
> and I want to divide the column named group by the "_", how would I do this?

Daisy,

You've already gotten a couple of responses, but I thought I'd point out a
different approach (the let-someone-else-deal-with-the-details approach).
The reshape package has a colsplit function which is designed just for this
sort of thing.

library("reshape")
ddd <- cbind(ddd, colsplit(ddd$group, split="_", names=c("group_a", "group_b")))

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
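If installing an extra package is not an option, a similar one-step split can be sketched in base R with read.table() and its text argument. This is an alternative technique, not what colsplit() does internally, and the data frame below (df, id, halves) is made up for illustration.

```r
# Hypothetical example data; not the ddd object from the thread.
df <- data.frame(id = 1:3, group = c("aa_bb", "bb_cc", "cc_dd"))

# Treat each value as one line of "_"-separated text and parse it into
# two columns. colClasses = "character" stops read.table() from guessing
# types, so pieces such as "01" or "NA" are kept as plain text.
halves <- read.table(
  text       = as.character(df$group),
  sep        = "_",
  col.names  = c("group_a", "group_b"),
  colClasses = "character"
)

# Bind the new columns onto the original data frame.
df <- cbind(df, halves)
print(df)
```

This assumes the values contain no quote characters or "#", which read.table() would interpret as quoting and comments under its default settings.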