O'Hanlon, Simon J
2012-Mar-06  16:06 UTC
[R] Label rows of table by factor level for groups of factors
Dear useRs,

I am sure this is a fairly simple problem, but I just cannot get my head around it.

I have a dataframe which contains several factor variables. I can use table() to tell me how many different combinations there are of these variables. What I should like to do is to add a column to my original dataframe which labels each row according to the unique combination of factors.

E.g. in the simple example below I create a dataframe 'df' with 3 columns, the values of which take 0 or 1. I can then classify each row in the table and I find that I have 4 unique combinations of factors. I would now like to add a fourth column to df which labels each row according to whether it was unique combination 1, 2, 3 or 4:

x1=c(rep(0:1,6))
x2=c(rep(c(1,1,0,0)6))
x3=c(rep(1,6),rep(0,6))
df=data.frame(x1,x2,x3)
tabledf=as.data.frame(with(df, table(x1,x2,x3)))
res=c(3,4,3,4,3,4,1,2,1,2,1,2)
desired=data.frame(x1,x2,x3,res)
df
tabledf
desired

I realise that this is probably quite simple to do, I am just struggling to get my head around it! Help much appreciated in advance.

Cheers,
Simon

--------------------------------
Simon O'Hanlon BSc, MSc
Department of Infectious Disease Epidemiology
Imperial College London
St. Mary's Hospital
London
W2 1PG
Sarah Goslee
2012-Mar-06  18:16 UTC
[R] Label rows of table by factor level for groups of factors
One possible approach is to use unique() to get the list of distinct combinations, cbind() an identifying variable to that list, then use merge() to join it to your existing data frame.

But I'm not seeing how you are getting four unique combinations. Given your sample data (with the missing comma replaced):

> dim(tabledf)
[1] 8 4
> head(desired)
  x1 x2 x3 res
1  0  1  1   3
2  1  1  1   4
3  0  0  1   3
4  1  0  1   4
5  0  1  1   3
6  1  1  1   4

tabledf has 8 rows, not 4, and I don't see how rows 1 and 3 or rows 2 and 4 of your desired df should get the same classification.

Regardless, if you can make a data frame like tabledf with an additional column for your desired res variable, you can merge() it with your original data frame.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
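The unique() / cbind() / merge() approach described above can be sketched as follows. This is only an illustration under assumptions: the data frame is a made-up 12-row example with three 0/1 columns, and the names combos and labelled are hypothetical, not taken from the thread.

```r
# Hypothetical data: three 0/1 columns, 12 rows
df <- data.frame(
  x1 = rep(0:1, 6),
  x2 = rep(c(1, 1, 0, 0), 3),
  x3 = rep(c(1, 0), each = 6)
)

# 1. One row per distinct combination of x1, x2, x3
combos <- unique(df)

# 2. Attach an identifying label to each combination
combos <- cbind(combos, res = seq_len(nrow(combos)))

# 3. Join the labels back onto the original rows; merge() matches
#    on the columns the two data frames share (x1, x2, x3)
labelled <- merge(df, combos)
```

Because combos is built from df itself, only combinations that actually occur get a label. merge() sorts its result on the key columns by default, so labelled will generally not be in the original row order.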
Sarah Goslee
2012-Mar-06  18:27 UTC
[R] Label rows of table by factor level for groups of factors
Well, if you can get this to run your version of R is markedly different than mine.

> #Start of code
>
> x1=c(rep(0:1,6))
> x2=c(rep(c(1,1,0,0)6))
Error: unexpected numeric constant in "x2=c(rep(c(1,1,0,0)6"
> x3=c(rep(1,6),rep(0,6))

On Tue, Mar 6, 2012 at 1:23 PM, O'Hanlon, Simon J <simon.ohanlon at imperial.ac.uk> wrote:
> Hi Sarah,
> Thanks a lot for your suggestion. I'll give it a go if I can (I just spent the last 3 hours using unique record filtering and vlookups in Excel to achieve what I'm sure can be accomplished in 3 or 4 lines of R code!).
>
> I think you might want to run the sample code again though. I just tried it (and there was no missing comma) and I get:
>
>    x1 x2 x3
> 1   0  1  1
> 2   1  1  1
> 3   0  1  1
> 4   1  1  1
> 5   0  1  1
> 6   1  1  1
> 7   0  1  0
> 8   1  1  0
> 9   0  1  0
> 10  1  1  0
> 11  0  1  0
> 12  1  1  0
> > tabledf
>   x1 x2 x3 Freq
> 1  0  1  0    3
> 2  1  1  0    3
> 3  0  1  1    3
> 4  1  1  1    3
> > desired
>    x1 x2 x3 res
> 1   0  1  1   3
> 2   1  1  1   4
> 3   0  1  1   3
> 4   1  1  1   4
> 5   0  1  1   3
> 6   1  1  1   4
> 7   0  1  0   1
> 8   1  1  0   2
> 9   0  1  0   1
> 10  1  1  0   2
> 11  0  1  0   1
> 12  1  1  0   2
> > nrow(tabledf)
> [1] 4
> > dim(tabledf)
> [1] 4 4
>
> #Start of code
>
> x1=c(rep(0:1,6))
> x2=c(rep(c(1,1,0,0)6))
> x3=c(rep(1,6),rep(0,6))
> df=data.frame(x1,x2,x3)
> tabledf=as.data.frame(with(df, table(x1,x2,x3)))
> res=c(3,4,3,4,3,4,1,2,1,2,1,2)
> desired=data.frame(x1,x2,x3,res)
> df
> tabledf
> desired
>
> #End of code
>
> Cheers!
>
> Simon
>
> --------------------------------
> Simon O'Hanlon, BSc MSc
> Department of Infectious Disease Epidemiology
> Imperial College London
> St. Mary's Hospital
> London
> W2 1PG
Sarah Goslee
2012-Mar-06  18:44 UTC
[R] Label rows of table by factor level for groups of factors
On Tue, Mar 6, 2012 at 1:32 PM, O'Hanlon, Simon J <simon.ohanlon at imperial.ac.uk> wrote:
> Ah!
>
> Thanks.
>
> I had already made vector x2 previously and then went and changed it for some reason, which was why I didn't notice the error (because the subsequent code was able to run regardless). Sorry about that.
>
> so x2 should have read x2=c(rep(1,12)) which is what I originally had and what I was basing my plea for help on.

That would explain the difference in results. Regardless, the method I suggested should work.

x1=c(rep(0:1,6))
x2=c(rep(1,12))
x3=c(rep(1,6),rep(0,6))
df=data.frame(x1,x2,x3)
tabledf=as.data.frame(with(df, table(x1,x2,x3)))
tabledf <- cbind(tabledf, res=1:nrow(tabledf))
newdf <- merge(df, tabledf)

Note that row order is not preserved; if you need that you can add an id column to df before merging and sort on it after.

Please notice also that I've been copying the R-help list on my replies, so that other people who either have similar questions or might be moved to help can see what we've been discussing.

Sarah
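The suggestion about keeping the original row order (add an id column before merging, sort on it afterwards) might look something like the sketch below. The column name id is a hypothetical choice; the rest follows the corrected example data from the thread, with x2 set to rep(1, 12).

```r
# Corrected example data from the thread (x2 is constant)
x1 <- rep(0:1, 6)
x2 <- rep(1, 12)
x3 <- c(rep(1, 6), rep(0, 6))
df <- data.frame(x1, x2, x3)

# One row per combination of levels, plus a label column
tabledf <- as.data.frame(with(df, table(x1, x2, x3)))
tabledf <- cbind(tabledf, res = seq_len(nrow(tabledf)))

# 'id' (hypothetical name) records each row's original position
df$id <- seq_len(nrow(df))

# merge() joins on the shared columns x1, x2, x3 and reorders rows,
# so sort on id afterwards to restore the original order
newdf <- merge(df, tabledf)
newdf <- newdf[order(newdf$id), ]
```

With these data, newdf$res comes out in the order given in the original post's res vector. One thing to keep in mind: table() lists every combination of the observed levels, including combinations with a count of zero, so with other data some labels in tabledf may never appear in the merged result.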