All, I'm sure that this is covered somewhere, but I can't seem to find a good explanation. I have an existing table that contains information grouped by date. This is as so: Day NumberOfCustomers NumberOfComplaints 20060512 10040 40 20060513 32420 11 ... I also have a table at the detail level as so: Day Meal PricePaid UsedCupon 20060512 Fish 14 Y 20060512 Chicken 20 N ... Is there a simple way to create summaries on the detail table and then join them into the first table above so that it looks like this: Day NumberOfCustomers NumberOfComplaints AveragePricePaid NumberUsingCupon I can do a tapply to get what I want from the detail table, but I can't figure out how to turn that into a table and join it back in. Thanks, Josh =============================================================================Please access the attached hyperlink for an important electr...{{dropped}}
On Fri, 2007-01-26 at 12:39 -0500, Kalish, Josh wrote:> All, > > I'm sure that this is covered somewhere, but I can't seem to find a > good explanation. I have an existing table that contains information > grouped by date. This is as so: > > Day NumberOfCustomers NumberOfComplaints > 20060512 10040 40 > 20060513 32420 11 > ... > > > I also have a table at the detail level as so: > > Day Meal PricePaid UsedCupon > 20060512 Fish 14 Y > 20060512 Chicken 20 N > ... > > Is there a simple way to create summaries on the detail table and then > join them into the first table above so that it looks like this: > > Day NumberOfCustomers NumberOfComplaints AveragePricePaid > NumberUsingCupon > > > I can do a tapply to get what I want from the detail table, but I > can't figure out how to turn that into a table and join it back in. > > > > Thanks, > > JoshSkipping the steps of using tapply() or aggregate() to get the summarized data from the second data frame, you would then use merge() to perform a SQL-like 'join' operation:> DF.1Day NumberOfCustomers NumberOfComplaints 1 20060512 10040 40 2 20060513 32420 11> DF.2Day Meal PricePaid UsedCupon 1 20060512 Fish 14 Y 2 20060512 Chicken 20 N> merge(DF.1, DF.2, by = "Day")Day NumberOfCustomers NumberOfComplaints Meal PricePaid 1 20060512 10040 40 Fish 14 2 20060512 10040 40 Chicken 20 UsedCupon 1 Y 2 N By default, only rows matching on the 'by' argument in both data frames will be in the result. See the 'all.x' and 'all.y' arguments to handle other scenarios of including non-matching rows. See ?merge, which BTW: help.search("join") would point you to, if you are familiar with the term from relational data base operations. HTH, Marc Schwartz
Marc, Thanks for pointing out the merge function. That gets me part of the way there. The only thing is that I can't get the tapply() results into a format that merge() will take. For example: merge( set1 , tapply( set2$f1 , set2$commonField, mean ) , by="commonField" ) Gives me "Error in names... Unused arguments..." I'm not sure what the result of a tapply() exactly is, but it doesn't seem to be a table. Yeah, rank amateur questions... Thanks, Josh -----Original Message----- From: Marc Schwartz [mailto:marc_schwartz at comcast.net] Sent: Friday, January 26, 2007 1:08 PM To: Kalish, Josh Cc: 'r-help at stat.math.ethz.ch' Subject: Re: [R] Using tapply to create a new table On Fri, 2007-01-26 at 12:39 -0500, Kalish, Josh wrote:> All, > > I'm sure that this is covered somewhere, but I can't seem to find a > good explanation. I have an existing table that contains information > grouped by date. This is as so: > > Day NumberOfCustomers NumberOfComplaints > 20060512 10040 40 > 20060513 32420 11 > ... > > > I also have a table at the detail level as so: > > Day Meal PricePaid UsedCupon > 20060512 Fish 14 Y > 20060512 Chicken 20 N > ... > > Is there a simple way to create summaries on the detail table and then > join them into the first table above so that it looks like this: > > Day NumberOfCustomers NumberOfComplaints AveragePricePaid > NumberUsingCupon > > > I can do a tapply to get what I want from the detail table, but I > can't figure out how to turn that into a table and join it back in. > > > > Thanks, > > JoshSkipping the steps of using tapply() or aggregate() to get the summarized data from the second data frame, you would then use merge() to perform a SQL-like 'join' operation:> DF.1Day NumberOfCustomers NumberOfComplaints 1 20060512 10040 40 2 20060513 32420 11> DF.2Day Meal PricePaid UsedCupon 1 20060512 Fish 14 Y 2 20060512 Chicken 20 N> merge(DF.1, DF.2, by = "Day")Day NumberOfCustomers NumberOfComplaints Meal PricePaid 1 20060512 10040 40 Fish 14 2 20060512 10040 40 Chicken 20 UsedCupon 1 Y 2 N By default, only rows matching on the 'by' argument in both data frames will be in the result. See the 'all.x' and 'all.y' arguments to handle other scenarios of including non-matching rows. See ?merge, which BTW: help.search("join") would point you to, if you are familiar with the term from relational data base operations. HTH, Marc Schwartz =============================================================================Please access the attached hyperlink for an important electr...{{dropped}}
Ok, Thanks. I wasn't sure what the actual output of a tapply was. -----Original Message----- From: Marc Schwartz [mailto:marc_schwartz at comcast.net] Sent: Friday, January 26, 2007 1:47 PM To: Kalish, Josh Cc: 'r-help at stat.math.ethz.ch' Subject: Re: [R] Using tapply to create a new table Josh, As per the "Value" section of ?tapply, it "returns a single atomic value for each cell". This is easily viewed by using: tapply(set2$f1, set2$commonField, mean) in a stand alone fashion or: str(tapply(set2$f1, set2$commonField, mean)) which will display the internal structure of the result. See ?str To use merge() in the fashion you seem to want, you would want to use aggregate() and not tapply(). The former returns a data frame, where the "key" or "by" values will be part of the output. See ?aggregate for more information. I would also recommend that both for the readability of the code you write and to help clarify for yourself, the objects that are returned from each step, that you not nest the function calls as you have below. There are times when it makes sense, but there are times when the code would end up being a good candidate for an Obfuscated R contest. :-) HTH, Marc On Fri, 2007-01-26 at 13:21 -0500, Kalish, Josh wrote:> Marc, > > Thanks for pointing out the merge function. That gets me part of the > way there. The only thing is that I can't get the tapply() results > into a format that merge() will take. For example: > > merge( set1 , tapply( set2$f1 , set2$commonField, mean ) , > by="commonField" ) > > Gives me "Error in names... Unused arguments..." > > I'm not sure what the result of a tapply() exactly is, but it doesn't > seem to be a table. > > Yeah, rank amateur questions... > > Thanks, > > Josh > > -----Original Message----- > From: Marc Schwartz [mailto:marc_schwartz at comcast.net] > Sent: Friday, January 26, 2007 1:08 PM > To: Kalish, Josh > Cc: 'r-help at stat.math.ethz.ch' > Subject: Re: [R] Using tapply to create a new table > > On Fri, 2007-01-26 at 12:39 -0500, Kalish, Josh wrote: > > All, > > > > I'm sure that this is covered somewhere, but I can't seem to find a > > good explanation. I have an existing table that contains > information > > grouped by date. This is as so: > > > > Day NumberOfCustomers NumberOfComplaints > > 20060512 10040 40 > > 20060513 32420 11 > > ... > > > > > > I also have a table at the detail level as so: > > > > Day Meal PricePaid UsedCupon > > 20060512 Fish 14 Y > > 20060512 Chicken 20 N > > ... > > > > Is there a simple way to create summaries on the detail table and > then > > join them into the first table above so that it looks like this: > > > > Day NumberOfCustomers NumberOfComplaints AveragePricePaid > > NumberUsingCupon > > > > > > I can do a tapply to get what I want from the detail table, but I > > can't figure out how to turn that into a table and join it back in. > > > > > > > > Thanks, > > > > Josh > > Skipping the steps of using tapply() or aggregate() to get the > summarized data from the second data frame, you would then use merge() > to perform a SQL-like 'join' operation: > > > DF.1 > Day NumberOfCustomers NumberOfComplaints > 1 20060512 10040 40 > 2 20060513 32420 11 > > > DF.2 > Day Meal PricePaid UsedCupon > 1 20060512 Fish 14 Y > 2 20060512 Chicken 20 N > > > merge(DF.1, DF.2, by = "Day") > Day NumberOfCustomers NumberOfComplaints Meal PricePaid > 1 20060512 10040 40 Fish 14 > 2 20060512 10040 40 Chicken 20 > UsedCupon > 1 Y > 2 N > > > By default, only rows matching on the 'by' argument in both data > frames will be in the result. See the 'all.x' and 'all.y' arguments to > handle other scenarios of including non-matching rows. > > See ?merge, which BTW: > > help.search("join") > > would point you to, if you are familiar with the term from relational > data base operations. > > HTH, > > Marc Schwartz=============================================================================Please access the attached hyperlink for an important electr...{{dropped}}