Louisell, Paul T.
2003-Nov-06 19:33 UTC
[R] Question about computing offsets automatically
Hi,

I'm using R version 1.8.0 on Windows NT. When fitting a glm with Poisson random component and a log link, I frequently need to include an offset. Typically I use xtabs or table to get the counts for the contingency table, and then I use as.data.frame.table to create a data frame that I can use in the glm function. I have not found an option that allows me to total the offset variable to obtain offsets for cells in the contingency table.

For example, suppose I have the following data frame named Data:

  F1 F2 Off
1  A  C   4
2  A  C   3
3  A  C   2
4  B  C   3
5  A  D   2
6  A  D   4
7  B  D   1

xtabs(~F1+F2, data=Data) produces the contingency table:

   F2
F1  C D
  A 3 2
  B 1 1

And as.data.frame.table(xtabs(~F1+F2, data=Data)) changes the contingency table to a data frame suitable for use in the glm function:

  F1 F2 Freq
1  A  C    3
2  B  C    1
3  A  D    2
4  B  D    1

What I'm looking for is some option that would add a 4th column to the output of as.data.frame.table which contains the offsets for each cell in the contingency table:

  F1 F2 Freq Off
1  A  C    3   9
2  B  C    1   3
3  A  D    2   6
4  B  D    1   1

Does such an option exist somewhere in R (I wasn't able to find it in the documentation for the table, xtabs, as.data.frame.table, or glm functions)? I can obtain the Off column easily enough in a simple loop, but I thought there might be an option for this somewhere.

Paul Louisell
Statistician
(860) 565-5417
louisept at pweh.com
On Thu, 2003-11-06 at 13:33, Louisell, Paul T. wrote:
> Does such an option exist somewhere in R (I wasn't able to find it in the
> documentation for the table, xtabs, as.data.frame.table, or glm functions)?
> I can obtain the Off column easily enough in a simple loop, but I thought
> there might be an option for this somewhere.

I don't know of an easy 'option' approach, but you can use aggregate() to get the sums and then do a cbind() to add the fourth column:

> aggregate(Data$Off, list(F1 = Data$F1, F2 = Data$F2), sum)
  F1 F2 x
1  A  C 9
2  B  C 3
3  A  D 6
4  B  D 1

So:

> df <- as.data.frame.table(xtabs(~F1+F2, data = Data))
> df
  F1 F2 Freq
1  A  C    3
2  B  C    1
3  A  D    2
4  B  D    1

> Off <- aggregate(Data$Off, list(F1 = Data$F1, F2 = Data$F2), sum)$x
> Off
[1] 9 3 6 1

> cbind(df, Off)
  F1 F2 Freq Off
1  A  C    3   9
2  B  C    1   3
3  A  D    2   6
4  B  D    1   1

HTH,

Marc Schwartz
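A possible variation on the approach above, offered only as a sketch: xtabs() accepts a left-hand side in its formula, in which case it sums that variable within each cell instead of counting rows. That keeps the offset totals in the same cell order as the counts, including any empty cells, which aggregate() would drop. The names `cells` and `fit` are made up for illustration, and the model formula is only an assumed example of how the offset might then be used.

```r
# Example data from the original post
Data <- data.frame(
  F1  = factor(c("A", "A", "A", "B", "A", "A", "B")),
  F2  = factor(c("C", "C", "C", "C", "D", "D", "D")),
  Off = c(4, 3, 2, 3, 2, 4, 1)
)

# Cell counts: an empty left-hand side makes xtabs() count rows
cells <- as.data.frame(xtabs(~ F1 + F2, data = Data))

# Cell totals of Off: a left-hand side makes xtabs() sum that variable.
# Both tables are laid out over the same factor levels, so the rows align.
cells$Off <- as.data.frame(xtabs(Off ~ F1 + F2, data = Data))$Freq

print(cells)
#   F1 F2 Freq Off
# 1  A  C    3   9
# 2  B  C    1   3
# 3  A  D    2   6
# 4  B  D    1   1

# Assumed usage: Poisson glm with log link and log(Off) as the offset
fit <- glm(Freq ~ F1 + F2 + offset(log(Off)), family = poisson, data = cells)
```

One caveat with this sketch: a cell with no observations would get an offset total of 0, and log(0) is -Inf, so such cells would need to be dropped or handled separately before fitting.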