Hi:
You could try something like this:
For illustration, I'll use a data frame that was presented in a recent post
to the ggplot2 group. The poster wanted regressions by individual, but you
can add more than one grouping variable to the code I show below. It uses
the plyr package.
library(plyr)
ds_test <- structure(list(individual = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L), .Label = c("1",
"2", "3", "4", "5", "6",
"7",
"8", "9", "10", "11", "12",
"13",
"14", "15", "16", "17", "18",
"19",
"20", "21", "22", "23", "24",
"25",
"26", "27", "28", "29", "30"),
class = "factor"),
time = c(0L, 1671L, 1896L, 0L, 105L, 196L, 384L, 582L, 797L,
998L, 1419L, 0L, 290L, 451L, 752L, 0L, 487L, 619L, 820L,
0L, 384L, 463L, 832L, 932L, 1322L, 1688L, 0L, 101L, 390L,
0L, 746L, 761L, 899L, 1118L, 1236L, 1375L, 0L, 544L, 870L,
927L, 1117L, 1870L, 0L, 326L, 383L, 573L, 1326L, 0L, 1572L,
1592L), size = c(2, 2.6, 2.6, 1.2, 1.4, 1.5,
1.6, 1.7, 1.8, 2, 2.2, 1.3, 1.6, 1.5, 1.5, 2.8, 2.8, 2.4,
2.9, 2.1, 2.4, 2.4, 2.4, 2.3, 2.5, 2.4, 6, 5.8, 5.4, 1.1,
1.6, 1.5, 1.5, 1.5, 2.3, 2.3, 3.2, 4.1, 4, 3.9, 4.1, 4.3,
1.2, 2.1, 2.2, 2.2, 3, 2.2, 3, 3.9)), .Names = c("individual",
"time",
"size"), row.names = c(NA, 50L), class = "data.frame")
# Run models by individual and put the results into a list. The advantage
# is that one can extract multiple pieces from each component of the list,
# if so desired, by writing simple extraction functions using plyr. dlply()
# is an apply-like function: its first letter indicates that the input
# object (first argument) is a data frame, and its second letter that the
# output object is a list (in this case, a list of lm fits). The anonymous
# function in the call performs the desired operation on each data subset x.
mods <- dlply(ds_test, .(individual), function(x) lm(size ~ time, data = x))
# res() does the actual work within each subgroup; since the number of
# residuals varies from group to group, llply() has to return a list of
# residual vectors, one component per individual. The outer do.call(c, ...)
# collapses that list into a single vector, which can then be attached to
# the original data frame with $:
res <- function(x) resid(x)
ds_test$u <- do.call(c, llply(mods, res))
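As an aside, here is a rough sketch of the "extract multiple pieces" idea mentioned in the comments above. The toy data frame and the fit_summary() helper are made up for illustration and are not part of the original example; ldply() is the plyr function that takes a list and returns a data frame.

```r
library(plyr)

# Hypothetical toy data: three individuals, four time points each.
toy <- data.frame(
  individual = factor(rep(c("a", "b", "c"), each = 4)),
  time       = rep(c(0, 10, 20, 30), times = 3),
  size       = c(1.0, 1.4, 2.1, 2.5,
                 2.0, 2.1, 2.3, 2.2,
                 0.5, 1.5, 2.4, 3.6)
)

# One lm fit per individual, stored in a list.
toy_mods <- dlply(toy, .(individual), function(x) lm(size ~ time, data = x))

# Intercept and slope, one row per individual.
coefs <- ldply(toy_mods, coef)

# Hypothetical extractor that pulls several pieces from each fit at once.
fit_summary <- function(m) {
  c(slope = unname(coef(m)["time"]),
    r2    = summary(m)$r.squared,
    n     = length(resid(m)))
}
fits <- ldply(toy_mods, fit_summary)
```

Both coefs and fits should come back as data frames keyed by individual, so they can be merged with other per-group summaries if needed.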
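In your case, where you have multiple grouping factors, you will have to be a
little more careful, but the strategy is the same. Note that concatenating
the residuals this way relies on the rows of the data frame already being
sorted by the grouping variable(s), because dlply() returns the groups in
sorted order. You could possibly reduce it to a one-liner (untested):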
ds_test$u <- do.call(c, dlply(ds_test, .(individual),
    function(x) resid(lm(size ~ time, data = x))))
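For the two-factor case in the original question, one way to avoid depending on row order is to have ddply() return each subset with its residuals already attached. This is only a sketch: FYEAR and Industry come from the question, but the simulated data, the variable names y and x, and the helper add_resid() are assumptions made for illustration.

```r
library(plyr)

# Simulated stand-in for the firm data; y and x are hypothetical names.
set.seed(1)
firms <- data.frame(
  FYEAR    = rep(c(1990, 1991), each = 10),
  Industry = rep(rep(c(1, 2), each = 5), times = 2),
  x        = rnorm(20)
)
firms$y <- 1 + 0.5 * firms$x + rnorm(20)

# Shuffle the rows to show that the input need not be sorted by group.
firms <- firms[sample(nrow(firms)), ]

# Fit the regression within one subset and attach its residuals as column u.
add_resid <- function(d) {
  d$u <- resid(lm(y ~ x, data = d))
  d
}

# One regression per FYEAR x Industry cell; the pieces are rbind-ed back
# into a single data frame.
firms_res <- ddply(firms, .(FYEAR, Industry), add_resid)
```

The result comes back ordered by the grouping variables rather than in the original row order. If the model variables contain missing values, resid() returns fewer values than there are rows in the subset; passing na.action = na.exclude to lm() should pad the residuals with NA so the assignment still lines up.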
HTH,
Dennis
On Wed, Nov 24, 2010 at 4:56 PM, Ray Zhang <lza11@sfu.ca> wrote:
>
> Hi there,
>
> I have a huge data set with multiple firm-years and other firm
> characteristics. I want to run a regression of the dependent variable on
> the other explanatory variables and calculate the residuals, grouping the
> firms by year and industry.
>
> What I want to do is divide my observations into subsamples, each containing
> the observations with the same fiscal year (e.g. FYEAR = 1990) and the same
> firm characteristic (e.g. Industry = 1), run the regression on each, and put
> the residuals back with the observations by creating a new column. I want to
> do that for multiple years and multiple firms. Is there an easy command for
> this that avoids writing multiple loops?
>
> Ray
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.