Kirsten,
this may not be the most elegant approach, but I think I would create
dummy variables on the fly for each disease code you have, then use the
"with" function to get means.
Give this a try:
# setup some fake data as an example
icd9 <- c(rep("dis1",4), rep("dis2",5),
rep("dis3",3))
n <- length(icd9)
exposure <- rnorm(n)
working <- data.frame(icd9, exposure)
# get some useful things together
numDis <- length(levels(working$icd9))
diseaseCodes <- levels(working$icd9)
# capture results in here
results <- data.frame(ICD9code=diseaseCodes, ave_exposure=rep(NA,
numDis), ICD9count=rep(NA, numDis), other_ave=rep(NA, numDis),
other_Ct=rep(NA, numDis))
for(k in 1:numDis){
working$indicator <- ifelse(working$icd9==diseaseCodes[k],
"Yes",
"No")
xbars <- with(working, tapply(exposure, list(indicator), mean))
# Summary Statistics for Groups in Dataframes
counts <- table(working$indicator)
results[k,] <- c(diseaseCodes[k], xbars["Yes"],
counts["Yes"],
xbars["No"], counts["No"])
}
Andrea Johnson
Biostatistician, Roche Molecular Systems, Inc.
[[alternative HTML version deleted]]