Kirsten, this may not be the most elegant approach, but I think I would create dummy variables on the fly for each disease code you have, then use the "with" function to get means. Give this a try: # setup some fake data as an example icd9 <- c(rep("dis1",4), rep("dis2",5), rep("dis3",3)) n <- length(icd9) exposure <- rnorm(n) working <- data.frame(icd9, exposure) # get some useful things together numDis <- length(levels(working$icd9)) diseaseCodes <- levels(working$icd9) # capture results in here results <- data.frame(ICD9code=diseaseCodes, ave_exposure=rep(NA, numDis), ICD9count=rep(NA, numDis), other_ave=rep(NA, numDis), other_Ct=rep(NA, numDis)) for(k in 1:numDis){ working$indicator <- ifelse(working$icd9==diseaseCodes[k], "Yes", "No") xbars <- with(working, tapply(exposure, list(indicator), mean)) # Summary Statistics for Groups in Dataframes counts <- table(working$indicator) results[k,] <- c(diseaseCodes[k], xbars["Yes"], counts["Yes"], xbars["No"], counts["No"]) } Andrea Johnson Biostatistician, Roche Molecular Systems, Inc. [[alternative HTML version deleted]]