I would be incredibly grateful to anyone who'll help me translate some SAS code into R code.

Say for example that I have a dataset named "dat1" that includes five variables: wshed, site, species, bda, and sla. I can calculate with the following SAS code the mean, CV, se, and number of observations of "bda" and "sla" for each combination of "wshed," "species," and "site," restricting the species considered to only three of several species in dat1 (b, c, and p). Moreover, I can output these calculations and grouping variables to a dataset named "dat2" that will reside in RAM and include the variables wshed, site, species, mBdA, msla, cBda, sBdA, ssla, nBda, and nsla.

proc sort data=dat1;
  by wshed site species;
proc means data=dat1 noprint mean cv stderr n;
  by wshed site species;
  where species in ('b', 'c', 'p');
  var BdA sla;
  output out=dat2
    mean=mBdA msla
    cv=cBdA csla
    stderr=sBdA ssla
    n=nBdA nsla;

Thanks,
Greg
Here's one way... Not tested, so there may be typos and such, but I've used this approach successfully quite a few times. It can get kind of slow if dat1 has many, many rows.

The coding assumes no missing data, though that could be handled by adding the na.rm argument in appropriate places, and changing the nrow() to something that counts only non-missing data.

# dfr holds the rows of dat1 for one wshed/site/species combination
myfun <- function(dfr) {
  data.frame(
    wshed=dfr$wshed[1],
    site=dfr$site[1],
    species=dfr$species[1],
    mBda=mean(dfr$BdA),
    cBda=sd(dfr$BdA)/mean(dfr$BdA),
    sBda=sd(dfr$BdA)/sqrt(nrow(dfr)),
    nBda=nrow(dfr),
    msla=mean(dfr$sla),
    csla=sd(dfr$sla)/mean(dfr$sla),
    ssla=sd(dfr$sla)/sqrt(nrow(dfr)),
    nsla=nrow(dfr))
}

# one data frame per group, summarized, then stacked back together
tmp1 <- split(dat1,paste(dat1$wshed,dat1$site,dat1$species))
tmp2 <- lapply(tmp1,myfun)
dat2 <- do.call('rbind',tmp2)

-Don

At 6:18 PM -0400 7/16/04, Greg Adkison wrote:
>I would be incredibly grateful to anyone who'll help me translate some
>SAS code into R code.
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
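
The reply above mentions that missing data could be handled with na.rm and a count of non-missing values. A minimal sketch of that variant follows, under stated assumptions: the toy dat1, the helper stats1, and the name myfun_na are hypothetical and not from the thread, and the species filter is added here to mirror the SAS WHERE clause.

```r
# Hypothetical example data (not from the thread); one NA in sla.
dat1 <- data.frame(
  wshed   = c(1, 1, 1, 1, 1, 1, 1, 1),
  site    = c("A", "A", "A", "A", "A", "A", "A", "A"),
  species = c("b", "b", "b", "c", "c", "c", "a", "a"),
  BdA     = c(10, 12, 14, 20, 22, 24, 1, 2),
  sla     = c(100, 110, NA, 200, 210, 220, 5, 6)
)

# Mean, CV (as a ratio), standard error, and n for one variable,
# ignoring missing values. Helper name is an assumption.
stats1 <- function(x) {
  n <- sum(!is.na(x))          # count only non-missing observations
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(m = m, c = s / m, s = s / sqrt(n), n = n)
}

# Per-group summary, one row per wshed/site/species combination.
myfun_na <- function(dfr) {
  b <- stats1(dfr$BdA)
  l <- stats1(dfr$sla)
  data.frame(
    wshed = dfr$wshed[1], site = dfr$site[1], species = dfr$species[1],
    mBdA = b[["m"]], cBdA = b[["c"]], sBdA = b[["s"]], nBdA = b[["n"]],
    msla = l[["m"]], csla = l[["c"]], ssla = l[["s"]], nsla = l[["n"]]
  )
}

# Keep only the species of interest, then split, summarize, and stack.
keep <- dat1[dat1$species %in% c("b", "c", "p"), ]
grp  <- split(keep, paste(keep$wshed, keep$site, keep$species))
dat2 <- do.call(rbind, lapply(grp, myfun_na))
rownames(dat2) <- NULL
dat2
```

Note that the CV here is the plain ratio sd/mean, as in the code above; PROC MEANS reports it as a percentage, so multiply by 100 if the numbers need to match SAS.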
On Fri, 2004-07-16 at 23:18, Greg Adkison wrote:
> I would be incredibly grateful to anyone who'll help me translate some
> SAS code into R code.

Searching for "SAS code OR script OR translate" on http://maths.newcastle.edu.au/~rking/R/ gives a few results. Two that look promising are http://tolstoy.newcastle.edu.au/R/help/04/04/0009.html and http://tolstoy.newcastle.edu.au/R/help/04/02/0660.html

> Say for example that I have a dataset named "dat1" that includes five
> variables: wshed, site, species, bda, and sla. I can calculate with the
> following SAS code the mean, CV, se, and number of observations of
> "bda" and "sla" for each combination of "wshed," "species," and "site,"
> restricting the species considered to only three of several species in
> dat1 (b, c, and p). Moreover, I can output these calculations and
> grouping variables to a dataset named "dat2" that will reside in RAM
> and include the variables wshed, site, species, mBdA, msla, cBda,
> sBdA, ssla, nBda, and nsla.

data(iris)
attach(iris)
iris[c(1,2,51,52,101,102), ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
...
51           7.0         3.2          4.7         1.4 versicolor
52           6.4         3.2          4.5         1.5 versicolor
...
101          6.3         3.3          6.0         2.5  virginica
102          5.8         2.7          5.1         1.9  virginica

tapply(Sepal.Length, Species, function(x) c( mean(x), sd(x)/mean(x), length(x) ))
$setosa
[1]  5.00600000  0.07041344 50.00000000

$versicolor
[1]  5.93600000  0.08695606 50.00000000

$virginica
[1]  6.58800000  0.09652089 50.00000000
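
The tapply() call above returns a list with one vector per group. One possible way to turn that into a single table, closer to the "dat2" dataset the question asks for, is sketched below. It assumes the built-in iris data; the element names mean, cv, and n and the object names stats, tab, and out are illustrative, not from the thread. It also uses iris$ prefixes instead of attach().

```r
# Per-species mean, CV (as a ratio), and n for one variable.
# Naming the elements gives the final table readable column names.
stats <- tapply(iris$Sepal.Length, iris$Species, function(x)
  c(mean = mean(x), cv = sd(x) / mean(x), n = length(x)))

# Stack the per-group vectors into a matrix, one row per species.
tab <- do.call(rbind, stats)

# Convert to a data frame with the grouping variable as a column.
out <- data.frame(Species = names(stats), tab, row.names = NULL)
out
```

For several grouping variables, tapply() accepts a list of factors as its second argument, although the result is then a multi-dimensional array that needs more work to flatten.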
Greg Adkison wrote:
> I would be incredibly grateful to anyone who'll help me translate some
> SAS code into R code.

The following handles any number of analysis variables, with automatic naming of all statistics computed from them. It requires the Hmisc package.

# Generate some data. Put one NA in sla.
set.seed(1)
dat1 <- expand.grid(wshed=1:2, site=c('A','B'),
                    species=c('a','b','c','p'),
                    reps=1:10)
n <- nrow(dat1)
dat1 <- transform(dat1,
                  BdA = rnorm(n, 100, 20),
                  sla = c(rnorm(n-1, 200, 30), NA))
# Can use upData function in Hmisc in place of transform

# Summarization function, per stratum, for a matrix of analysis
# variables
g <- function(y) {
  n  <- apply(y, 2, function(z) sum(!is.na(z)))
  m  <- apply(y, 2, mean, na.rm=TRUE)
  s  <- apply(y, 2, sd, na.rm=TRUE)
  cv <- s/m
  se <- s/sqrt(n)
  w  <- c(m, cv, se, n)
  names(w) <- t(outer(c('m','c','s','n'), colnames(y), paste, sep=''))
  w
}

library(Hmisc)
dat2 <- with(dat1,
             summarize(cbind(BdA, sla),
                       llist(wshed, site, species),
                       g, subset=species %in% c('b','c','p'),
                       stat.name='mBdA')
             )
options(digits=3)
dat2   # is a data frame

   wshed site species  mBdA msla  cBdA   csla sBdA  ssla nBdA nsla
1      1    A       b 100.5  195 0.133 0.1813 4.23 11.20   10   10
2      1    A       c  99.7  206 0.101 0.1024 3.17  6.68   10   10
3      1    A       p 101.4  188 0.239 0.1580 7.65  9.39   10   10
4      1    B       b 109.9  203 0.118 0.1433 4.09  9.21   10   10
5      1    B       c  98.4  221 0.193 0.1250 6.01  8.72   10   10
6      1    B       p 102.9  203 0.216 0.1446 7.03  9.29   10   10
7      2    A       b  95.8  195 0.241 0.2011 7.31 12.40   10   10
8      2    A       c  98.7  207 0.194 0.1274 6.04  8.33   10   10
9      2    A       p 102.2  191 0.217 0.1709 7.01 10.31   10   10
10     2    B       b  97.8  191 0.235 0.2079 7.27 12.58   10   10
11     2    B       c 100.9  194 0.164 0.0987 5.24  6.07   10   10
12     2    B       p 103.0  209 0.144 0.0769 4.69  5.35   10    9

--
Frank E Harrell Jr
Professor and Chair
School of Medicine, Department of Biostatistics
Vanderbilt University
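
The "automatic naming" in g() rests on the t(outer(...)) line. The short sketch below isolates that step to show why the transpose is needed; the object names vars, prefixes, and nm are illustrative only.

```r
vars     <- c("BdA", "sla")        # analysis variables, as in colnames(y)
prefixes <- c("m", "c", "s", "n")  # mean, CV, standard error, n

# outer() builds a 4 x 2 matrix: one row per statistic, one column
# per variable.
nm <- outer(prefixes, vars, paste, sep = "")

# Read column by column, nm groups the names by variable. Transposing
# first groups them by statistic instead, which is the order of the
# values in c(m, cv, se, n).
as.vector(nm)
as.vector(t(nm))
```

The first vector runs mBdA, cBdA, sBdA, nBdA, msla, and so on, while the transposed one runs mBdA, msla, cBdA, csla, and so on, matching the column order of dat2 above.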