thr3ads.net - R help - [R] sas to r [Jul 2004]

If this information is useful, please help other people find it:
Share via:

Greg Adkison

2004-Jul-16 22:18 UTC

[R] sas to r

I would be incredibly grateful to anyone who'll help me translate some 
SAS code into R code.

Say for example that I have a dataset named "dat1" that includes five 
variables:  wshed, site, species, bda, and sla.  I can calculate with the 
following SAS code the mean, CV, se, and number of observations of 
"bda" and "sla" for each combination of "wshed,"
"species," and "site,"
restricting the species considered to only three of several species in 
dat1 (b, c, and p).  Moreover, I can output these calculations and 
grouping variables to a dataset named "dat2" that will reside in RAM 
and include the variables  wshed, site, species, mBdA, msla, cBda, 
sBdA, ssla, nBda, and nsla.

proc sort data=dat1;
  by wshed site species;
proc means data=dat1 noprint mean cv stderr n;
  by wshed site species;
  where species in ('b', 'c', 'p');
  var BdA sla;
  output out=dat2
    mean=mBdA msla
    cv=cBdA csla
    stderr=sBdA ssla
    n=nBdA nsla;

Thanks,
Greg

Don MacQueen

2004-Jul-16 22:58 UTC

head link

[R] sas to r

Here's one way...

Not tested, so there maybe typos and such, but I've used this 
approach successfully quite a few times.

It can get kind of slow if dat1 has many, many rows.
The coding assumes no missing data, though that could be handled by 
adding the na.rm argument in apppropriate places, and changing the 
nrow() to something that counts only non-missing data.

myfun <- function(dfr) {
   data.frame(
              wshed=dfr$wshed[1],
              site=dfr$site[1],
              species=dfr$species[1],
              mBda=mean(dfr$BdA),
              cBda=sd(dfr$Bda)/mean(dfr$Bda),
              sBda=sd(dfr$Bda)/sqrt(nrow(dfr)),
              nBda=nrow(dfr),
              msla=mean(dfr$BdA),
              csla=sd(dfr$sla)/mean(dfr$sla),
              ssla=sd(dfr$sla)/sqrt(nrow(dfr)),
              nsla=nrow(dfr))
}

tmp1 <- split(dat1,paste(dat1$wshed,dat1$site,dat1$species))
tmp2 <- lapply(tmp1,myfun)
dat2 <- do.call('rbind',tmp2)

-Don

At 6:18 PM -0400 7/16/04, Greg Adkison wrote:>I would be incredibly grateful to anyone who'll help me translate some
>SAS code into R code.
>
>Say for example that I have a dataset named "dat1" that includes
five
>variables:  wshed, site, species, bda, and sla.  I can calculate with the
>following SAS code the mean, CV, se, and number of observations of
>"bda" and "sla" for each combination of
"wshed," "species," and "site,"
>restricting the species considered to only three of several species in
>dat1 (b, c, and p).  Moreover, I can output these calculations and
>grouping variables to a dataset named "dat2" that will reside in
RAM
>and include the variables  wshed, site, species, mBdA, msla, cBda,
>sBdA, ssla, nBda, and nsla.
>
>proc sort data=dat1;
>   by wshed site species;
>proc means data=dat1 noprint mean cv stderr n;
>   by wshed site species;
>   where species in ('b', 'c', 'p');
>   var BdA sla;
>   output out=dat2
>     mean=mBdA msla
>     cv=cBdA csla
>     stderr=sBdA ssla
>     n=nBdA nsla;
>
>Thanks,
>Greg
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA

Adaikalavan Ramasamy

2004-Jul-16 23:10 UTC

head link

[R] sas to r

On Fri, 2004-07-16 at 23:18, Greg Adkison wrote:> I would be incredibly grateful to anyone who'll help me translate some 
> SAS code into R code.
Searching for "SAS code OR script OR translate" on
http://maths.newcastle.edu.au/~rking/R/ gives a few results, one of
which looks promising is
http://tolstoy.newcastle.edu.au/R/help/04/04/0009.html and
http://tolstoy.newcastle.edu.au/R/help/04/02/0660.html
> Say for example that I have a dataset named "dat1" that includes
five
> variables:  wshed, site, species, bda, and sla.  I can calculate with the 
> following SAS code the mean, CV, se, and number of observations of 
> "bda" and "sla" for each combination of
"wshed," "species," and "site,"
> restricting the species considered to only three of several species in 
> dat1 (b, c, and p).  Moreover, I can output these calculations and 
> grouping variables to a dataset named "dat2" that will reside in
RAM
> and include the variables  wshed, site, species, mBdA, msla, cBda, 
> sBdA, ssla, nBda, and nsla.
data(iris)
attach(iris)
iris[c(1,2,51,52,101,102), ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
...
51           7.0         3.2          4.7         1.4 versicolor
52           6.4         3.2          4.5         1.5 versicolor
...
101          6.3         3.3          6.0         2.5  virginica
102          5.8         2.7          5.1         1.9  virginica
> tapply(Sepal.Length, Species, function(x) c( mean(x), sd(x)/mean(x),length(x) ))
$setosa
[1]  5.00600000  0.07041344 50.00000000

$versicolor
[1]  5.93600000  0.08695606 50.00000000

$virginica
[1]  6.58800000  0.09652089 50.00000000

> proc sort data=dat1;
>   by wshed site species;
> proc means data=dat1 noprint mean cv stderr n;
>   by wshed site species;
>   where species in ('b', 'c', 'p');
>   var BdA sla;
>   output out=dat2
>     mean=mBdA msla
>     cv=cBdA csla
>     stderr=sBdA ssla
>     n=nBdA nsla;
> 
> Thanks,
> Greg
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>

Frank E Harrell Jr

2004-Jul-17 12:36 UTC

head link

[R] sas to r

Greg Adkison wrote:> I would be incredibly grateful to anyone who'll help me translate some 
> SAS code into R code.
> 
> Say for example that I have a dataset named "dat1" that includes
five
> variables:  wshed, site, species, bda, and sla.  I can calculate with the 
> following SAS code the mean, CV, se, and number of observations of 
> "bda" and "sla" for each combination of
"wshed," "species," and "site,"
> restricting the species considered to only three of several species in 
> dat1 (b, c, and p).  Moreover, I can output these calculations and 
> grouping variables to a dataset named "dat2" that will reside in
RAM
> and include the variables  wshed, site, species, mBdA, msla, cBda, 
> sBdA, ssla, nBda, and nsla.
> 
> proc sort data=dat1;
>   by wshed site species;
> proc means data=dat1 noprint mean cv stderr n;
>   by wshed site species;
>   where species in ('b', 'c', 'p');
>   var BdA sla;
>   output out=dat2
>     mean=mBdA msla
>     cv=cBdA csla
>     stderr=sBdA ssla
>     n=nBdA nsla;
> 
> Thanks,
> Greg
The following handles any number of analysis variables, with automatic
naming of all statistics computed from them.  It requires the Hmisc package.

# Generate some data.  Put one NA in sla.
set.seed(1)
dat1 <- expand.grid(wshed=1:2, site=c('A','B'),
                     species=c('a','b','c','p'),
                     reps=1:10)
n <- nrow(dat1)
dat1 <- transform(dat1,
                   BdA = rnorm(n, 100, 20),
                   sla = c(rnorm(n-1, 200, 30), NA))
# Can use upData function in Hmisc in place of transform

# Summarization function, per stratum, for a matrix of analysis
# variables
g <- function(y) {
   n <- apply(y, 2, function(z) sum(!is.na(z)))
   m <- apply(y, 2, mean, na.rm=TRUE)
   s <- apply(y, 2, sd,   na.rm=TRUE)
   cv <- s/m
   se <- s/sqrt(n)
   w <- c(m, cv, se, n)
   names(w) <- t(outer(c('m','c','s','n'),
colnames(y), paste, sep=''))
   w
}
library(Hmisc)
dat2 <-  with(dat1,
               summarize(cbind(BdA, sla),
                         llist(wshed, site, species),
                         g,
                         subset=species %in%
c('b','c','p'),
                         stat.name='mBdA')
               )

options(digits=3)
dat2  # is a data frame

    wshed site species  mBdA msla  cBdA   csla sBdA  ssla nBdA nsla
1      1    A       b 100.5  195 0.133 0.1813 4.23 11.20   10   10
2      1    A       c  99.7  206 0.101 0.1024 3.17  6.68   10   10
3      1    A       p 101.4  188 0.239 0.1580 7.65  9.39   10   10
4      1    B       b 109.9  203 0.118 0.1433 4.09  9.21   10   10
5      1    B       c  98.4  221 0.193 0.1250 6.01  8.72   10   10
6      1    B       p 102.9  203 0.216 0.1446 7.03  9.29   10   10
7      2    A       b  95.8  195 0.241 0.2011 7.31 12.40   10   10
8      2    A       c  98.7  207 0.194 0.1274 6.04  8.33   10   10
9      2    A       p 102.2  191 0.217 0.1709 7.01 10.31   10   10
10     2    B       b  97.8  191 0.235 0.2079 7.27 12.58   10   10
11     2    B       c 100.9  194 0.164 0.0987 5.24  6.07   10   10
12     2    B       p 103.0  209 0.144 0.0769 4.69  5.35   10    9


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Apparently Analagous Threads

Search for more apparently analagous threads

R help - Jul 2004 - sas to r

[R] sas to r

[R] sas to r

[R] sas to r

[R] sas to r

Apparently Analagous Threads