thr3ads.net - R help - [R] combining output from several operations [Aug 2002]


Tim Wilson

2002-Aug-22 22:09 UTC

[R] combining output from several operations

Hi everyone,

I wonder if there's a patient soul out there who has a minute to look at
the following. 

I've got a set of summary statistics I need to perform many times.
Naturally, I've looked at writing a function to automate the process as
much as possible. (These are the data I mentioned recently in my
question about weighted means.) I'm having trouble figuring out the
proper syntax for taking the results of several different functions and
combining them into a single function. I'm pasting an example below of the
analysis I need to do for each column of a number of dataframes. This
works perfectly, but repeating this procedure a couple hundred times
doesn't thrill me.

The only thing that isn't complete below is that I need the describe
function (from the Hmisc library) to give me the standard deviation
as well as the mean. Is it possible to do that without modifying the
describe function directly?
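describe() has no SD option. One way to get the SD without modifying describe itself is to write a small wrapper and apply it with the same split()/lapply pattern used below. This is a minimal base-R sketch. The function name describe_sd and the toy values standing in for faculty are hypothetical, and the wrapper reproduces only describe's headline counts, not its frequency tables.

```r
# Hypothetical wrapper: describe-style headline statistics plus the SD.
# Base R only; it does not print describe()'s frequency tables.
describe_sd <- function(x) {
  ok <- !is.na(x)
  c(n       = sum(ok),
    missing = sum(!ok),
    unique  = length(unique(x[ok])),
    Mean    = mean(x[ok]),
    SD      = sd(x[ok]))
}

# Toy data standing in for the 'faculty' data frame (made-up values).
faculty <- data.frame(
  Q8       = c(3, 3, 5, 1, 2, 2, 3, 3, 0, 4, 1, NA),
  TWOYROR4 = c(2, 2, 2, 4, 4, 4, 2, 2, 4, 4, 4, 4),
  FACULTY  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# One summary vector per TWOYROR4.FACULTY cell; sapply binds them as columns.
res <- sapply(split(faculty$Q8, list(faculty$TWOYROR4, faculty$FACULTY)),
              describe_sd)
round(res, 3)
```

Every cell returns the same named vector, so sapply() binds the results into a single matrix, with statistics in rows and cells ("2.1", "4.1", "2.2", "4.2") in columns.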

I'd be glad to hear any suggestions from the R gurus on the list.

-Tim
> lapply(split(faculty$Q8, list(faculty$TWOYROR4, faculty$FACULTY)),describe)
$"2.1"
X[[1]] 
      n missing  unique    Mean 
     47       0       3   3.362 

3 (38, 81%), 4 (1, 2%), 5 (8, 17%) 

$"4.1"
X[[2]] 
      n missing  unique    Mean 
    147       0       5   1.837 

          0  1  2  3 4
Frequency 1 59 57 23 7
%         1 40 39 16 5

$"2.2"
X[[3]] 
      n missing  unique    Mean 
      2       0       1       3 

$"4.2"
X[[4]] 
      n missing  unique    Mean 
     25       0       5     1.8 

          0  1  2  3 4
Frequency 2  8  9  5 1
%         8 32 36 20 4
> a <- aggregate(faculty$Q8, list(CETP=faculty$CETP), mean)

NOTE: I'm using the aggregate function to weight the means so that each
CETP contributes equally to an overall mean and standard deviation. I
need to use this procedure on each of the four results of lapply above.
I can't figure that out at all.

> a
                  CETP        x
1  ACEPT               2.521739
2  LaCEPT              1.666667
3  MASTEP              2.442308
4  MMSTEC              1.900000
5  NYCETP              1.875000
6  PETE                1.600000
7  STEMTEC             2.428571
8  Temple/Philadelphia 2.750000
9  TxCETP              2.218182
10 VCEPT               2.222222
> mean(a$x)
[1] 2.162469
> a <- aggregate(faculty$Q8, list(CETP=faculty$CETP), sd)
> mean(a$x)
[1] 1.041506
>
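One way to repeat the per-CETP procedure for each of the four TWOYROR4/FACULTY cells is to split the whole data frame instead of only Q8. That keeps the CETP column attached to each cell. You can then wrap the aggregate() steps in a function. This is a minimal sketch that assumes faculty has the columns Q8, CETP, TWOYROR4 and FACULTY, as in the session above. The helper cetp_summary and the toy data are hypothetical.

```r
# Hypothetical helper: per-CETP means and SDs of one column, then the plain
# average across CETPs, so each CETP counts equally (as in the session above).
cetp_summary <- function(d, col = "Q8") {
  by_cetp <- list(CETP = d$CETP)
  m <- aggregate(d[[col]], by_cetp, mean)
  s <- aggregate(d[[col]], by_cetp, sd)
  # sd() is NA for a CETP with a single observation; na.rm = TRUE skips
  # those (a departure from the plain mean(a$x) used above).
  c(mean = mean(m$x), sd = mean(s$x, na.rm = TRUE))
}

# Toy data standing in for 'faculty' (made-up values).
faculty <- data.frame(
  Q8       = c(3, 1, 2, 4, 2, 2, 1, 3, 0, 2),
  CETP     = c("ACEPT", "ACEPT", "PETE", "PETE", "VCEPT",
               "ACEPT", "PETE", "VCEPT", "VCEPT", "ACEPT"),
  TWOYROR4 = c(2, 2, 2, 2, 2, 4, 4, 4, 4, 4),
  FACULTY  = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

# Split the whole data frame so CETP travels with each cell;
# drop = TRUE skips empty cells, which aggregate() cannot handle.
cells <- split(faculty, list(faculty$TWOYROR4, faculty$FACULTY), drop = TRUE)
sapply(cells, cetp_summary)
```

To summarize another column, pass its name through col. Calling cetp_summary over a vector of column names with sapply() would then cover a whole data frame in one step.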
-- 
Tim Wilson      |   Visit Sibley online:   | Check out:
Henry Sibley HS |  http://www.isd197.org   | http://www.zope.com
W. St. Paul, MN |                          | http://slashdot.org
wilson at visi.com |  <dtml-var pithy_quote>  | http://linux.com
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Frank E Harrell Jr

2002-Aug-23 04:04 UTC

[R] combining output from several operations

On Thu, 22 Aug 2002 17:09:34 -0500
Tim Wilson <wilson at visi.com> wrote:
> The only thing that isn't complete below is that I need the describe
> function (from the Hmisc library) to give me the standard deviation
> as well as the mean. Is it possible to do that without modifying the
> describe function directly?
Tim, describe takes a weights= argument, but you're right: describe does
not compute the SD (owing to my bias against the SD as a descriptive statistic,
especially for skewed data).
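This follows up on the weights= point. If each observation gets the weight 1 / (number of respondents in its CETP), every CETP contributes equally. The weighted mean then equals the mean of the per-CETP means computed with aggregate(). This is a minimal sketch using base R's weighted.mean() and ave(). The function name and toy data are hypothetical. The weighted SD shown is one common definition and is not something describe() reports. The same weights could presumably be passed to describe(x, weights = w). Check ?describe (its normwt argument) for how non-integer weights are treated.

```r
# Sketch: weight each observation by 1 / (number of observations in its
# CETP), so every CETP carries the same total weight. Assumes no NAs.
equal_cetp_stats <- function(x, cetp) {
  w <- 1 / ave(x, cetp, FUN = length)   # group size, repeated on every row
  m <- weighted.mean(x, w)              # equals the mean of per-CETP means
  # One common weighted SD (divides by sum(w), no bias correction);
  # not the same quantity as averaging the per-CETP SDs.
  s <- sqrt(sum(w * (x - m)^2) / sum(w))
  c(mean = m, sd = s)
}

x    <- c(3, 1, 2, 4, 2)
cetp <- c("ACEPT", "ACEPT", "PETE", "PETE", "VCEPT")
equal_cetp_stats(x, cetp)
mean(aggregate(x, list(CETP = cetp), mean)$x)   # same value as the weighted mean
```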

Frank Harrell

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat