thr3ads.net - R help - [R] Multiple binomial tests on a large table [Jul 2010]

Wilson, Andrew

2010-Jul-29 08:05 UTC

[R] Multiple binomial tests on a large table

I need to run binomial tests (binom.test) on a large set of data, stored
in a table - 600 tests in total.

The values of x are stored in a column, as are the values of n.  The
data for each test are on a separate row.

For example:

X	N
11	19
9	26
13	21
13	27
18	30

It is a two-tailed test, and P in all cases is 0.5.

My question is:  Is there a quicker way of running these tests without
having to type an individual command for each test - and ideally also to
store the resulting p-values in a single data vector?

Many thanks for any pointers,

Andrew Wilson

Dennis Murphy

2010-Jul-29 10:03 UTC

Hi:

Here's one approach (not unique), and dragged out a bit to illustrate its
different components.

1. Create a list object, something like
   l <- vector('list', 600)
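2. Populate it. There are several ways to do this, but one is to initially
create a vector of file names and then populate the list by looping over the
file names. If your file names have a simple format (dat001 - dat600, say),
then it's easy to create the file name vector with paste(); otherwise, you
may need to do more work. Then run a loop that assigns to each list
component the corresponding data frame. Note that get() looks up an existing
R object by name, so this assumes each data frame has already been read into
the workspace under the name stored in filenames; to read from disk instead,
replace get() with read.table() or similar. Something like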

for(i in seq_along(filenames)) l[[i]] <- get(filenames[i])

3. Create a function for one of the data sets, under the obvious proviso
that you intend to process each data frame in the list the same way. To
return only the p-values from a binomial test applied to each row of your
input data frame, the following works for me (explanation below):

f <- function(df)
  do.call(c, with(df, mapply(binom.test, x = X, n = N))[3, ])

4. Use lapply() to map the function to each component data frame in your
list; the result will also be a list.
 pvlist <- lapply(l, f)

5. *IF* each of your data frames has the same number of rows, you can use
the following to slurp together all the p-values into a matrix:

do.call(rbind, pvlist)

OTOH, if the number of rows vary from one data frame to another, it may be
best to keep the p-value results in list form or perhaps you could flatten
them into a numeric vector, depending on your purposes.
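
Steps 1-5 can be strung together roughly as follows. This is only a sketch: the three small data frames dat001 - dat003 and the names objnames, pvmat and pvec are made up for illustration, and the data frames are assumed to be in the workspace already.

```r
# Hypothetical data frames standing in for the real ones.
dat001 <- data.frame(X = c(11L, 9L, 13L, 13L, 18L), N = c(19L, 26L, 21L, 27L, 30L))
dat002 <- data.frame(X = c(5L, 12L, 7L, 20L, 3L),   N = c(10L, 30L, 15L, 41L, 12L))
dat003 <- data.frame(X = c(8L, 1L, 14L, 6L, 22L),   N = c(16L, 9L, 25L, 18L, 40L))

# Steps 1-2: build the object names, then fetch each object by name.
objnames <- sprintf("dat%03d", 1:3)          # "dat001" "dat002" "dat003"
l <- vector("list", length(objnames))
for (i in seq_along(objnames)) l[[i]] <- get(objnames[i])

# Step 3: p-values for every row of one data frame.
f <- function(df)
  do.call(c, with(df, mapply(binom.test, x = X, n = N))[3, ])

# Step 4: apply f to every data frame in the list.
pvlist <- lapply(l, f)

# Step 5: one row per data frame (needs equal row counts) ...
pvmat <- do.call(rbind, pvlist)

# ... or a single flat vector, which also works when row counts differ.
pvec <- unlist(pvlist, use.names = FALSE)
```

The flat vector is ordered data frame by data frame, so it lines up with the rows of pvmat read left to right.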

----
The function f:

 mapply() allows you, in this case, to apply the non-vectorized function
binom.test() to a pair of vector arguments supplied from the input data
frame. binom.test() defaults to p = 0.5 and a two-sided alternative, which
matches your setup, so no further arguments are needed. The result is a
9 x n matrix of mode list, where each column holds the nine components
returned by one of the n calls to binom.test() [where n = number of rows of
the input data frame]. Since you wanted the p-values (component/row 3), we
pull out the third row of the matrix. This returns a list, so calling the
concatenation function c() through do.call() combines its elements into a
numeric vector for output.
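
To see the structure described above, the intermediate result can be inspected directly. A small sketch using the five example rows from the question; the names d, res, p_by_name and p_by_pos are made up here:

```r
d <- data.frame(X = c(11L, 9L, 13L, 13L, 18L), N = c(19L, 26L, 21L, 27L, 30L))

# One column per test, one row per component of the binom.test() result.
res <- with(d, mapply(binom.test, x = X, n = N))
dim(res)
rownames(res)   # the component names, "p.value" among them

# Picking the row by name avoids relying on its position in the result.
p_by_name <- unlist(res["p.value", ])
p_by_pos  <- do.call(c, res[3, ])
```

Both extractions should give the same numeric vector; indexing by name is just a little more robust if the layout of the returned object ever differs.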

The lapply() call maps the function f to each component of the list of data
frames created in (2).
-----

An alternative approach to this problem would be to use the plyr (and
perhaps reshape, too) package, since it was designed to handle this
'split-apply-combine' strategy.

HTH,
Dennis

Dennis Murphy

2010-Jul-29 10:28 UTC

Hi:

As it turns out, this is pretty straightforward using plyr's ldply()
function. Here's a toy example:

d1 <- structure(list(X = c(11L, 9L, 13L, 13L, 18L), N = c(19L, 26L,
21L, 27L, 30L)), .Names = c("X", "N"), class = "data.frame",
row.names = c(NA, -5L))
w <- sample(1:50, 5)
d2 <- data.frame(X = mapply(rbinom, 1, w, 0.5), N = w)
w <- sample(1:50, 5)
d3 <- data.frame(X = mapply(rbinom, 1, w, 0.5), N = w)

# Combine data frames into a list - since these are already R objects, the
# call is easy:
l <- list(d1, d2, d3)

# the function:
f <- function(df)
  do.call(c, with(df, mapply(binom.test, x = X, n = N))[3, ])

# do.call + lapply:
do.call(rbind, lapply(l, f))
          [,1]      [,2]       [,3]      [,4]      [,5]
[1,] 0.6476059 0.1686375 0.38331032 1.0000000 0.3615946
[2,] 0.3019956 0.6515878 0.02944937 0.5600646 1.0000000
[3,] 1.0000000 1.0000000 0.81452942 0.0390625 0.4050322

# plyr approach:
library(plyr)
ldply(l, f)
         V1        V2         V3        V4        V5
1 0.6476059 0.1686375 0.38331032 1.0000000 0.3615946
2 0.3019956 0.6515878 0.02944937 0.5600646 1.0000000
3 1.0000000 1.0000000 0.81452942 0.0390625 0.4050322

ldply() takes a list as input along with a function to apply to each
element (the lapply() step) and returns a data frame of results. So the plyr
approach can be summarized as:

1. Create a list of data frames.
2. Create a function to apply to each data frame.
3. Load the plyr package.
4. Run ldply().

Essentially, the plyr package provides a number of convenient 'wrapper'
functions to simplify the 'split-apply-combine' strategy of data analysis
for various combinations of input and output objects.
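
If all the tests sit in one table with one test per row, as the original question describes, no splitting is needed and a single mapply() call should do. A minimal sketch, assuming the table is a data frame called dat (a made-up name) with columns X and N; only the five example rows from the question are used:

```r
# Hypothetical stand-in for the full 600-row table.
dat <- data.frame(X = c(11L, 9L, 13L, 13L, 18L), N = c(19L, 26L, 21L, 27L, 30L))

# One exact two-sided test per row against p = 0.5; keep only the p-value.
dat$p.value <- mapply(
  function(x, n) binom.test(x, n, p = 0.5, alternative = "two.sided")$p.value,
  dat$X, dat$N
)
```

Here p and alternative are spelled out for clarity even though they are the defaults; the p-values end up in a single numeric column alongside the data.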

HTH,
Dennis

