Two possible solutions below.
On Fri, 3 Aug 2012, darnold wrote:
> Hi,
>
> Reading about a "Heads and Tails" game in
>
> http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
> Introduction to Probability (Example 1.4, pp. 5-8).
>
> You toss a coin 40 times. If heads, Peter wins $1, tails, he loses $1. I
> think I can do that ok with:
>
> winnings <- sum(sample(c(-1,1), 40, replace=TRUE))
>
> But I have to do it 10,000 times and I have to record and collect the
> winnings. In other languages, I would probably use a for loop and "push"
> each winnings into some sort of collective array or vector. However, for
> loops seem to be discouraged in R and it seems the same can be said for
> "pushing" a calculation onto a vector. So, can someone give me some guidance
> on how to collect 10,000 winnings?
>
> The second part of the game asks us to keep track of how often Peter is in
> the lead during each game. Obviously, he is in the lead at any moment his
> cumulative winnings are positive. But the game requires that we also do
> something at the moment the cumulative winnings are zero. (1) if the
> previous cumulative sum was nonnegative, then the zero counts as "staying in
> the lead." So, for example, during a single game, Peter might be in the lead
> for say 34 out of the 40 tosses. I must record the 34 and perform the game
> 9,999 more times, each time recording the number of times that Peter is in
> the lead. So again, any thoughts on how to do this without for loops and
> "pushing?"
In general, vectorizing means doing a bunch of calculations in each
step. Keep in mind that apply and friends are basically convenience
notation for loops: they are convenient, but they pay off most when
the function they call is itself already vectorized.
In some cases multiple levels of loops can be vectorized together.
I was only able to manage one level, but choosing the RIGHT level
to vectorize is important: get as many calculations as possible done
in each step of code to get the biggest speedup.
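To make the point about apply concrete, here is a minimal sketch (the matrix m and the names viaapply and viacolsums are made up for illustration). Both lines compute the same column sums, but apply calls sum once per column while colSums handles every column in a single call:

```r
# Hypothetical toy data: 40 tosses per game (rows), 1000 games (columns)
set.seed(1)
m <- matrix(sample(c(-1, 1), 40 * 1000, replace = TRUE), nrow = 40)

# apply() calls sum() once per column: a loop in disguise
viaapply <- apply(m, 2, sum)

# colSums() does all of the column sums in one vectorized call
viacolsums <- colSums(m)

all.equal(viaapply, viacolsums)
```

On a matrix this small the difference hardly matters; it tends to show up as the number of columns grows.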
The ideas:
1) Have "sample" do as many coin flips at once as possible. Fold
the results into a matrix to make it look like multiple
sets of results.
2) Use vector logical tests to perform the game logic. Use
vector indexing if appropriate to look at "previous" results.
3) Use automatic conversion of logical results to integer for
accumulating counts of successes.
4) Consider using cumsum or similar vectorized operations.
5) Since there are more games than moves in each game, consider
simulating each step of all games vectorially to minimize
the number of looping steps.
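For the first part of the question (collecting 10,000 winnings), ideas 1 and 3 might be sketched as follows. The names tosses, winnings and gamesahead and the seed are only illustrative, and this assumes one game per column:

```r
gamelen <- 40      # tosses per game
gamecount <- 10000 # number of games
set.seed(42)       # arbitrary seed, only for reproducibility

# Idea 1: one call to sample() for every toss of every game,
# folded so that each column is one game
tosses <- matrix(sample(c(-1, 1), gamelen * gamecount, replace = TRUE),
                 nrow = gamelen)

# Final winnings of each game in one vectorized call, no "pushing"
winnings <- colSums(tosses)

# Idea 3: TRUE counts as 1, so sum() of a logical vector is a count
gamesahead <- sum(winnings > 0)

length(winnings)   # 10000
```

From there, something like table(winnings) or hist(winnings) would summarize the distribution.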
The code:
#------------------------------------
# Head or Tails game
# results: matrix of +1/-1 tosses, one game per column, one toss per row
# Approach by vectorizing each game evaluation and looping over games
loopbygame <- function( results ) {
  gameleadcount <- function( v ) {
    pot <- cumsum( v )
    # in the lead if ahead, or tied right after having been ahead
    sum( pot > 0 | c( FALSE, 0 == pot[ -1 ] & 0 < pot[ -length( pot ) ] ) )
  }
  apply( results, 2, gameleadcount )
}
# Approach by vectorizing one step from all games and looping over steps
loopbymove <- function( results ) {
  pot <- rep( 0, ncol( results ) )
  leadcount <- rep( 0, ncol( results ) )
  for ( moveno in seq_len( nrow( results ) ) ) {
    oldpot <- pot
    pot <- pot + results[ moveno, ]
    # in the lead if ahead, or tied right after having been ahead
    leadcount <- leadcount + ( pot > 0 | ( 0 == pot & 0 < oldpot ) )
  }
  leadcount
}
gamelen <- 40
gamecount <- 10000
set.seed( 123 )
results <- matrix( sample( c( -1, 1 ), gamelen*gamecount, replace=TRUE )
                 , ncol=gamecount )
system.time( ans1 <- loopbymove( results ) )
system.time( ans2 <- loopbygame( results ) )
all.equal( ans1, ans2 )
My results:
> system.time( ans1 <- loopbymove( results ) )
   user  system elapsed
  0.040   0.000   0.037
> system.time( ans2 <- loopbygame( results ) )
   user  system elapsed
  0.224   0.000   0.219
> all.equal( ans1, ans2 )
[1] TRUE
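A game small enough to count by hand is an easy way to sanity-check lead-counting logic. This is a minimal sketch, assuming the convention from the question that a tie counts as a lead only when Peter was ahead on the previous toss; leadcount1 and the six-toss game are made up for illustration:

```r
# Lead count for one game; v is a vector of +1/-1 toss results
leadcount1 <- function(v) {
  pot <- cumsum(v)                     # running winnings after each toss
  prev <- c(0, pot[-length(pot)])      # running winnings one toss earlier
  sum(pot > 0 | (pot == 0 & prev > 0)) # ahead, or tied after being ahead
}

v <- c(1, -1, -1, 1, 1, -1)
cumsum(v)       # 1 0 -1 0 1 0
leadcount1(v)   # 4: tosses 1, 2, 5 and 6
```

The tie at toss 4 is not counted because Peter was behind at toss 3.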
> Thanks for the help. Great list.
You are welcome. Note that on this "great list", requests for
optimization "ideas" that come without a concrete, reproducible code
example of some type are normally groused at for not showing sufficient
research on your part and for essentially asking us to do your work for
you. In this case I found the problem interesting enough to respond in
spite of your lack of reproducible (even if inefficient) code.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k