I'm looking for a good form in which to store matrix results of a simulation.

I am doing a simulation study. Each simulation generates some data and then analyzes it. I want to record the results of many simulations and analyze them. Say r has the results of one simulation, and I care about r$coefficients, a vector of coefficients, and r$var, the estimated covariance matrix.

I'll do lots of simulations and then look at the results, computing the mean of each value.

I'm looking for a good way to save and then analyze the results. The coefficients seem to fit well into a data frame, but I'm looking for a good way to handle the matrix.

The only structure I've discovered that can even handle a set of matrices is a list. It also occurs to me the results could go to a 3 dimensional array; I suppose it would be good to make the last index vary with the simulation.

Neither of these approaches seems ideal, because I would need to handle the matrix separately from the other data I want to store. I'm hoping to do something like simresults <- rbind(simresults, r$coeff, r$var).

The result also needs to be amenable to calculations. If m1 and m2 are matrices (same dimension for each) mean(list(m1, m2)) doesn't work, so even though list will record the data it isn't a great form for analysis. (But I suppose some apply variant would work with 3d arrays).

Any suggestions for good ways to approach this? Again, the ideal solution would have
* consistent handling of matrices and other data
* easy computation of (e.g.) means for the results.

Thanks.

P.S. I'm also aware I could accumulate means as I go, but I'm looking for a more general solution.
In a simulation like the one you describe, it is best to avoid using rbind in a loop, as that has more overhead than creating objects of the required size before you start the loop. Also, if all your results are numbers, it may be better to avoid data.frames, as they require more overhead than simple arrays. See, e.g., Venables and Ripley (2002) Modern Applied Statistics with S, 4th ed. (Springer) or Venables and Ripley (2000) S Programming (Springer).

If I wanted to save all the coefficients and all the covariance matrices, I might create separate arrays for the coefficients and for the covariance matrices, like the following:

N <- 2 # number of simulations
k <- 3 # number of coefficients
Coef <- array(NA, dim=c(N, k))
dimnames(Coef) <- list(NULL, letters[1:k])
Var <- array(NA, dim=c(N, k, k))
dimnames(Var) <- list(NULL, letters[1:k], letters[1:k])

## Each iteration would include something like the following:
i <- 1
Coef1 <- 1:3
Var1 <- array(1:9, dim=c(3,3))
Var1 <- (Var1+t(Var1)) # make the example matrix symmetric
Coef[i, ] <- Coef1
Var[i, , ] <- Var1

##
## If I did not want to store two copies of each off-diagonal covariance,
## I might do something like the following

# Set up: k coefficients plus k*(k+1)/2 unique covariance entries
Results <- array(NA, dim=c(N, 2*k + choose(k, 2)))

## In each iteration:
Results[i, 1:k] <- Coef1
Results[i, -(1:k)] <- Var1[!lower.tri(Var1)] # upper triangle, diagonal included

Hope this helps.
Spencer Graves

Ross Boylan wrote:
> I'm looking for a good form in which to store matrix results of a
> simulation.
> [...]
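As a rough sketch of the preallocation approach described above, the loop below fills the two arrays over a full run and then averages them. The simulated model, the use of lm(), and the names n, fit, mean_coef and mean_var are illustrative assumptions, standing in for whatever produces r$coefficients and r$var in the original question.

```r
set.seed(42)
N <- 200  # number of simulations (illustrative value)
n <- 50   # observations per simulated data set (illustrative value)
k <- 3    # number of coefficients: intercept plus two slopes

# Allocate the result arrays once, before the loop
Coef <- array(NA_real_, dim = c(N, k))
Var  <- array(NA_real_, dim = c(N, k, k))

for (i in seq_len(N)) {
  # Hypothetical data-generating step and analysis
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  y  <- 1 + 2 * x1 - x2 + rnorm(n)
  fit <- lm(y ~ x1 + x2)

  Coef[i, ]  <- coef(fit)  # vector of k coefficients
  Var[i, , ] <- vcov(fit)  # k x k estimated covariance matrix
}

# Mean of each coefficient and of each covariance entry across simulations
mean_coef <- colMeans(Coef)
mean_var  <- apply(Var, c(2, 3), mean)
```

Here apply() with margins c(2, 3) averages over the first (simulation) index, so mean_var has the same k x k shape as a single covariance matrix.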
Adaikalavan RAMASAMY
2003-Sep-10 02:03 UTC
[R] recording and taking mean of a set of matrices
"mean(list(m1, m2))" will not work. Instead, with

mylist <- list(m1, m2)

the call

sapply( mylist, FUN=mean )

will give you the overall mean of each of m1 and m2, and

sapply( mylist, FUN=function(x) apply(x, 2, mean) )

will give you the column means of m1 and m2 in matrix format. Double-check the resulting dimensions.

Here are two ways to store results (say, calculating the squares of a series) after each iteration:

tmp <- matrix(NA, nrow=1, ncol=100) # allocate before the loop
for(x in 1:100){
  tmp[ ,x] <- x^2

  # Option 1: append each result to a text file
  cat(x, "\t", tmp[ ,x], "\n", sep="", file="out.txt", append=TRUE)

  # Option 2: save the whole object as it stands so far
  save(tmp, file="out.rda", compress=TRUE)
}

Regards, Adai.

-----Original Message-----
From: Ross Boylan [mailto:ross at biostat.ucsf.edu]
Sent: Wednesday, September 10, 2003 8:33 AM
To: R-help at stat.math.ethz.ch
Subject: [R] recording and taking mean of a set of matrices

I'm looking for a good form in which to store matrix results of a simulation.
[...]
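For the element-by-element mean the original question asks about, one possible sketch (not taken from the replies above) is to sum the list with Reduce() and divide by its length, or to stack the list into a 3-d array and use apply(). The matrices m1 and m2 and the names elementwise_mean, stacked and elementwise_mean2 are made up for illustration.

```r
# Two small example matrices of the same dimension (illustrative values)
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)
mylist <- list(m1, m2)

# Way 1: add the matrices together, then divide by how many there are
elementwise_mean <- Reduce(`+`, mylist) / length(mylist)

# Way 2: stack the list into a 2 x 2 x 2 array and average over the last index
stacked <- simplify2array(mylist)
elementwise_mean2 <- apply(stacked, c(1, 2), mean)
```

Both give a matrix with the same dimensions as m1 and m2, which is what distinguishes this from the sapply() calls above that return one number or one vector per matrix.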
Gabor Grothendieck
2003-Sep-10 02:31 UTC
[R] recording and taking mean of a set of matrices
Just one small comment. R stores arrays in reverse odometer order, i.e. the leftmost subscript varies fastest. Thus, you might want to use the last subscript to represent the simulation number rather than the first. That way, the entire vector or matrix for each simulation is stored together (i.e. contiguously in memory), which might help performance.

--- Spencer Graves <spencer.graves@pdf.com> wrote:
> In simulation like you describe, it is best to avoid using rbind in a
> loop, as that has more overhead than creating objects of the size
> required to store the results before you start the loop.
> [...]
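A small sketch of the layout suggested here, with the simulation number as the last subscript. The random matrices only stand in for real covariance estimates, and the names A and mean_var are hypothetical. Whether the contiguous layout gives a measurable speed-up would need to be checked on the actual problem.

```r
set.seed(1)
N <- 4  # number of simulations (illustrative value)
k <- 3  # dimension of each covariance matrix

# Simulation index last: each k x k slice Var[, , i] is contiguous in memory
Var <- array(NA_real_, dim = c(k, k, N))

for (i in seq_len(N)) {
  A <- matrix(rnorm(k * k), k, k)  # placeholder for a real analysis step
  Var[, , i] <- crossprod(A)       # a symmetric k x k matrix
}

# rowMeans() with dims = 2 keeps the first two dimensions and
# averages over the remaining one, i.e. over simulations
mean_var <- rowMeans(Var, dims = 2)
```

With this layout, rowMeans(Var, dims = 2) returns the same k x k matrix as apply(Var, c(1, 2), mean).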
Richard A. O'Keefe
2003-Sep-10 05:18 UTC
[R] recording and taking mean of a set of matrices
If there are a _lot_ of results, and they are not needed until after the simulations have all run, it's always possible to write them out to a file or files and then use scan() or read.table() or something like that to read them back in.
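One way this could look, as a sketch only: each simulation appends one row to a text file with write.table(), and the file is read back afterwards with read.table() or scan(). The temporary file, the placeholder results, and the names out, coefs, V, res and res2 are assumptions made for illustration.

```r
set.seed(1)
N <- 5  # number of simulations (illustrative value)
k <- 3  # number of coefficients
out <- tempfile(fileext = ".txt")  # hypothetical output file

for (i in seq_len(N)) {
  coefs <- rnorm(k)                           # placeholder coefficients
  V <- crossprod(matrix(rnorm(k * k), k, k))  # placeholder covariance matrix

  # One row per simulation: coefficients, then the upper triangle of V
  row <- c(coefs, V[upper.tri(V, diag = TRUE)])
  write.table(t(row), file = out, append = TRUE,
              row.names = FALSE, col.names = FALSE)
}

# After all simulations have run, read the results back in
res <- as.matrix(read.table(out))
colMeans(res)  # mean of every stored quantity

# scan() returns a plain vector, which can be reshaped row by row
res2 <- matrix(scan(out, quiet = TRUE), ncol = ncol(res), byrow = TRUE)
```

Each row holds k + k*(k+1)/2 numbers, so the file stays rectangular and either reader can recover it.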