This should (hopefully) be a pretty simple task. What I'd like to do is read in a CSV file containing means and standard deviations for a large number 'n' of parameters (up to 2000). The list would be in the following format (see attached read.csv):

Parameter(1), mean, standard dev.,
Parameter(2), mean, standard dev.,
Parameter(3), mean, standard dev.,
...
Parameter(n), mean, standard dev.,

Based on the above CSV file, I would then like to generate a large sample matrix for 's' samples, using the rnorm function. The matrix will be in the following format:

1,0,0, P1(1), P2(1), P3(1), ... Pn(1)
2,0,0, P1(2), P2(2), P3(2), ... Pn(2)
....
s,0,0, P1(s), P2(s), P3(s), ... Pn(s)

The first column contains the row number. Taking s=30000, we would have rows numbered 1 to 30,000.

The second and third columns are fixed values: 0.

The fourth and subsequent columns contain values drawn with rnorm for each parameter. P1(1) is the first value generated for the first parameter, P1(2) is the second value generated, and so forth. P2(1) is the first value generated for the second parameter, P2(2) is the second value generated, and so forth. Pn(1) is the first value generated for the n'th parameter, Pn(2) is the second value generated, and so forth.

Again, the number of rows depends on 's', the number of samples.

Therefore, I will be generating a fairly large matrix. This could be a 1,000,000 x 2,000 matrix. However, due to memory constraints, it may be necessary to break this down into smaller sub-matrices where I limit the number of rows. Firstly, is this possible in R, and secondly, can anyone suggest a method for creating such a matrix?

I'd really appreciate any help on this. Thank you.

--
View this message in context: http://r.789695.n4.nabble.com/RNORM-matrix-based-on-CSV-file-values-for-MEAN-and-SD-tp4630901.html
Sent from the R help mailing list archive at Nabble.com.
R. Michael Weylandt
2012-May-22 15:41 UTC
[R] RNORM matrix based on CSV file values for MEAN and SD
No CSV came through, so I'll just assume read.csv() gives you a data.frame that looks something like this:

    params <- data.frame(mean = c(1,4,7), sd = c(2,2,5))

and that you want 10 samples from each parameter. If you're under memory constraints, you can simply loop over the rows and append to a growing CSV:

    for (i in seq_len(NROW(params))) {
      # t() turns the vector into a single row, so each parameter is
      # written as one comma-separated line: index, 0, 0, then 10 draws
      write.table(t(c(i, 0, 0, rnorm(10, params$mean[i], params$sd[i]))),
                  "out.csv", append = TRUE, sep = ",",
                  row.names = FALSE, col.names = FALSE)
    }

Note that we have to set the names to FALSE or the appending gets messy.

It's probably faster (though more work) to do a few rows at a time and to use textConnections so you aren't constantly opening and closing the file, but this should get you started. See the examples of ?textConnection for how to do that bit properly.

Best,
Michael
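For the layout asked for in the original post (one row per sample, one column per parameter), the same idea can be turned around and done a block of rows at a time. A 1,000,000 x 2,000 matrix of doubles needs roughly 16 GB, so generating it in chunks is the practical route. What follows is only a minimal sketch under some assumptions: `params` is the same stand-in data.frame as above, `s`, `chunk` and `out_file` are illustrative names and values, and it keeps a plain file() connection open rather than using a textConnection.

```r
# Hypothetical stand-in for the data.frame that read.csv() would return.
params <- data.frame(mean = c(1, 4, 7), sd = c(2, 2, 5))

s     <- 25   # total number of samples (rows); illustrative value
chunk <- 10   # rows generated per pass; tune this to the available memory
n     <- nrow(params)

out_file <- tempfile(fileext = ".csv")  # hypothetical output path
con <- file(out_file, open = "w")       # open once, write many times

set.seed(1)
for (start in seq(1, s, by = chunk)) {
  rows <- start:min(start + chunk - 1, s)
  k <- length(rows)
  # rnorm() recycles mean and sd across the draws, so filling the matrix
  # by row makes column j follow N(mean[j], sd[j]).
  draws <- matrix(rnorm(k * n, mean = params$mean, sd = params$sd),
                  nrow = k, ncol = n, byrow = TRUE)
  # Row index, two fixed zero columns, then one column per parameter.
  block <- data.frame(id = rows, z1 = 0L, z2 = 0L, draws)
  write.table(block, file = con, sep = ",",
              row.names = FALSE, col.names = FALSE)
}
close(con)

# Read the file back to check its shape.
result <- read.csv(out_file, header = FALSE)
```

Only one chunk of `chunk` x `n` values is held in memory at a time, so `chunk` can be raised or lowered without changing the output format. With real data, `params` would come from read.csv() and `out_file` would be an actual path.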