thr3ads.net - R help - [R] RNORM matrix based on CSV file values for MEAN and SD [May 2012]

dcoakley

2012-May-22 14:43 UTC

[R] RNORM matrix based on CSV file values for MEAN and SD

This should (hopefully) be a pretty simple task. What I'd like to do is read
in a csv file containing means and standard deviations for a large number of
'n' parameters (up to 2000). The list would be in the following format
(see attached read.csv):

Parameter(1), mean, standard dev.,
Parameter(2), mean, standard dev.,
Parameter(3), mean, standard dev.,
...
Parameter(n), mean, standard dev.,
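
A minimal sketch of reading such a file, assuming no header row and a trailing comma on each line as in the listing above. The sample values, the temporary file, and the column names `name`, `mean`, and `sd` are illustrative assumptions, not taken from the original attachment.

```r
# Write a tiny stand-in for the parameter file (hypothetical values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("P1,1,2,", "P2,4,2,", "P3,7,5,"), csv_path)

# No header row; the trailing comma produces an empty fourth column.
raw <- read.csv(csv_path, header = FALSE, stringsAsFactors = FALSE)

# Keep only the first three columns and give them descriptive names.
params <- data.frame(name = raw[[1]], mean = raw[[2]], sd = raw[[3]],
                     stringsAsFactors = FALSE)
print(params)
```

If the real file has a header row, `header = TRUE` would be the appropriate setting instead.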


Based on the above csv file, I would then like to generate a large sample
matrix for 's' samples, using the rnorm function. The matrix will be in
the following format:

1,0,0, P1(1), P2(1), P3(1), ... Pn(1)
2,0,0, P1(2), P2(2), P3(2), ... Pn(2)
....
s,0,0, P1(s), P2(s), P3(s), ... Pn(s)

The first column contains the Row number. Taking s=30000, we would have rows
numbered 1 to 30,000.

The second and third columns are fixed at 0.

The fourth and subsequent columns contain values drawn with rnorm
for each parameter. P1(1) is the first value generated for the first
parameter, P1(2) is the second value generated and so forth. P2(1) is the
first value generated for the second parameter, P2(2) is the second value
generated and so forth. Pn(1) is the first value generated for the n'th
parameter, Pn(2) is the second value generated and so forth.

Again the number of rows depends on 's', the number of samples.
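
For a small case that fits in memory, the layout described above could be built roughly as follows. This is only a sketch: the `params` values, `s = 5`, and the variable names are assumptions made for illustration.

```r
# Hypothetical parameter table: one row per parameter.
params <- data.frame(mean = c(1, 4, 7), sd = c(2, 2, 5))

s <- 5               # number of samples (rows)
n <- nrow(params)    # number of parameters (columns after the first three)

set.seed(1)          # only so the example is repeatable

# rnorm() is vectorised over mean and sd. Repeating each mean/sd s times and
# filling the matrix column by column makes column j hold the s draws for
# parameter j.
draws <- matrix(rnorm(s * n,
                      mean = rep(params$mean, each = s),
                      sd   = rep(params$sd,   each = s)),
                nrow = s)

# Row number, two fixed zero columns, then one column per parameter.
out <- cbind(seq_len(s), 0, 0, draws)
print(out)
```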

Therefore, I will be generating a fairly large matrix. This could be a
1,000,000 x 2,000 matrix. However, due to memory constraints, it may be
necessary to break this down into smaller sub-matrices where I limit the
number of rows. Firstly, is this possible in R, and secondly, can anyone
help suggest a method for creating such a matrix?
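
To put a number on the memory concern: R stores numeric values as 8-byte doubles, so the size of the full matrix can be estimated directly. The sketch below does that arithmetic and derives a chunk size from a memory budget; the 256 MiB budget and the variable names are arbitrary assumptions for illustration.

```r
rows <- 1e6           # samples
cols <- 2000 + 3      # parameters plus the three leading columns
bytes_per_value <- 8  # a double in R

# Size of the whole matrix if held in memory at once (roughly 15 GiB).
full_gib <- rows * cols * bytes_per_value / 2^30

# Rows that fit in one chunk under a hypothetical 256 MiB budget.
budget_bytes <- 256 * 2^20
chunk_rows <- floor(budget_bytes / (cols * bytes_per_value))

cat("Full matrix:", round(full_gib, 1), "GiB\n")
cat("Rows per chunk:", chunk_rows, "\n")
```

Intermediate copies made by functions such as cbind() need extra room on top of this, so a real budget would want some headroom.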

I'd really appreciate any help on this. Thank you.



--
View this message in context:
http://r.789695.n4.nabble.com/RNORM-matrix-based-on-CSV-file-values-for-MEAN-and-SD-tp4630901.html
Sent from the R help mailing list archive at Nabble.com.

R. Michael Weylandt

2012-May-22 15:41 UTC


[R] RNORM matrix based on CSV file values for MEAN and SD

No CSV came through so I'll just assume you get in a data.frame from
read.csv() that looks something like this

params <- data.frame(mean = c(1,4,7), sd = c(2,2,5))

and you want 10 samples from each. If you're memory-constrained,
you can simply loop over the rows of params and append to a growing CSV.

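for (i in seq_len(NROW(params))) {
    # One output row per parameter: index, two fixed zeros, then 10 draws.
    row <- matrix(c(i, 0, 0, rnorm(10, params$mean[i], params$sd[i])), nrow = 1)
    write.table(row, "out.csv", append = TRUE, sep = ",",
                row.names = FALSE, col.names = FALSE)
}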

Note that we have to set row.names and col.names to FALSE or the appending gets messy.

It's probably faster (though more work) to write a batch of rows at a time
and to keep a single file connection open so you aren't constantly opening
and closing the file, but this should get you started.

See ?file and ?connections for how to do that bit properly.
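
The batched approach could look roughly like the sketch below, which writes the layout from the original question (one row per sample, one column per parameter) through a single connection opened with file(). The values of `s` and `chunk_rows`, the temporary output path, and the variable names are assumptions chosen to keep the example small.

```r
params <- data.frame(mean = c(1, 4, 7), sd = c(2, 2, 5))

s <- 25L            # total number of samples (rows) to write
chunk_rows <- 10L   # rows generated and written per batch
n <- nrow(params)
out_path <- tempfile(fileext = ".csv")

# Open the file once; every write.table() call below continues where the
# previous one stopped.
con <- file(out_path, open = "wt")

for (start in seq(1L, s, by = chunk_rows)) {
  idx <- start:min(start + chunk_rows - 1L, s)  # row numbers in this batch
  k <- length(idx)

  # k draws per parameter; column j holds the draws for parameter j.
  draws <- matrix(rnorm(k * n,
                        mean = rep(params$mean, each = k),
                        sd   = rep(params$sd,   each = k)),
                  nrow = k)

  write.table(cbind(idx, 0, 0, draws), con, sep = ",",
              row.names = FALSE, col.names = FALSE)
}

close(con)
```

Only one batch is held in memory at a time, so `chunk_rows` can be tuned to the available memory without changing the output.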

Best,

Michael


