aegea
2010-Feb-01 02:46 UTC
[R] how to generate data set with different length and calculate the mean?
Hello,

This may be an unusual question, and I am struggling to solve it. I would really appreciate any help or suggestions. Thanks a lot in advance! I have put my questions between the code to make them clear.

The problem is this: I generated 10 data sets with 8 values in each. Now I want to change the number of values in each data set according to a vector 'size' (see below), so that each new data set contains a different number of values. How can I do this? After generating the new data sets, how can I separate the data from the two distributions and calculate the sample means? Thanks a lot.

# generate 10 data sets, each containing 8 samples:
# 4 from N(0, 1) and 4 from N(5, 1)
data <- matrix(0, 10, 8)
th <- c(0, 5, 1)
for (i in 1:10) {
  data[i, ] <- rnorm(8, mean = rep(th[1:2], 8/2), sd = th[3])
}

# change the number of samples in each data set. E.g. the first data set
# needs to increase to 20: the first 8 stay the same, and another 12 samples
# are added (6 from N(0, 1) and the other 6 from N(5, 1)). The second data set
# needs to increase to 10: keep the first 8 the same and generate another 2
# (one from N(0, 1) and the other from N(5, 1)). The third data set does not
# need to change, etc.
size <- c(20, 10, 8, 14, 16, 12, 8, 80)

# Since each data set changes to a different size, with a different number of
# values added, how can I calculate, for each data set, the difference between
# the sample mean from N(0, 1) and the sample mean from N(5, 1), and the pooled
# standard deviation of the two samples? There are two difficulties: each new
# data set contains a different number of values, and, because of the way I
# generated the data, successive values come from different normal
# distributions. How can I separate them and calculate the average of each
# sample and the pooled standard deviation?

--
View this message in context: http://n4.nabble.com/how-to-generate-data-set-with-different-length-and-calculate-the-mean-tp1458420p1458420.html
Sent from the R help mailing list archive at Nabble.com.
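Editorial sketch for the second part of the question (separating the interleaved values and computing the pooled standard deviation). This is only one possible approach, not something proposed in the thread. It assumes that odd positions hold the N(0, 1) draws and even positions the N(5, 1) draws, which is what `mean = rep(th[1:2], n/2)` produces. The helper name `summarise_pair` and the sign convention of the difference (second sample minus first) are assumptions made for illustration.

```r
# Hypothetical helper (not from the thread): summarise one interleaved data set.
# Assumes odd positions were drawn with mean th[1] and even positions with
# mean th[2], as produced by rnorm(n, mean = rep(th[1:2], n / 2), ...).
summarise_pair <- function(x) {
  idx <- seq_along(x)
  a <- x[idx %% 2 == 1]  # values at odd positions (first distribution)
  b <- x[idx %% 2 == 0]  # values at even positions (second distribution)
  na <- length(a)
  nb <- length(b)
  # Pooled SD: the two sample variances weighted by their degrees of freedom
  sp <- sqrt(((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2))
  # Sign convention (assumed): second sample mean minus first sample mean
  c(mean_diff = mean(b) - mean(a), pooled_sd = sp)
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- rnorm(8, mean = rep(c(0, 5), 8 / 2), sd = 1)
summarise_pair(x)
```

Because the helper works on a single vector of any even length, it can be applied unchanged to data sets of different sizes, for example with `sapply()` over a list.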
Petr PIKAL
2010-Feb-01 12:44 UTC
[R] Re: how to generate data set with different length and calculate the mean?
Hi,

I have no idea how to do everything you want. I only recommend that you use a list instead of a matrix, since a list can hold objects of various sizes.

I am not sure if this is the most elegant way, but you can turn your matrix into a data frame

  ddd <- as.data.frame(data)

and then use this

  lapply(ddd, function(x) unlist(list(x)))

to get a list of vectors.

Regards
Petr
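Editorial sketch of the list idea from the reply, extended to the resizing step asked about in the question. It is a rough illustration under stated assumptions, not the poster's method. Two points are worth flagging. First, `as.data.frame(data)` yields one element per column, whereas the question stores one data set per row, so the sketch builds the list from rows directly. Second, the posted `size` vector has only 8 entries for 10 data sets; the sketch assumes that data sets without a target size are left unchanged. The helper name `extend_set` is hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
th <- c(0, 5, 1)

# Original 10 x 8 matrix, one data set per row (as in the question)
data <- matrix(0, 10, 8)
for (i in 1:10) {
  data[i, ] <- rnorm(8, mean = rep(th[1:2], 8 / 2), sd = th[3])
}

size <- c(20, 10, 8, 14, 16, 12, 8, 80)  # only 8 targets are given in the post

# One list element per ROW; as.data.frame(data) alone would split by column
sets <- lapply(seq_len(nrow(data)), function(i) data[i, ])

# Hypothetical helper: keep the existing values and append new draws that
# continue the alternating N(th[1], th[3]) / N(th[2], th[3]) pattern.
extend_set <- function(x, target) {
  # Assumption: a missing target (NA) or one not above the current length
  # leaves the data set unchanged
  if (is.na(target) || target <= length(x)) return(x)
  extra <- target - length(x)
  c(x, rnorm(extra, mean = rep(th[1:2], length.out = extra), sd = th[3]))
}

new_sets <- lapply(seq_along(sets), function(i) extend_set(sets[[i]], size[i]))
sapply(new_sets, length)

# Per data set: mean of even positions minus mean of odd positions
sapply(new_sets, function(x) mean(x[c(FALSE, TRUE)]) - mean(x[c(TRUE, FALSE)]))
```

Keeping the data sets in a list means each element can have its own length, and `lapply()` or `sapply()` then applies the same summary to every element.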