Czarek Kowalski
2017-May-09 21:05 UTC
[R] Generating samples from truncated multivariate Student-t distribution
I have already posted that in attachement - pdf file. I am posting plain text here:> library(tmvtnorm)> meann = c(55, 40, 50, 35, 45, 30)> covv = matrix(c( 1, 1, 0, 2, -1, -1,+ 1, 16, -6, -6, -2, 12, + 0, -6, 4, 2, -2, -5, + 2, -6, 2, 25, 0, -17, + -1, -2, -2, 0, 9, -5, + -1, 12, -5, -17, -5, 36), 6, 6)> df = 4> lower = c(20, 20, 20, 20, 20, 20)> upper = c(60, 60, 60, 60, 60, 60)> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)> sum(X1[,1]) / 100000[1] 54.98258> sum(X1[,2]) / 100000[1] 40.36153> sum(X1[,3]) / 100000[1] 49.83571> sum(X1[,4]) / 100000[1] 34.69571 # "4th element of mean vector"> sum(X1[,5]) / 100000[1] 44.81081> sum(X1[,6]) / 100000[1] 31.10834 And corresponding results received using equation (3) from pdf file: [54.97, 40, 49.95, 35.31, # "4th element of mean vector" 44.94, 31.32] On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote:> >> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >> >> Of course I have expected the difference between theory and a sample >> of realizations of RV's and result mean should still be a random >> variable. But, for example for 4th element of mean vector: 35.31 - >> 34.69571 = 0.61429. It is quite big difference, nieprawda?? I have >> expected that the difference would be smaller because of law of large >> numbers (for 10mln samples the difference is quite similar). > > I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results. > > I suggest you post some code and results. > > -- > David. > > >> >> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote: >>> >>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >>>> >>>> Dear Members, >>>> I am working with 6-dimensional Student-t distribution with 4 degrees >>>> of freedom truncated to [20; 60]. I have generated 100 000 samples >>>> from truncated multivariate Student-t distribution using rtmvt >>>> function from package ?tmvtnorm?. I have also calculated mean vector >>>> using equation (3) from attached pdf. The problem is, that after >>>> summing all elements in one column of rtmvt result (and dividing by >>>> 100 000) I do not receive the same result as using (3) equation. Could >>>> You tell me, what is incorrect, why there is a difference? >>> >>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr? >>> >>> >>>> Yours faithfully >>>> Czarek Kowalski >>>> <truncatedT.pdf>______________________________________________ >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> David Winsemius >>> Alameda, CA, USA >>> > > David Winsemius > Alameda, CA, USA >
David Winsemius
2017-May-09 21:33 UTC
[R] Generating samples from truncated multivariate Student-t distribution
> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote: > > I have already posted that in attachement - pdf file.I see that now. I failed to scroll to the 3rd page.> I am posting > plain text here: > >> library(tmvtnorm) >> meann = c(55, 40, 50, 35, 45, 30) >> covv = matrix(c( 1, 1, 0, 2, -1, -1, > 1, 16, -6, -6, -2, 12, > 0, -6, 4, 2, -2, -5, > 2, -6, 2, 25, 0, -17, > -1, -2, -2, 0, 9, -5, > -1, 12, -5, -17, -5, 36), 6, 6) > df = 4 > lower = c(20, 20, 20, 20, 20, 20) > upper = c(60, 60, 60, 60, 60, 60) > X1 <- rtmvt(n=100000, meann, covv, df, lower, upper) > > >> sum(X1[,1]) / 100000 > [1] 54.98258 > sum(X1[,2]) / 100000 > [1] 40.36153 > sum(X1[,3]) / 100000 > [1] 49.83571 > sum(X1[,4]) / 100000 > [1] 34.69571 # "4th element of mean vector" > sum(X1[,5]) / 100000 > [1] 44.81081 > sum(X1[,6]) / 100000 > [1] 31.10834 > > And corresponding results received using equation (3) from pdf file: > [54.97, > 40, > 49.95, > 35.31, # "4th element of mean vector" > 44.94, > 31.32] >I get similar results for the output from your code, My 100-fold run of your calculations were: meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, upper) colMeans(Xbig)} ) describe(meansBig[4,]) # describe is from Hmisc package meansBig[4, ] n missing distinct Info Mean Gmd .05 .10 .25 100 0 100 1 34.7 0.01954 34.68 34.68 34.69 .50 .75 .90 .95 34.70 34.72 34.72 34.73 lowest : 34.65222 34.66675 34.66703 34.66875 34.67566 highest: 34.72939 34.73012 34.73051 34.73742 34.74441 So agree, 35.31 is outside the plausible range of an RV formed with that package, but I don't have any code relating to your calculations from theory. Best; David.> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote: >> >>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >>> >>> Of course I have expected the difference between theory and a sample >>> of realizations of RV's and result mean should still be a random >>> variable. But, for example for 4th element of mean vector: 35.31 - >>> 34.69571 = 0.61429. It is quite big difference, nieprawda?? I have >>> expected that the difference would be smaller because of law of large >>> numbers (for 10mln samples the difference is quite similar). >> >> I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results. >> >> I suggest you post some code and results. >> >> -- >> David. >> >> >>> >>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote: >>>> >>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >>>>> >>>>> Dear Members, >>>>> I am working with 6-dimensional Student-t distribution with 4 degrees >>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples >>>>> from truncated multivariate Student-t distribution using rtmvt >>>>> function from package ?tmvtnorm?. I have also calculated mean vector >>>>> using equation (3) from attached pdf. The problem is, that after >>>>> summing all elements in one column of rtmvt result (and dividing by >>>>> 100 000) I do not receive the same result as using (3) equation. Could >>>>> You tell me, what is incorrect, why there is a difference? >>>> >>>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr? >>>> >>>> >>>>> Yours faithfully >>>>> Czarek Kowalski >>>>> <truncatedT.pdf>______________________________________________ >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> David Winsemius >>>> Alameda, CA, USA >>>> >> >> David Winsemius >> Alameda, CA, USA >>David Winsemius Alameda, CA, USA
David Winsemius
2017-May-09 21:50 UTC
[R] Generating samples from truncated multivariate Student-t distribution
> On May 9, 2017, at 2:33 PM, David Winsemius <dwinsemius at comcast.net> wrote: > > >> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >> >> I have already posted that in attachement - pdf file. > > I see that now. I failed to scroll to the 3rd page. > >> I am posting >> plain text here: >> >>> library(tmvtnorm) >>> meann = c(55, 40, 50, 35, 45, 30) >>> covv = matrix(c( 1, 1, 0, 2, -1, -1, >> 1, 16, -6, -6, -2, 12, >> 0, -6, 4, 2, -2, -5, >> 2, -6, 2, 25, 0, -17, >> -1, -2, -2, 0, 9, -5, >> -1, 12, -5, -17, -5, 36), 6, 6) >> df = 4 >> lower = c(20, 20, 20, 20, 20, 20) >> upper = c(60, 60, 60, 60, 60, 60) >> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper) >> >> >>> sum(X1[,1]) / 100000 >> [1] 54.98258 >> sum(X1[,2]) / 100000 >> [1] 40.36153 >> sum(X1[,3]) / 100000 >> [1] 49.83571 >> sum(X1[,4]) / 100000 >> [1] 34.69571 # "4th element of mean vector" >> sum(X1[,5]) / 100000 >> [1] 44.81081 >> sum(X1[,6]) / 100000 >> [1] 31.10834 >> >> And corresponding results received using equation (3) from pdf file: >> [54.97, >> 40, >> 49.95, >> 35.31, # "4th element of mean vector" >> 44.94, >> 31.32] >> > > I get similar results for the output from your code, > > My 100-fold run of your calculations were: > > meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, upper) > colMeans(Xbig)} ) > > describe(meansBig[4,]) # describe is from Hmisc package > > meansBig[4, ] > n missing distinct Info Mean Gmd .05 .10 .25 > 100 0 100 1 34.7 0.01954 34.68 34.68 34.69 > .50 .75 .90 .95 > 34.70 34.72 34.72 34.73 > > lowest : 34.65222 34.66675 34.66703 34.66875 34.67566 > highest: 34.72939 34.73012 34.73051 34.73742 34.74441 > > > So agree, 35.31 is outside the plausible range of an RV formed with that package, but I don't have any code relating to your calculations from theory.Further investigation: covDiag <- covv*( row(covv)==col(covv) ) # just the diagonal means Repeat with all zero covariances:> meansDiag <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covDiag, df, lower, upper)+ colMeans(Xbig)} )> describe(meansDiag[4,])meansDiag[4, ] n missing distinct Info Mean Gmd .05 .10 .25 100 0 100 1 35.23 0.02074 35.21 35.21 35.22 .50 .75 .90 .95 35.23 35.25 35.26 35.26 lowest : 35.18360 35.19756 35.20098 35.20179 35.20622 highest: 35.26367 35.26635 35.26791 35.27251 35.27302 So failing to account for the covariances in your theoretical calculations mostly explains the apparent discrepancy, although your value of 35.31 would be at the far end of a statistical distribution and I wonder about some sort of error in your theoretical calculation, which didn't appear to take into account the covariance matrix. Best; David.> > Best; > David. > > >> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote: >>> >>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >>>> >>>> Of course I have expected the difference between theory and a sample >>>> of realizations of RV's and result mean should still be a random >>>> variable. But, for example for 4th element of mean vector: 35.31 - >>>> 34.69571 = 0.61429. It is quite big difference, nieprawda?? I have >>>> expected that the difference would be smaller because of law of large >>>> numbers (for 10mln samples the difference is quite similar). >>> >>> I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results. >>> >>> I suggest you post some code and results. >>> >>> -- >>> David. >>> >>> >>>> >>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote: >>>>> >>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >>>>>> >>>>>> Dear Members, >>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees >>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples >>>>>> from truncated multivariate Student-t distribution using rtmvt >>>>>> function from package ?tmvtnorm?. I have also calculated mean vector >>>>>> using equation (3) from attached pdf. The problem is, that after >>>>>> summing all elements in one column of rtmvt result (and dividing by >>>>>> 100 000) I do not receive the same result as using (3) equation. Could >>>>>> You tell me, what is incorrect, why there is a difference? >>>>> >>>>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr? >>>>> >>>>> >>>>>> Yours faithfully >>>>>> Czarek Kowalski >>>>>> <truncatedT.pdf>______________________________________________ >>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>>> David Winsemius >>>>> Alameda, CA, USA >>>>> >>> >>> David Winsemius >>> Alameda, CA, USA >>> > > David Winsemius > Alameda, CA, USA > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
Martin Maechler
2017-Aug-02 07:08 UTC
[R] Generating samples from truncated multivariate Student-t distribution
>>>>> David Winsemius <dwinsemius at comcast.net> >>>>> on Tue, 9 May 2017 14:33:04 -0700 writes:>> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >> >> I have already posted that in attachement - pdf file. > I see that now. I failed to scroll to the 3rd page. from a late reader: Please, Czarek, and every other future reader: This is an unwritten "regulation" about all R lists I know (a few of which I co-maintain): ___> R code must be given as unformatted plain text rather than ___> in a formatted document (pdf in your case) >> I am posting plain text here: good .. and please do always on these lists Martin Maechler, ETH Zurich & R Core >> >>> library(tmvtnorm) >>> meann = c(55, 40, 50, 35, 45, 30) >>> covv = matrix(c( 1, 1, 0, 2, -1, -1, >> 1, 16, -6, -6, -2, 12, >> 0, -6, 4, 2, -2, -5, >> 2, -6, 2, 25, 0, -17, >> -1, -2, -2, 0, 9, -5, >> -1, 12, -5, -17, -5, 36), 6, 6) >> df = 4 >> lower = c(20, 20, 20, 20, 20, 20) >> upper = c(60, 60, 60, 60, 60, 60) >> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper) >> >> >>> sum(X1[,1]) / 100000 >> [1] 54.98258 >> sum(X1[,2]) / 100000 >> [1] 40.36153 >> sum(X1[,3]) / 100000 >> [1] 49.83571 >> sum(X1[,4]) / 100000 >> [1] 34.69571 # "4th element of mean vector" >> sum(X1[,5]) / 100000 >> [1] 44.81081 >> sum(X1[,6]) / 100000 >> [1] 31.10834 >> >> And corresponding results received using equation (3) from pdf file: >> [54.97, >> 40, >> 49.95, >> 35.31, # "4th element of mean vector" >> 44.94, >> 31.32] >> > I get similar results for the output from your code, > My 100-fold run of your calculations were: > meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, upper) > colMeans(Xbig)} ) > describe(meansBig[4,]) # describe is from Hmisc package > meansBig[4, ] > n missing distinct Info Mean Gmd .05 .10 .25 > 100 0 100 1 34.7 0.01954 34.68 34.68 34.69 > .50 .75 .90 .95 > 34.70 34.72 34.72 34.73 > lowest : 34.65222 34.66675 34.66703 34.66875 34.67566 > highest: 34.72939 34.73012 34.73051 34.73742 34.74441 > So agree, 35.31 is outside the plausible range of an RV formed with that package, but I don't have any code relating to your calculations from theory. > Best; > David. >> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote: >>> >>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >>>> >>>> Of course I have expected the difference between theory and a sample >>>> of realizations of RV's and result mean should still be a random >>>> variable. But, for example for 4th element of mean vector: 35.31 - >>>> 34.69571 = 0.61429. It is quite big difference, nieprawda?? I have >>>> expected that the difference would be smaller because of law of large >>>> numbers (for 10mln samples the difference is quite similar). >>> >>> I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results. >>> >>> I suggest you post some code and results. >>> >>> -- >>> David. >>> >>> >>>> >>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote: >>>>>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote: >>>>> >>>>> Dear Members, >>>>> I am working with 6-dimensional Student-t distribution with 4 degrees >>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples >>>>> from truncated multivariate Student-t distribution using rtmvt >>>>> function from package ?tmvtnorm?. I have also calculated mean vector >>>>> using equation (3) from attached pdf. The problem is, that after >>>>> summing all elements in one column of rtmvt result (and dividing by >>>>> 100 000) I do not receive the same result as using (3) equation. Could >>>>> You tell me, what is incorrect, why there is a difference?>>>>> >>>>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr? >>>>> >>>>>>>>>> Yours faithfully >>>>> Czarek Kowalski >>>>> <truncatedT.pdf>______________________________________________ >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code.>>>>> >>>>> David Winsemius >>>>> Alameda, CA, USA >>>>> >>> >>> David Winsemius >>> Alameda, CA, USA >>> > David Winsemius > Alameda, CA, USA > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.