thr3ads.net - R help - [R] Generating samples from truncated multivariate Student-t distribution [Aug 2017]

If this information is useful, please help other people find it:
Share via:

Czarek Kowalski

2017-May-09 21:05 UTC

[R] Generating samples from truncated multivariate Student-t distribution

I have already posted that in attachement - pdf file. I am posting
plain text here:
> library(tmvtnorm)
> meann = c(55, 40, 50, 35, 45, 30)
> covv = matrix(c(  1, 1, 0, 2, -1, -1,
+                   1, 16, -6, -6, -2, 12,

+                   0, -6, 4, 2, -2, -5,

+                   2, -6, 2, 25, 0, -17,

+                  -1, -2, -2, 0, 9, -5,

+                  -1, 12, -5, -17, -5, 36), 6, 6)
> df = 4
> lower = c(20, 20, 20, 20, 20, 20)
> upper = c(60, 60, 60, 60, 60, 60)
> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)



> sum(X1[,1]) / 100000
[1] 54.98258
> sum(X1[,2]) / 100000
[1] 40.36153
> sum(X1[,3]) / 100000
[1] 49.83571
> sum(X1[,4]) / 100000
[1] 34.69571      # "4th element of mean vector"
> sum(X1[,5]) / 100000
[1] 44.81081
> sum(X1[,6]) / 100000
[1] 31.10834




And corresponding results received using equation (3) from pdf file:
[54.97,
40,
49.95,
35.31, #  "4th element of mean vector"
44.94,
31.32]

On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net>
wrote:>
>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at
gmail.com> wrote:
>>
>> Of course I have expected the difference between theory and a sample
>> of realizations of RV's and result mean should still be a random
>> variable. But, for example for 4th element of mean vector: 35.31 -
>> 34.69571 = 0.61429. It is quite big difference, nieprawda?? I have
>> expected that the difference would be smaller because of law of large
>> numbers (for 10mln samples the difference is quite similar).
>
> I for one have no idea what is meant by a "4th element of mean
vector". So I have now idea what to consider "big". I have found
that my intuitions about multivariate distributions, especially those where the
covariate structure is as complex as you have assumed, are often far from
simulated results.
>
> I suggest you post some code and results.
>
> --
> David.
>
>
>>
>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at
comcast.net> wrote:
>>>
>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800
at gmail.com> wrote:
>>>>
>>>> Dear Members,
>>>> I am working with 6-dimensional Student-t distribution with 4
degrees
>>>> of freedom truncated to [20; 60]. I have generated 100 000
samples
>>>> from truncated multivariate Student-t distribution using rtmvt
>>>> function from package ?tmvtnorm?. I have also calculated  mean
vector
>>>> using equation (3) from attached pdf. The problem is, that
after
>>>> summing all elements in one column of rtmvt result (and
dividing by
>>>> 100 000) I do not receive the same result as using (3)
equation. Could
>>>> You tell me, what is incorrect, why there is a difference?
>>>
>>> I guess the question is why you would NOT expect a difference
between theory and a sample of realizations of RV's? The result mean should
still be a random variable, night wahr?
>>>
>>>
>>>> Yours faithfully
>>>> Czarek Kowalski
>>>>
<truncatedT.pdf>______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>

David Winsemius

2017-May-09 21:33 UTC

head link

[R] Generating samples from truncated multivariate Student-t distribution

> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at
gmail.com> wrote:
> 
> I have already posted that in attachement - pdf file.
I see that now. I failed to scroll to the 3rd page.
> I am posting
> plain text here:
> 
>> library(tmvtnorm)
>> meann = c(55, 40, 50, 35, 45, 30)
>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>                    1, 16, -6, -6, -2, 12,
>                    0, -6, 4, 2, -2, -5,
>                    2, -6, 2, 25, 0, -17,
>                   -1, -2, -2, 0, 9, -5,
>                   -1, 12, -5, -17, -5, 36), 6, 6)
> df = 4
> lower = c(20, 20, 20, 20, 20, 20)
> upper = c(60, 60, 60, 60, 60, 60)
> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
> 
> 
>> sum(X1[,1]) / 100000
> [1] 54.98258
> sum(X1[,2]) / 100000
> [1] 40.36153
> sum(X1[,3]) / 100000
> [1] 49.83571
> sum(X1[,4]) / 100000
> [1] 34.69571      # "4th element of mean vector"
> sum(X1[,5]) / 100000
> [1] 44.81081
> sum(X1[,6]) / 100000
> [1] 31.10834
> 
> And corresponding results received using equation (3) from pdf file:
> [54.97,
> 40,
> 49.95,
> 35.31, #  "4th element of mean vector"
> 44.94,
> 31.32]
> 
I get similar results for the output from your code, 

My 100-fold run of your calculations were:

meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df,
lower, upper)
colMeans(Xbig)} )

describe(meansBig[4,])  # describe is from Hmisc package

meansBig[4, ] 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     100        0      100        1     34.7  0.01954    34.68    34.68    34.69
     .50      .75      .90      .95 
   34.70    34.72    34.72    34.73 

lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
highest: 34.72939 34.73012 34.73051 34.73742 34.74441


So agree, 35.31 is outside the plausible range of an RV formed with that
package, but I don't have any code relating to your calculations from
theory.

Best;
David.

> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net>
wrote:
>> 
>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at
gmail.com> wrote:
>>> 
>>> Of course I have expected the difference between theory and a
sample
>>> of realizations of RV's and result mean should still be a
random
>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>> 34.69571 = 0.61429. It is quite big difference, nieprawda?? I have
>>> expected that the difference would be smaller because of law of
large
>>> numbers (for 10mln samples the difference is quite similar).
>> 
>> I for one have no idea what is meant by a "4th element of mean
vector". So I have now idea what to consider "big". I have found
that my intuitions about multivariate distributions, especially those where the
covariate structure is as complex as you have assumed, are often far from
simulated results.
>> 
>> I suggest you post some code and results.
>> 
>> --
>> David.
>> 
>> 
>>> 
>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at
comcast.net> wrote:
>>>> 
>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski
<czarek230800 at gmail.com> wrote:
>>>>> 
>>>>> Dear Members,
>>>>> I am working with 6-dimensional Student-t distribution with
4 degrees
>>>>> of freedom truncated to [20; 60]. I have generated 100 000
samples
>>>>> from truncated multivariate Student-t distribution using
rtmvt
>>>>> function from package ?tmvtnorm?. I have also calculated 
mean vector
>>>>> using equation (3) from attached pdf. The problem is, that
after
>>>>> summing all elements in one column of rtmvt result (and
dividing by
>>>>> 100 000) I do not receive the same result as using (3)
equation. Could
>>>>> You tell me, what is incorrect, why there is a difference?
>>>> 
>>>> I guess the question is why you would NOT expect a difference
between theory and a sample of realizations of RV's? The result mean should
still be a random variable, night wahr?
>>>> 
>>>> 
>>>>> Yours faithfully
>>>>> Czarek Kowalski
>>>>>
<truncatedT.pdf>______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>> 
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 
David Winsemius
Alameda, CA, USA

David Winsemius

2017-May-09 21:50 UTC

head link

[R] Generating samples from truncated multivariate Student-t distribution

> On May 9, 2017, at 2:33 PM, David Winsemius <dwinsemius at
comcast.net> wrote:
> 
> 
>> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at
gmail.com> wrote:
>> 
>> I have already posted that in attachement - pdf file.
> 
> I see that now. I failed to scroll to the 3rd page.
> 
>> I am posting
>> plain text here:
>> 
>>> library(tmvtnorm)
>>> meann = c(55, 40, 50, 35, 45, 30)
>>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>>                   1, 16, -6, -6, -2, 12,
>>                   0, -6, 4, 2, -2, -5,
>>                   2, -6, 2, 25, 0, -17,
>>                  -1, -2, -2, 0, 9, -5,
>>                  -1, 12, -5, -17, -5, 36), 6, 6)
>> df = 4
>> lower = c(20, 20, 20, 20, 20, 20)
>> upper = c(60, 60, 60, 60, 60, 60)
>> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
>> 
>> 
>>> sum(X1[,1]) / 100000
>> [1] 54.98258
>> sum(X1[,2]) / 100000
>> [1] 40.36153
>> sum(X1[,3]) / 100000
>> [1] 49.83571
>> sum(X1[,4]) / 100000
>> [1] 34.69571      # "4th element of mean vector"
>> sum(X1[,5]) / 100000
>> [1] 44.81081
>> sum(X1[,6]) / 100000
>> [1] 31.10834
>> 
>> And corresponding results received using equation (3) from pdf file:
>> [54.97,
>> 40,
>> 49.95,
>> 35.31, #  "4th element of mean vector"
>> 44.94,
>> 31.32]
>> 
> 
> I get similar results for the output from your code, 
> 
> My 100-fold run of your calculations were:
> 
> meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df,
lower, upper)
> colMeans(Xbig)} )
> 
> describe(meansBig[4,])  # describe is from Hmisc package
> 
> meansBig[4, ] 
>       n  missing distinct     Info     Mean      Gmd      .05      .10     
.25
>     100        0      100        1     34.7  0.01954    34.68    34.68   
34.69
>     .50      .75      .90      .95 
>   34.70    34.72    34.72    34.73 
> 
> lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
> highest: 34.72939 34.73012 34.73051 34.73742 34.74441
> 
> 
> So agree, 35.31 is outside the plausible range of an RV formed with that
package, but I don't have any code relating to your calculations from
theory.
Further investigation:

covDiag <- covv*( row(covv)==col(covv) )  # just the diagonal means

Repeat with all zero covariances:
> meansDiag <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covDiag,
df, lower, upper)
+ colMeans(Xbig)} )> describe(meansDiag[4,])meansDiag[4, ] 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25
     100        0      100        1    35.23  0.02074    35.21    35.21    35.22
     .50      .75      .90      .95 
   35.23    35.25    35.26    35.26 

lowest : 35.18360 35.19756 35.20098 35.20179 35.20622
highest: 35.26367 35.26635 35.26791 35.27251 35.27302

So failing to account for the covariances in your theoretical calculations
mostly explains the apparent discrepancy, although your value of 35.31 would be
at the  far end of a statistical distribution and I wonder about some sort of
error in your theoretical calculation, which didn't appear to take into
account the covariance matrix.

Best;
David.


> 
> Best;
> David.
> 
> 
>> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at
comcast.net> wrote:
>>> 
>>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at
gmail.com> wrote:
>>>> 
>>>> Of course I have expected the difference between theory and a
sample
>>>> of realizations of RV's and result mean should still be a
random
>>>> variable. But, for example for 4th element of mean vector:
35.31 -
>>>> 34.69571 = 0.61429. It is quite big difference, nieprawda?? I
have
>>>> expected that the difference would be smaller because of law of
large
>>>> numbers (for 10mln samples the difference is quite similar).
>>> 
>>> I for one have no idea what is meant by a "4th element of mean
vector". So I have now idea what to consider "big". I have found
that my intuitions about multivariate distributions, especially those where the
covariate structure is as complex as you have assumed, are often far from
simulated results.
>>> 
>>> I suggest you post some code and results.
>>> 
>>> --
>>> David.
>>> 
>>> 
>>>> 
>>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at
comcast.net> wrote:
>>>>> 
>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski
<czarek230800 at gmail.com> wrote:
>>>>>> 
>>>>>> Dear Members,
>>>>>> I am working with 6-dimensional Student-t distribution
with 4 degrees
>>>>>> of freedom truncated to [20; 60]. I have generated 100
000 samples
>>>>>> from truncated multivariate Student-t distribution
using rtmvt
>>>>>> function from package ?tmvtnorm?. I have also
calculated  mean vector
>>>>>> using equation (3) from attached pdf. The problem is,
that after
>>>>>> summing all elements in one column of rtmvt result (and
dividing by
>>>>>> 100 000) I do not receive the same result as using (3)
equation. Could
>>>>>> You tell me, what is incorrect, why there is a
difference?
>>>>> 
>>>>> I guess the question is why you would NOT expect a
difference between theory and a sample of realizations of RV's? The result
mean should still be a random variable, night wahr?
>>>>> 
>>>>> 
>>>>>> Yours faithfully
>>>>>> Czarek Kowalski
>>>>>>
<truncatedT.pdf>______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>>> 
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>> 
>>> 
>>> David Winsemius
>>> Alameda, CA, USA
>>> 
> 
> David Winsemius
> Alameda, CA, USA
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA

Martin Maechler

2017-Aug-02 07:08 UTC

head link

[R] Generating samples from truncated multivariate Student-t distribution

>>>>> David Winsemius <dwinsemius at comcast.net>
>>>>>     on Tue, 9 May 2017 14:33:04 -0700 writes:
    >> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at
gmail.com> wrote:
    >> 
    >> I have already posted that in attachement - pdf file.

    > I see that now. I failed to scroll to the 3rd page.

from a late reader:

Please, Czarek, and every other future reader:   This is an
unwritten "regulation" about all R lists I know (a few of which
I co-maintain): 

___>  R code must be given as unformatted plain text  rather than
___>  in a formatted document (pdf in your case) 


    >> I am posting plain text here:

good .. and please do always on these lists


Martin Maechler, ETH Zurich & R Core


    >> 
    >>> library(tmvtnorm)
    >>> meann = c(55, 40, 50, 35, 45, 30)
    >>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
    >> 1, 16, -6, -6, -2, 12,
    >> 0, -6, 4, 2, -2, -5,
    >> 2, -6, 2, 25, 0, -17,
    >> -1, -2, -2, 0, 9, -5,
    >> -1, 12, -5, -17, -5, 36), 6, 6)
    >> df = 4
    >> lower = c(20, 20, 20, 20, 20, 20)
    >> upper = c(60, 60, 60, 60, 60, 60)
    >> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
    >> 
    >> 
    >>> sum(X1[,1]) / 100000
    >> [1] 54.98258
    >> sum(X1[,2]) / 100000
    >> [1] 40.36153
    >> sum(X1[,3]) / 100000
    >> [1] 49.83571
    >> sum(X1[,4]) / 100000
    >> [1] 34.69571      # "4th element of mean vector"
    >> sum(X1[,5]) / 100000
    >> [1] 44.81081
    >> sum(X1[,6]) / 100000
    >> [1] 31.10834
    >> 
    >> And corresponding results received using equation (3) from pdf
file:
    >> [54.97,
    >> 40,
    >> 49.95,
    >> 35.31, #  "4th element of mean vector"
    >> 44.94,
    >> 31.32]
    >> 

    > I get similar results for the output from your code, 

    > My 100-fold run of your calculations were:

    > meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv,
df, lower, upper)
    > colMeans(Xbig)} )

    > describe(meansBig[4,])  # describe is from Hmisc package

    > meansBig[4, ] 
    > n  missing distinct     Info     Mean      Gmd      .05      .10     
.25
    > 100        0      100        1     34.7  0.01954    34.68    34.68   
34.69
    > .50      .75      .90      .95 
    > 34.70    34.72    34.72    34.73 

    > lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
    > highest: 34.72939 34.73012 34.73051 34.73742 34.74441


    > So agree, 35.31 is outside the plausible range of an RV formed with
that package, but I don't have any code relating to your calculations from
theory.

    > Best;
    > David.


    >> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at
comcast.net> wrote:
    >>> 
    >>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski
<czarek230800 at gmail.com> wrote:
    >>>> 
    >>>> Of course I have expected the difference between theory and
a sample
    >>>> of realizations of RV's and result mean should still be
a random
    >>>> variable. But, for example for 4th element of mean vector:
35.31 -
    >>>> 34.69571 = 0.61429. It is quite big difference, nieprawda??
I have
    >>>> expected that the difference would be smaller because of
law of large
    >>>> numbers (for 10mln samples the difference is quite
similar).
    >>> 
    >>> I for one have no idea what is meant by a "4th element of
mean vector". So I have now idea what to consider "big". I have
found that my intuitions about multivariate distributions, especially those
where the covariate structure is as complex as you have assumed, are often far
from simulated results.
    >>> 
    >>> I suggest you post some code and results.
    >>> 
    >>> --
    >>> David.
    >>> 
    >>> 
    >>>> 
    >>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at
comcast.net> wrote:
    >>>>> >>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski
<czarek230800 at gmail.com> wrote:
>>>>> 
>>>>> Dear Members,
>>>>> I am working with 6-dimensional Student-t distribution with
4 degrees
>>>>> of freedom truncated to [20; 60]. I have generated 100 000
samples
>>>>> from truncated multivariate Student-t distribution using
rtmvt
>>>>> function from package ?tmvtnorm?. I have also calculated 
mean vector
>>>>> using equation (3) from attached pdf. The problem is, that
after
>>>>> summing all elements in one column of rtmvt result (and
dividing by
>>>>> 100 000) I do not receive the same result as using (3)
equation. Could
>>>>> You tell me, what is incorrect, why there is a difference?    >>>>> 
    >>>>> I guess the question is why you would NOT expect a
difference between theory and a sample of realizations of RV's? The result
mean should still be a random variable, night wahr?
    >>>>> 
    >>>>> >>>>> Yours faithfully
>>>>> Czarek Kowalski
>>>>>
<truncatedT.pdf>______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.    >>>>> 
    >>>>> David Winsemius
    >>>>> Alameda, CA, USA
    >>>>> 
    >>> 
    >>> David Winsemius
    >>> Alameda, CA, USA
    >>> 

    > David Winsemius
    > Alameda, CA, USA

    > ______________________________________________
    > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.

Apparently Analagous Threads

Search for more maybe matching threads

R help - Aug 2017 - Generating samples from truncated multivariate Student-t distribution

[R] Generating samples from truncated multivariate Student-t distribution

[R] Generating samples from truncated multivariate Student-t distribution

[R] Generating samples from truncated multivariate Student-t distribution

[R] Generating samples from truncated multivariate Student-t distribution

Apparently Analagous Threads