thr3ads.net - R devel - [R] Bug : Autocorrelation in sample drawn from stats::rnorm (hmh) [Oct 2018]

Annaert Jan

2018-Oct-05 07:58 UTC

[R] Bug : Autocorrelation in sample drawn from stats::rnorm (hmh)

On 05/10/2018, 09:45, "R-help on behalf of hmh" <r-help-bounces at
r-project.org on behalf of hugomh at gmx.fr> wrote:

    Hi,
    
    Thanks William for this fast answer, and sorry for sending the 1st mail
    to r-help instead of r-devel.
    
    
    I noticed that bug while I was simulating many small random walks using
    c(0,cumsum(rnorm(10))). The negative auto-correlation was then making
    the space visited by the random walks much smaller than expected if there
    were no auto-correlation in the samples.
    
    
    The code I provided and you optimized was only meant to illustrate
    and investigate that bug.
    
    
    It is really worrying that most of the R distributions are affected by
    this bug !!!!
    
    What I did should have been one of the first checks done for *each*
    distribution by the developers of these functions !
    
    
    And if, as you suggested, this is a "tolerated" _error_ of the algorithm,
    I do think this is a bad choice, but anyway, this should have been
    mentioned in the documentation of the functions !!
    
    
    cheers,
    
    hugo
 
This is not a bug. You have simply rediscovered the finite-sample bias in the
sample autocorrelation coefficient, known at least since
Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation.
Biometrika, 41(3-4), 403-404.

The bias is approximately -1/T, where T is the sample size, which explains why
it seems to disappear in the larger sample sizes you consider.
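
As a rough check of the -1/T approximation, the following sketch estimates the mean lag-1 sample autocorrelation of i.i.d. normal draws for a few sample sizes. The helper name mean_lag1_cor, the seed, the sample sizes and the number of repetitions are assumptions chosen for illustration. The estimator is the same cor(X[-1], X[-length(X)]) used in the thread, so the simulated means should be negative and of the same order as -1/T, not necessarily equal to it.

```r
# Sketch: Monte Carlo estimate of the bias of the lag-1 sample autocorrelation
# for i.i.d. normal draws. mean_lag1_cor is a hypothetical helper name.
set.seed(1)  # arbitrary seed, for reproducibility only

mean_lag1_cor <- function(n, reps = 1e4) {
  r <- replicate(reps, {
    x <- rnorm(n)
    cor(x[-1], x[-n])  # same estimator as in the thread
  })
  mean(r)
}

sizes <- c(5, 10, 20, 50)
bias  <- vapply(sizes, mean_lag1_cor, numeric(1))

# Compare the simulated mean with the -1/T approximation
print(data.frame(T = sizes, simulated = bias, approx = -1 / sizes))
```

The draws themselves are independent here by construction; any negative mean in the "simulated" column comes from the estimator, not from rnorm.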

Jan

hmh

2018-Oct-05 08:11 UTC

[R] Bug : Autocorrelation in sample drawn from stats::rnorm (hmh)

Nope.

This IS a bug:

*The negative auto-correlation mostly disappears when I randomize small
samples using the R function 'sample'.*

Please check thoroughly the code of the 1st mail I sent; there should be
no difference between the two R functions I wrote to illustrate the bug.

The two functions that should produce the same output if there were
no bug are 'DistributionAutocorrelation_Unexpected' and
'DistributionAutocorrelation_Expected'.

*Please take the time to compare their output!!*

The finite-sample bias in the sample autocorrelation coefficient you
mention should affect them in the same manner. This bias is not the only
phenomenon at work, *there is ALSO a BUG!*

Thanks

The first mail I sent is below :

_ _ _

Hi,

I just noticed the following bug:

When we draw a random sample using the function stats::rnorm, there
should be no auto-correlation in the sample. But there is some
auto-correlation _when the sample that is drawn is small_.

I describe the problem using two functions:

DistributionAutocorrelation_Unexpected, which has the wrong behavior:
_when drawing some small samples using rnorm, there is generally a
strong negative auto-correlation in the sample_,

and

DistributionAutocorrelation_Expected, which illustrates the expected behavior.

*Unexpected:*

DistributionAutocorrelation_Unexpected = function(SampleSize){
  Cor = NULL
  for(repetition in 1:1e5){
    X = rnorm(SampleSize)
    Cor[repetition] = cor(X[-1],X[-length(X)])
  }
  return(Cor)
}

par(mfrow=c(3,3))
for(SampleSize_ in c(4,5,6,7,8,10,15,20,50)){
  hist(DistributionAutocorrelation_Unexpected(SampleSize_),col='grey',main=paste0('SampleSize=',SampleSize_))
  abline(v=0,col=2)
}

output:

*Expected:*

DistributionAutocorrelation_Expected = function(SampleSize){
  Cor = NULL
  for(repetition in 1:1e5){
    X = rnorm(SampleSize)
    Cor[repetition] = cor(sample(X[-1]),sample(X[-length(X)]))
  }
  return(Cor)
}

par(mfrow=c(3,3))
for(SampleSize_ in c(4,5,6,7,8,10,15,20,50)){
  hist(DistributionAutocorrelation_Expected(SampleSize_),col='grey',main=paste0('SampleSize=',SampleSize_))
  abline(v=0,col=2)
}

Some more information you might need:

packageDescription("stats")
Package: stats
Version: 3.5.1
Priority: base
Title: The R Stats Package
Author: R Core Team and contributors worldwide
Maintainer: R Core Team <R-core at r-project.org>
Description: R statistical functions.
License: Part of R 3.5.1
Imports: utils, grDevices, graphics
Suggests: MASS, Matrix, SuppDists, methods, stats4
NeedsCompilation: yes
Built: R 3.5.1; x86_64-pc-linux-gnu; 2018-07-03 02:12:37 UTC; unix

Thanks for correcting that.

Feel free to ask for any further information you need.

cheers,

hugo

-- 
Hugo Mathé-Hubert

ATER

Laboratoire Interdisciplinaire des Environnements Continentaux (LIEC)

UMR 7360 CNRS - Bât IBISE

Université de Lorraine - UFR SciFA

8, Rue du Général Delestraint

F-57070 METZ

+33(0)9 77 21 66 66
- - - - - - - - - - - - - - - - - -
Les réflexions naissent dans les doutes et meurent dans les certitudes. 
Les doutes sont donc un signe de force et les certitudes un signe de 
faiblesse. La plupart des gens sont pourtant certains du contraire.
- - - - - - - - - - - - - - - - - -
Thoughts appear from doubts and die in convictions. Therefore, doubts 
are an indication of strength and convictions an indication of weakness. 
Yet, most people believe the opposite.

Annaert Jan

2018-Oct-05 08:28 UTC

[R] Bug : Autocorrelation in sample drawn from stats::rnorm (hmh)

> Nope.
> This IS a bug:
> The negative auto-correlation mostly disappears when I randomize small
> samples using the R function 'sample'.
> Please check thoroughly the code of the 1st mail I sent, there should be no
> difference between the two R functions I wrote to illustrate the bug.
> The two functions that should produce the same output if there would be no
> bug are 'DistributionAutocorrelation_Unexpected' and
> 'DistributionAutocorrelation_Expected'.
> Please take the time to compare their output!!
> The finite-sample bias in the sample autocorrelation coefficient you mention
> should affect them in the same manner. This bias is not the only phenomenon at
> work, there is ALSO a BUG !

I disagree. Take a look at your code:
Cor[repetition] = cor(sample(X[-1]),sample(X[-length(X)]))

By sampling the two series in the correlation function, you discard any time
series structure; you are no longer estimating a serial correlation coefficient,
but just a correlation (which in this case is unbiased).
Try out the following:

Xs <- sample(X)
Cor[repetition] = cor(Xs[-1], Xs[-length(Xs)])

The bias should reappear.
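
To compare the three variants discussed in this thread side by side, here is a minimal sketch. The variable names (lagged, indep, reshuffled), the seed, the sample size of 6 and the number of repetitions are assumptions chosen for illustration. It computes, on the same draws, the lag-1 correlation of the original series, the correlation after shuffling the two sub-series independently, and the lag-1 correlation after shuffling the whole series once.

```r
# Sketch comparing three estimators on the same i.i.d. normal draws.
# All names and settings below are illustrative assumptions.
set.seed(2)   # arbitrary seed, for reproducibility only
n    <- 6     # small sample size, where the bias is easy to see
reps <- 1e4

lagged <- indep <- reshuffled <- numeric(reps)
for (i in seq_len(reps)) {
  x  <- rnorm(n)
  xs <- sample(x)                                       # one shuffle of the whole series
  lagged[i]     <- cor(x[-1], x[-n])                    # lag-1 autocorrelation
  indep[i]      <- cor(sample(x[-1]), sample(x[-n]))    # two independent shuffles
  reshuffled[i] <- cor(xs[-1], xs[-n])                  # lag-1 autocorrelation after shuffling
}

print(c(lagged      = mean(lagged),
        independent = mean(indep),
        reshuffled  = mean(reshuffled)))
```

If the argument above holds, the "lagged" and "reshuffled" means should both be clearly negative and close to each other, while the "independent" mean should be close to zero, because shuffling the two sub-series separately breaks the pairing of each value with its neighbour.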

Jan


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cclhgmpcmhoinmca.png
Type: image/png
Size: 76147 bytes
Desc: cclhgmpcmhoinmca.png
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20181005/1c25e7f4/attachment-0004.png>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: adeokhkijhbomjkp.png
Type: image/png
Size: 75264 bytes
Desc: adeokhkijhbomjkp.png
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20181005/1c25e7f4/attachment-0005.png>
