thr3ads.net - R help - [R] Bug : Autocorrelation in sample drawn from stats::rnorm (hmh) [Oct 2018]

William Bell

2018-Oct-04 23:52 UTC

[R] Bug : Autocorrelation in sample drawn from stats::rnorm (hmh)

Hi Hugo,

I've been able to replicate your bug, including for other distributions
(runif, rexp, rgamma, etc.), which shouldn't be surprising since they're
probably all drawing from the same pseudo-random number generator.
Interestingly, it does not seem to depend on the choice of seed; I am not sure
why that is the case.
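
The observation above (same effect across distributions, regardless of seed) can be checked with a short sketch. The helper name `mean_lag1_cor`, its arguments, the two seeds, and the number of repetitions are illustrative assumptions, not part of the thread; sample size 5 is one of the small sizes from the original post.

```r
# Mean lag-1 sample correlation for an arbitrary random generator.
# `mean_lag1_cor`, `rgen` and `n_rep` are hypothetical names chosen for this sketch.
mean_lag1_cor <- function(rgen, sample_size, n_rep = 10000, seed = 1) {
  set.seed(seed)
  r <- replicate(n_rep, {
    x <- rgen(sample_size)
    cor(x[-1], x[-length(x)])   # same statistic as in the thread
  })
  mean(r)
}

generators <- list(rnorm = rnorm, runif = runif, rexp = rexp)
seeds <- c(1, 2018)             # arbitrary seeds

# Rows: distributions; columns: seeds.
results <- sapply(seeds, function(s) {
  sapply(generators, function(g) mean_lag1_cor(g, sample_size = 5, seed = s))
})
colnames(results) <- paste0("seed_", seeds)
print(round(results, 3))
```

If the averages come out similar in every row and column, that points away from a specific generator or seed and toward the statistic being computed.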

I'll point out first of all that the R-devel mailing list is perhaps better
suited for this query; I'm fairly sure we're supposed to direct bug
reports, etc. there.

It is possible this is a known quantity but is tolerated. I could think of many
reasons why that might be the case, not least of which being that, as far as I
know, the vast majority of Monte Carlo methods involve >>40 trials (which
seems to be enough for the effect to disappear), with the possible exception of
procedures for testing the power of statistical tests on small samples.

There might be more to be said, but I thought I'd just add what I could from
playing around with it a little bit.

For anyone who wishes to give it a try, I suggest this implementation of the
autocorrelation tester, which is about 80 times faster:

    DistributionAutocorrelation_new <- function(SampleSize) {
        # replicate() re-evaluates an expression; wrapping it in function()
        # would only return the function objects without running them
        replicate(1e5, {
            X <- rnorm(SampleSize)
            cor(X[-1], X[-length(X)])
        })
    }

I have the same Stats package version installed.

- (Thomas) William Bell
Hons BSc Candidate (Biology and Mathematics)
BA Candidate (Philosophy)
McMaster University

# Hi,
#
# I just noticed the following bug:
#
# When we draw a random sample using the function stats::rnorm, there
# should be no auto-correlation in the sample. But there is some
# auto-correlation _when the sample that is drawn is small_.
#
# I describe the problem using two functions:
#
# DistributionAutocorrelation_Unexpected, which has the wrong behavior:
# _when drawing some small samples using rnorm, there is generally a
# strong negative auto-correlation in the sample_,
#
# and
#
# DistributionAutocorrelation_Expected, which illustrates the expected
# behavior.
#
# Unexpected:
#
#   DistributionAutocorrelation_Unexpected = function(SampleSize){
#     Cor = NULL
#     for(repetition in 1:1e5){
#       X = rnorm(SampleSize)
#       Cor[repetition] = cor(X[-1],X[-length(X)])
#     }
#     return(Cor)
#   }
#
#   par(mfrow=c(3,3))
#   for(SampleSize_ in c(4,5,6,7,8,10,15,20,50)){
#     hist(DistributionAutocorrelation_Unexpected(SampleSize_),col='grey',main=paste0('SampleSize=',SampleSize_))
#     abline(v=0,col=2)
#   }
#
# output: [histograms not preserved in the archive]
#
# Expected:
#
#   DistributionAutocorrelation_Expected = function(SampleSize){
#     Cor = NULL
#     for(repetition in 1:1e5){
#       X = rnorm(SampleSize)
#       Cor[repetition] = cor(sample(X[-1]),sample(X[-length(X)]))
#     }
#     return(Cor)
#   }
#
#   par(mfrow=c(3,3))
#   for(SampleSize_ in c(4,5,6,7,8,10,15,20,50)){
#     hist(DistributionAutocorrelation_Expected(SampleSize_),col='grey',main=paste0('SampleSize=',SampleSize_))
#     abline(v=0,col=2)
#   }
#
# Some more information you might need:
#
#   packageDescription("stats")
#   Package: stats
#   Version: 3.5.1
#   Priority: base
#   Title: The R Stats Package
#   Author: R Core Team and contributors worldwide
#   Maintainer: R Core Team <R-core at r-project.org>
#   Description: R statistical functions.
#   License: Part of R 3.5.1
#   Imports: utils, grDevices, graphics
#   Suggests: MASS, Matrix, SuppDists, methods, stats4
#   NeedsCompilation: yes
#   Built: R 3.5.1; x86_64-pc-linux-gnu; 2018-07-03 02:12:37 UTC; unix
#
# Thanks for correcting that.
#
# Feel free to ask for any further information you need.
#
# cheers,
#
# hugo
#
# --
# Hugo Mathé-Hubert
# ATER
# Laboratoire Interdisciplinaire des Environnements Continentaux (LIEC)
# UMR 7360 CNRS - Bât IBISE
# Université de Lorraine - UFR SciFA
# 8, Rue du Général Delestraint
# F-57070 METZ
# +33(0)9 77 21 66 66
# - - - - - - - - - - - - - - - - - -
# Les réflexions naissent dans les doutes et meurent dans les certitudes.
# Les doutes sont donc un signe de force et les certitudes un signe de
# faiblesse. La plupart des gens sont pourtant certains du contraire.
# - - - - - - - - - - - - - - - - - -
# Thoughts appear from doubts and die in convictions. Therefore, doubts
# are an indication of strength and convictions an indication of weakness.
# Yet, most people believe the opposite.

hmh

2018-Oct-05 07:45 UTC

Hi,

Thanks, William, for this fast answer, and sorry for sending the first mail
to r-help instead of r-devel.

I noticed that bug while I was simulating many small random walks using
c(0,cumsum(rnorm(10))). The negative auto-correlation was inducing
a much smaller space visited by the random walks than would be expected if
there were no auto-correlation in the samples.
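
One way to probe this concern directly: for independent unit-variance steps, the variance of the walk's endpoint equals the number of steps, whereas negatively correlated steps would push it below that value. The sketch below is only an illustration under assumptions not in the thread (the seed, the number of walks, and the variable names are arbitrary); it measures the endpoint variance and the mean range visited for 10-step walks built as above.

```r
set.seed(42)        # arbitrary seed
n_steps <- 10       # walk length used in the thread
n_walks <- 20000    # illustrative number of replications

# Each column is one walk: c(0, cumsum(rnorm(n_steps)))
walks <- replicate(n_walks, c(0, cumsum(rnorm(n_steps))))

# Variance of the final position across walks
endpoint_var <- var(walks[n_steps + 1, ])

# Average width of the interval visited by a walk
mean_range <- mean(apply(walks, 2, function(w) diff(range(w))))

cat("variance of endpoint:", round(endpoint_var, 2),
    "(independent steps would give", n_steps, ")\n")
cat("mean range visited:", round(mean_range, 2), "\n")
```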


The code I provided and you optimized was only meant to illustrate
and investigate that bug.

It is really worrying that most of the R distributions are affected by
this bug!

What I did should have been one of the first checks done for _each_
distribution by the developers of these functions!

And if, as you suggested, this is a "tolerated" _error_ of the algorithm,
I do think this is a bad choice, but anyway, this should have been
mentioned in the documentation of the functions!


cheers,

hugo


-- 
Hugo Mathé-Hubert
ATER
Laboratoire Interdisciplinaire des Environnements Continentaux (LIEC)
UMR 7360 CNRS - Bât IBISE
Université de Lorraine - UFR SciFA
8, Rue du Général Delestraint
F-57070 METZ
+33(0)9 77 21 66 66
- - - - - - - - - - - - - - - - - -
Les réflexions naissent dans les doutes et meurent dans les certitudes.
Les doutes sont donc un signe de force et les certitudes un signe de
faiblesse. La plupart des gens sont pourtant certains du contraire.
- - - - - - - - - - - - - - - - - -
Thoughts appear from doubts and die in convictions. Therefore, doubts
are an indication of strength and convictions an indication of weakness.
Yet, most people believe the opposite.



Annaert Jan

2018-Oct-05 07:58 UTC

This is not a bug. You have simply rediscovered the finite-sample bias in the
sample autocorrelation coefficient, known at least since
Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation.
Biometrika, 41(3-4), 403-404.

The bias is approximately -1/T, where T is the sample size, which explains why
it seems to disappear for the larger sample sizes you consider.
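
The -1/T approximation can be compared against simulation with a short sketch. The sample sizes are those from the original post; the seed, the number of repetitions (smaller than the 1e5 used in the thread, to keep it quick), and the names `lag1_cor`, `simulated` and `comparison` are assumptions made for this illustration.

```r
set.seed(123)                                      # arbitrary seed
sample_sizes <- c(4, 5, 6, 7, 8, 10, 15, 20, 50)   # sizes from the original post
n_rep <- 10000                                     # illustrative; the thread used 1e5

# Lag-1 sample correlation as computed in the thread
lag1_cor <- function(x) cor(x[-1], x[-length(x)])

# Average lag-1 correlation over many independent normal samples of each size
simulated <- sapply(sample_sizes, function(n) {
  mean(replicate(n_rep, lag1_cor(rnorm(n))))
})

comparison <- data.frame(
  sample_size      = sample_sizes,
  simulated_mean   = round(simulated, 3),
  minus_one_over_T = round(-1 / sample_sizes, 3)
)
print(comparison)
```

The two right-hand columns can then be read side by side; since -1/T is only an approximation, the columns need not match exactly.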

Jan
