Fabien Tarrade
2015-Jul-11 07:46 UTC
[R] R 3.1.2 : arima.sim(model=list(ma=0.5), n=250, innov=rnorm(250, mean=0, sd=0.1)) versus arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) => only the first element is not identical !
Dear all, When doing a DataCamp tutorial with R I find the following observation that using 2 different syntax for "arima.sim" give different answer for the first element If I use the the function using the list of argument describe in the help manual : arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1)) or if I use the following syntax use in a DataCamp example : arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) it is accepted by DataCamp I don't find exactly the same results. The reason is that even if the seed is the same in both cases the first element is not identical while it should be (it doesn't mean that the results is wrong, maybe for the first element the seed is not propagated correctly) here the results of the difference using the same seed (only the first element is different using the 2 different syntaxes) : [1] -0.252214 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 [11] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 here the code to reproduce this feature : set.seed(123); test1 <- 0.05 + arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1)) set.seed(123); test2 <- 0.05 + arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) test1-test2 I am using R 3.1.2 GUI 1.65 Mavericks build (6833) on Mac (I guess arima come with stats which is included in R (?)) The DataCamp team ask me to report to you about this observation on this mailing list. If you want me to fill a bug report some R bug tracking system, let me know Please tell me if this is the wrong list and which other information do you need from R and how to get then (compiler, version of some R packages ...) Hope this help Thanks Cheers Fabien -- Dr Fabien Tarrade Quantitative Analyst - Data Scientist - Researcher Senior data analyst specialised in the modelling, processing and statistical treatment of data. PhD in Physics, 10 years of experience as researcher at the forefront of international scientific research. Fascinated by finance and data modelling. Geneva, Switzerland Email : contact at fabien-tarrade.eu <mailto:contact at fabien-tarrade.eu> Phone : www.fabien-tarrade.eu <http://www.fabien-tarrade.eu> Phone : +33 (0)6 14 78 70 90 LinkedIn <http://ch.linkedin.com/in/fabientarrade/> Twitter <https://twitter.com/fabtar> Google <https://plus.google.com/+FabienTarradeProfile/posts> Facebook <https://www.facebook.com/fabien.tarrade.eu> Google <skype:fabtarhiggs?call> Xing <https://www.xing.com/profile/Fabien_Tarrade>
Mark Leeds
2015-Jul-11 15:38 UTC
[R] R 3.1.2 : arima.sim(model=list(ma=0.5), n=250, innov=rnorm(250, mean=0, sd=0.1)) versus arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) => only the first element is not identical !
Hi Fabian: I think one would say that that is not a bug. I looked at the details of arima.sim ( using debug(arima.sim) ) and there are two different series that are created inside the function. one is called innov and the other is start.innov. start.innov is used to create a burn in period for the constructed series. in your test1 call, you are not supplying arguments for what should be used for the innovations associated with start.innov which is used for the burn in period. So, arima.sim uses the defaults of mean = 0 and sd = 1.0. For your test2 call, you do provide them ( so they are used for both innov and start.innov ) and you use sd = 0.1 So, for test2, the values for the burn in period end up being different from the ones in test1. Below, I made a test3 that can be used to get the same values as test2. In short, by specifiying the innov call EXACTLY in test1, you're letting arima.sim use the default arguments for the start.innov call so that's why they're different. #=================================================================================================================== ##undebug(arima.sim) set.seed(123); test1 <- arima.sim(model=list(ma=0.5), n 250,innov=rnorm(250,mean=0,sd=0.1)) print(head(test1)) set.seed(123); test2 <- arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) print(head(test2)) set.seed(123); test3 <- arima.sim(model=list(ma=0.5), innov = rnorm(250,mean=0, sd=0.1), n = 250, mean=0, sd=0.1) print(head(test3)) On Sat, Jul 11, 2015 at 3:46 AM, Fabien Tarrade <fabien.tarrade at gmail.com> wrote:> Dear all, > > When doing a DataCamp tutorial with R I find the following observation > that using 2 different syntax for "arima.sim" give different answer for the > first element > > If I use the the function using the list of argument describe in the help > manual : > arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1)) > > or if I use the following syntax use in a DataCamp example : > > arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) it is accepted by > DataCamp > > I don't find exactly the same results. The reason is that even if the seed > is the same in both cases the first element is not identical while it > should be (it doesn't mean that the results is wrong, maybe for the first > element the seed is not propagated correctly) > > here the results of the difference using the same seed (only the first > element is different using the 2 different syntaxes) : > > [1] -0.252214 0.000000 0.000000 0.000000 0.000000 0.000000 > 0.000000 0.000000 0.000000 0.000000 > [11] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 > 0.000000 0.000000 0.000000 0.000000 > > > here the code to reproduce this feature : > > set.seed(123); > test1 <- 0.05 + > arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1)) > set.seed(123); > test2 <- 0.05 + arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) > > test1-test2 > > I am using R 3.1.2 GUI 1.65 Mavericks build (6833) on Mac (I guess arima > come with stats which is included in R (?)) > The DataCamp team ask me to report to you about this observation on this > mailing list. If you want me to fill a bug report some R bug tracking > system, let me know > > Please tell me if this is the wrong list and which other information do > you need from R and how to get then (compiler, version of some R packages > ...) > > > Hope this help > Thanks > Cheers > Fabien > -- > Dr Fabien Tarrade > > Quantitative Analyst - Data Scientist - Researcher > > Senior data analyst specialised in the modelling, processing and > statistical treatment of data. > PhD in Physics, 10 years of experience as researcher at the forefront of > international scientific research. > Fascinated by finance and data modelling. > > Geneva, Switzerland > > Email : contact at fabien-tarrade.eu <mailto:contact at fabien-tarrade.eu> > Phone : www.fabien-tarrade.eu <http://www.fabien-tarrade.eu> > Phone : +33 (0)6 14 78 70 90 > > LinkedIn <http://ch.linkedin.com/in/fabientarrade/> Twitter < > https://twitter.com/fabtar> Google < > https://plus.google.com/+FabienTarradeProfile/posts> Facebook < > https://www.facebook.com/fabien.tarrade.eu> Google > <skype:fabtarhiggs?call> Xing <https://www.xing.com/profile/Fabien_Tarrade > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Fabien Tarrade
2015-Jul-13 10:28 UTC
[R] R 3.1.2 : arima.sim(model=list(ma=0.5), n=250, innov=rnorm(250, mean=0, sd=0.1)) versus arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) => only the first element is not identical !
Hi Mark, Thanks for your message and sorry for the delay but it took me some time to understand your message and to try few things.> I think one would say that that is not a bug. I looked at the details > of arima.sim ( using debug(arima.sim) ) > and there are two different series that are created inside the > function. one is called innov and the other is start.innov. start.innov is > used to create a burn in period for the constructed series.I am not sure I fully understand how the 2 time series are created :one is called innov and the other is start.innov but what I am not sure to understand is the following : nobs = 5000000 set.seed(183); test1 <- arima.sim(model=list(ma=0.5), n = nobs,innov=rnorm(nobs,mean=0,sd=0.1)) set.seed(183); test2 <- arima.sim(model=list(ma=0.5), n=nobs, mean=0, sd=0.1) sum(abs(test1-test2)>0) test1[1] test2[1] test1[2] test2[2] Browse[4]> sum(abs(test1-test2)>0) [1] 1 Browse[4]> test1[1] [1] 0.09417 Browse[4]> test2[1] [1] -0.02648 Browse[4]> test1[2] [1] -0.1053 Browse[4]> test2[2] [1] -0.1053 which indicate that on 5 millions of generated event with the 2 methods only the first element is different ! (I try to change the seed and I saw the same). I could understand that most of the element are different but the fact that only the first one is different is something that I don't understand, sorry. I made a copy of the function to play a bit and print some value : => first line : test1 <- test(model=list(ma=0.5), n = nobs,innov=rnorm(nobs,mean=0,sd=0.1)) => second line : test2 <- test(model=list(ma=0.5), n=nobs, mean=0, sd=0.1) after : x <- ts(c(start.innov[seq_len(n.start)], innov[1L:n]), start = 1 - n.start) I see : [1] 0.2681255 -0.0398882 -0.0853729 -0.0707863 -0.0262917 -0.0001408 [1] 0.0268126 -0.0398882 -0.0853729 -0.0707863 -0.0262917 -0.0001408 after : if (length(model$ma)) { x <- filter(x, c(1, model$ma), sides = 1L) x[seq_along(model$ma)] <- 0 } I see : [1] 0.00000 0.09417 -0.10532 -0.11347 -0.06168 -0.01329 [1] 0.00000 -0.02648 -0.10532 -0.11347 -0.06168 -0.01329 after : if (length(model$ar)) x <- filter(x, model$ar, method = "recursive") I see : [1] 0.00000 0.09417 -0.10532 -0.11347 -0.06168 -0.01329 [1] 0.00000 -0.02648 -0.10532 -0.11347 -0.06168 -0.01329 after : if (n.start > 0) x <- x[-(seq_len(n.start))] I see : [1] 0.09417 -0.10532 -0.11347 -0.06168 -0.01329 [1] -0.02648 -0.10532 -0.11347 -0.06168 -0.01329 after : if (d > 0) x <- diffinv(x, differences = d) I see : [1] 0.09417 -0.10532 -0.11347 -0.06168 -0.01329 [1] -0.02648 -0.10532 -0.11347 -0.06168 -0.01329 so the difference appear since the beginning with the with element and which seems to be divided by 10! after : x <- ts(c(start.innov[seq_len(n.start)], innov[1L:n]), start = 1 - n.start) I see : [1] 0.2681255 ... [1] 0.0268126 ...> in your test1 call, you are not supplying arguments for what should > be used for the innovations associated with start.innov which is used > for the burn in period. So, arima.sim uses the defaults of mean = 0 > and sd = 1.0.> set.seed(123); > test1 <- arima.sim(model=list(ma=0.5), n = 250,innov=rnorm(250,mean=0,sd=0.1)) > print(head(test1)) [1] -0.30326 0.14436 0.08499 0.01645 0.17797 0.13184 > set.seed(123); > test3 <- arima.sim(model=list(ma=0.5), innov = rnorm(250,mean=0, sd=1.0), n = 250, mean=0, sd=0.1) > print(head(test3)) [1] -0.2582 1.4436 0.8499 0.1645 1.7797 1.3184 this doesn't seems to be true or I misinterpret you explanation> For your test2 call, you do provide them ( so they are used for both > innov and start.innov ) and you use sd = 0.1 So, for test2, the > values for the burn in period end up being different from the ones in > test1.I am not sure I understand the meaning of "burn in period". Do we expect that to change only the first element of 2 time series of 5M of events >> Below, I made a test3 that can be used to get the same values as > test2. In short, by specifiying the innov call EXACTLY in test1, > you're letting > arima.sim use the default arguments for the start.innov call so that's > why they're different.I did the test and I agree Thanks a lot Cheers Fabien> > > #===================================================================================================================> ##undebug(arima.sim) > > set.seed(123); > test1 <- arima.sim(model=list(ma=0.5), n = > 250,innov=rnorm(250,mean=0,sd=0.1)) > print(head(test1)) > > set.seed(123); > test2 <- arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) > print(head(test2)) > > set.seed(123); > test3 <- arima.sim(model=list(ma=0.5), innov = rnorm(250,mean=0, > sd=0.1), n = 250, mean=0, sd=0.1) > print(head(test3)) > > On Sat, Jul 11, 2015 at 3:46 AM, Fabien Tarrade > <fabien.tarrade at gmail.com <mailto:fabien.tarrade at gmail.com>> wrote: > > Dear all, > > When doing a DataCamp tutorial with R I find the following > observation that using 2 different syntax for "arima.sim" give > different answer for the first element > > If I use the the function using the list of argument describe in > the help manual : > arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1)) > > or if I use the following syntax use in a DataCamp example : > > arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) it is > accepted by DataCamp > > I don't find exactly the same results. The reason is that even if > the seed is the same in both cases the first element is not > identical while it should be (it doesn't mean that the results is > wrong, maybe for the first element the seed is not propagated > correctly) > > here the results of the difference using the same seed (only the > first element is different using the 2 different syntaxes) : > > [1] -0.252214 0.000000 0.000000 0.000000 0.000000 0.000000 > 0.000000 0.000000 0.000000 0.000000 > [11] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 > 0.000000 0.000000 0.000000 0.000000 > > > here the code to reproduce this feature : > > set.seed(123); > test1 <- 0.05 + > arima.sim(model=list(ma=0.5),n=250,innov=rnorm(250,mean=0,sd=0.1)) > set.seed(123); > test2 <- 0.05 + arima.sim(model=list(ma=0.5), n=250, mean=0, sd=0.1) > > test1-test2 > > I am using R 3.1.2 GUI 1.65 Mavericks build (6833) on Mac (I guess > arima come with stats which is included in R (?)) > The DataCamp team ask me to report to you about this observation > on this mailing list. If you want me to fill a bug report some R > bug tracking system, let me know > > Please tell me if this is the wrong list and which other > information do you need from R and how to get then (compiler, > version of some R packages ...) > > > Hope this help > Thanks > Cheers > Fabien > -- > Dr Fabien Tarrade > > Quantitative Analyst - Data Scientist - Researcher > > Senior data analyst specialised in the modelling, processing and > statistical treatment of data. > PhD in Physics, 10 years of experience as researcher at the > forefront of international scientific research. > Fascinated by finance and data modelling. > > Geneva, Switzerland > > Email : contact at fabien-tarrade.eu > <mailto:contact at fabien-tarrade.eu> > <mailto:contact at fabien-tarrade.eu <mailto:contact at fabien-tarrade.eu>> > Phone : www.fabien-tarrade.eu <http://www.fabien-tarrade.eu> > <http://www.fabien-tarrade.eu> > Phone : +33 (0)6 14 78 70 90 > <tel:%2B33%20%280%296%2014%2078%2070%2090> > > LinkedIn <http://ch.linkedin.com/in/fabientarrade/> Twitter > <https://twitter.com/fabtar> Google > <https://plus.google.com/+FabienTarradeProfile/posts> Facebook > <https://www.facebook.com/fabien.tarrade.eu> Google > <skype:fabtarhiggs?call> Xing > <https://www.xing.com/profile/Fabien_Tarrade> > ______________________________________________ > R-help at r-project.org <mailto:R-help at r-project.org> mailing list -- > To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- Dr Fabien Tarrade Quantitative Analyst - Data Scientist - Researcher Senior data analyst specialised in the modelling, processing and statistical treatment of data. PhD in Physics, 10 years of experience as researcher at the forefront of international scientific research. Fascinated by finance and data modelling. Geneva, Switzerland Email : contact at fabien-tarrade.eu <mailto:contact at fabien-tarrade.eu> Phone : www.fabien-tarrade.eu <http://www.fabien-tarrade.eu> Phone : +33 (0)6 14 78 70 90 LinkedIn <http://ch.linkedin.com/in/fabientarrade/> Twitter <https://twitter.com/fabtar> Google <https://plus.google.com/+FabienTarradeProfile/posts> Facebook <https://www.facebook.com/fabien.tarrade.eu> Google <skype:fabtarhiggs?call> Xing <https://www.xing.com/profile/Fabien_Tarrade>