I'm getting familiar with the stl function in the stats package by trying it on an example from Brockwell & Davis's 2002 "Introduction to Time Series and Forecasting". Specifically, I'm using a subset of their red wine sales data. It's a detour from the stl material at http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm (at some point, I have to stop simply following and try to make it work with new data).

I need a minimum of 36 wine sales data points in the series, since stl otherwise complains about the data having fewer than two cycles. The data is in ~/tmp/wine.txt, one value per line (shown here 12 per row):

    464 675 703 887 1139 1077 1318 1260 1120 963 996 960
    530 883 894 1045 1199 1287 1565 1577 1076 918 1008 1063
    544 635 804 980 1018 1064 1404 1286 1104 999 996 1015

My sourced test code is buried in a repeat loop so that I can use a break command to skip the final, error-causing statement that I'm trying to figure out:

    repeat{

      # Clear variables (from stackexchange)
      rm( list=setdiff( ls( all.names=TRUE ), lsf.str(all.names=TRUE ) ) )
      ls()

      head( wine <- read.table("~/tmp/wine.txt") )
      ( x <- ts(wine[[1]],frequency=12) )
      ( y <- ts(wine,frequency=12) )
      ( a=stl(x,"per") )
      #break
      ( b=stl(y,"per") )
    }

The final statement causes the error 'Error in stl(y, "per") : only univariate series are allowed'. I found an explanation at http://stackoverflow.com/questions/10492155/time-series-and-stl-in-r-error-only-univariate-series-are-allowed. That's how I came up with the assignment to x using wine[[1]]. I found an explanation of the need for double square brackets at http://www.r-tutor.com/r-introduction/list/named-list-members.

My problem is that it's not very clear what is happening inside the ts structures x and y.
If I simply print them, they look 100% identical:

    | > x
    |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
    | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
    | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
    | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
    | > y
    |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
    | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
    | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
    | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015

Whatever their differences, they don't cause R to misinterpret the data; that is, each looks like a single series of numerical data.

Can anyone illuminate the difference in the data inside the ts data structures? The potential incompatibility with stl is just one symptom. Right now, the "solution" is black magic to me, and I would like to get a clearer picture so that I know when else (and how) to watch out for this.

I've posted this to the R Help mailing list http://news.gmane.org/gmane.comp.lang.r.general and to stackoverflow at http://stackoverflow.com/questions/29759928/how-numerical-data-is-stored-inside-ts-time-series-objects.
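For anyone who wants to reproduce this without ~/tmp/wine.txt, here is a self-contained sketch of the setup. As an assumption for portability, it inlines the same 36 values and feeds them to read.table() through its text argument instead of reading the file. It shows what wine[[1]] and wine each hand to ts(); the name vals is just illustrative.

```r
# The 36 monthly values from ~/tmp/wine.txt, inlined (assumption: same data as the file)
vals <- c(464, 675, 703, 887, 1139, 1077, 1318, 1260, 1120, 963, 996, 960,
          530, 883, 894, 1045, 1199, 1287, 1565, 1577, 1076, 918, 1008, 1063,
          544, 635, 804, 980, 1018, 1064, 1404, 1286, 1104, 999, 996, 1015)
# One value per line, read the same way as the file: a one-column data frame
wine <- read.table(text = paste(vals, collapse = "\n"))

class(wine)       # "data.frame" (a list whose single column is V1)
class(wine[[1]])  # "integer": [[ ]] pulls out the column itself
class(wine[1])    # "data.frame": [ ] keeps a one-column data frame

x <- ts(wine[[1]], frequency = 12)  # ts built on a plain vector
y <- ts(wine, frequency = 12)       # ts() converts the data frame to a 36 x 1 matrix

dim(x)  # NULL
dim(y)  # 36 1
identical(as.numeric(x), as.numeric(y))  # TRUE: same numbers, different shape
```

So the printed values agree, but only y carries a dim attribute, which is why the two print alike yet behave differently.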
William Dunlap
2015-Apr-21 00:00 UTC
[R] How numerical data is stored inside ts time series objects
Use the str() function to see the internal structure of most objects. In your case it would show something like:

    > Data <- data.frame(theData=round(sin(1:38),1))
    > x <- ts(Data[[1]], frequency=12) # or Data[,1]
    > y <- ts(Data, frequency=12)
    > str(x)
     Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
    > str(y)
     ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr "theData"
     - attr(*, "tsp")= num [1:3] 1 4.08 12

'x' contains a vector of data and 'y' contains a 1-column matrix of data. stl(x,"per") and stl(y, "per") give results similar to the ones you got.

Evidently, stl() does not know that 1-column matrices can be treated much the same as vectors and gives an error message. Thus you must extract the one column into a vector: stl(y[,1], "per").

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Apr 20, 2015 at 4:04 PM, Paul <Paul.Domaskis at gmail.com> wrote:
> Can anyone illuminate the difference in the data inside the ts data
> structures?
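A minimal sketch, reusing Bill's Data example, that checks the shape difference directly with is.matrix() and confirms the y[, 1] workaround. The names msg, y1, and fit are only illustrative. The column extraction appears to keep the series' start and frequency because the `[.ts` method in stats rebuilds a ts when only a column is selected (my reading of the source; getAnywhere("[.ts") shows it).

```r
Data <- data.frame(theData = round(sin(1:38), 1))
x <- ts(Data[[1]], frequency = 12)  # vector-backed ts
y <- ts(Data, frequency = 12)       # 38 x 1 matrix-backed ts

is.matrix(x)  # FALSE
is.matrix(y)  # TRUE: stl() rejects matrix-shaped input, even with one column

# Capture stl()'s complaint about y instead of halting the script
msg <- tryCatch(stl(y, "per"), error = conditionMessage)
print(msg)    # the "only univariate series are allowed" message from the post

# y[, 1] drops the column dimension; start and frequency carry over
y1 <- y[, 1]
is.matrix(y1)               # FALSE
all.equal(tsp(y1), tsp(y))  # TRUE
fit <- stl(y1, "per")       # now accepted
```

The same check (is.matrix(), or dim() not being NULL) is a quick way to spot this situation with other functions that expect a univariate series.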
William Dunlap <wdunlap <at> tibco.com> writes:
> 'x' contains a vector of data and 'y' contains a 1-column matrix of
> data. stl(x,"per") and stl(y, "per") give similar results as you
> got.
>
> Evidently, stl() does not know that 1-column matrices can be treated
> much the same as vectors and gives an error message. Thus you must
> extract the one column into a vector: stl(y[,1], "per").

Thanks, William.

Interesting that a 2D matrix of size Nx1 is treated as a different animal from a length-N vector. It's a departure from math convention, and from what I'm accustomed to in Matlab. It seems that R's vector is more akin to a list, where the notion of orientation doesn't apply.

I rummaged around the help files for str, summary, dput, and args. This seems like a more complicated language than Matlab, VBA, or even C++'s STL of old (which was pretty thoroughly documented). A function like str() prints an object description, and I'm guessing the conventions with which the object is described depend a lot on the person who wrote the handling code for the class. The description for the variable y seems particularly elaborate. Would I be right in assuming that the notation is ad hoc and not documented? For example, the two invocations str(x) and str(y) show a Time-Series and a ts, and the str(y) output runs to many lines that are heavy in punctuation.
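On the vector-versus-matrix question, a short sketch: in R a matrix is an ordinary vector that carries a "dim" attribute, so orientation exists only once that attribute is present. attributes() and dput() show the raw pieces that str() summarises, which may be easier to read than str()'s compact notation. The variable names here are illustrative.

```r
v <- c(10, 20, 30)
m <- matrix(v, ncol = 1)  # 3 x 1 matrix holding the same numbers

dim(v)     # NULL: a plain vector has no rows or columns
dim(m)     # 3 1
dim(t(v))  # 1 3: t() treats the vector as a column, giving a row matrix

# A matrix is just the vector plus a "dim" attribute
v2 <- v
dim(v2) <- c(3L, 1L)
identical(v2, m)       # TRUE
identical(drop(m), v)  # TRUE: drop() removes the length-1 dimension

# attributes() lists the raw pieces that str() summarises
Data <- data.frame(theData = round(sin(1:38), 1))
x <- ts(Data[[1]], frequency = 12)
y <- ts(Data, frequency = 12)
names(attributes(x))  # tsp and class
names(attributes(y))  # dim, dimnames, tsp and class
dput(y)               # a structure() call that would rebuild y exactly
```

Read this way, the "Time-Series" versus "ts" lines in str() mostly reflect whether a dim attribute is present.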