Dear All,

Apologies for not posting a code snippet, but I really need a pointer to a methodology for looking at my data, and possibly to an R package that can ease my task.

I am given a set of several multivariate noisy time series; let's call it {A}. Each A_i in {A}, in turn, consists of several numerical time series. Then I have another set of shorter time series {B}. Now, for every B_j in {B}, I need to determine the time series A_i that B_j most likely comes from (B_j is not just a subset of A_i). In other words, I need to determine the distance between A_i and B_j.

I was thinking about the Mahalanobis distance described here:

http://en.wikipedia.org/wiki/Mahalanobis_distance

However, I have several questions:

1) With the Mahalanobis distance, do I lose the information about the time structure of the data? I am not just comparing distributions but time series, and the ordering of the data is important.

2) Even if the Mahalanobis distance were appropriate, it involves the calculation of a covariance matrix and a mean. Should I average A_i or B_j (or a subset of A_i having the same length as B_j)? And should I use a covariance matrix based on A_i or B_j?

Any suggestion is welcome.

Lorenzo
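Question 1 can be checked numerically. The sketch below is plain Python rather than any R package, and every name in it is hypothetical. It assumes two channels per series, estimates the mean and covariance from the longer series A_i, and summarises B_j by its mean; these are illustrative choices, not a recommendation. Under those assumptions, shuffling the time order of A_i leaves the distance unchanged, which suggests the time structure is indeed not used.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_2d(rows):
    """Sample covariance matrix (2x2) of rows that have exactly two channels."""
    n = len(rows)
    mx, my = mean_vector(rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def mahalanobis_2d(point, rows):
    """Mahalanobis distance from `point` to the cloud of observations `rows`."""
    (a, b), (_, d) = covariance_2d(rows)
    det = a * d - b * b
    mx, my = mean_vector(rows)
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form using the explicit inverse of a symmetric 2x2 matrix.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return q ** 0.5


# Synthetic stand-ins for A_i (long) and B_j (short); not real data.
random.seed(0)
A = [(t + random.gauss(0, 0.3), 0.5 * t + random.gauss(0, 0.3)) for t in range(50)]
B = A[10:20]

# Same observations as A, but with the time order destroyed.
A_shuffled = A[:]
random.shuffle(A_shuffled)

d_ordered = mahalanobis_2d(mean_vector(B), A)
d_shuffled = mahalanobis_2d(mean_vector(B), A_shuffled)
print(d_ordered, d_shuffled)  # equal up to floating-point rounding
```

The mean and covariance depend only on which observations are present, not on their order, so any distance built from them alone treats A_i as an unordered cloud of points.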
Have you looked at Dynamic Time Warping (DTW) and the dtw package?

Best,
E.

On Mon, May 27, 2013 at 01:34:42PM +0200, Lorenzo Isella wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
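To make the DTW suggestion concrete, here is a minimal sketch of subsequence DTW for multivariate series. It is plain Python, not the dtw package's API, and all names are hypothetical. It assumes a Euclidean local cost between observation vectors and lets the short series B_j start and end anywhere inside the longer A_i; the candidate with the smallest accumulated cost is taken as the best match.

```python
import math


def local_cost(u, v):
    """Euclidean distance between two multivariate observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def subsequence_dtw(query, reference):
    """Smallest accumulated cost of aligning all of `query` to some
    contiguous stretch of `reference` (free start and free end)."""
    n, m = len(query), len(reference)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    # Row 0 is all zeros: the match may begin at any position in `reference`.
    D[0] = [0.0] * (m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = local_cost(query[i - 1], reference[j - 1]) + step
    # The match may end at any position in `reference`.
    return min(D[n][1:])


# Synthetic two-channel candidates standing in for {A}; not real data.
candidates = {
    "A1": [(math.sin(0.2 * t), math.cos(0.2 * t)) for t in range(60)],
    "A2": [(0.02 * t, 1.0 - 0.02 * t) for t in range(60)],
}

# A short series B taken from A1 with a small constant offset added.
B = [(x + 0.05, y - 0.05) for x, y in candidates["A1"][20:35]]

costs = {name: subsequence_dtw(B, series) for name, series in candidates.items()}
best = min(costs, key=costs.get)
print(best, costs)
```

Unlike a distance built only from a mean and a covariance matrix, the accumulated cost here depends on the order of the observations, because the alignment path can only move forward in time through both series.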
Roy Mendelssohn - NOAA Federal
2013-May-27 13:39 UTC
[R] Classification of Multivariate Time Series
Look at:

"State-Space Discrimination and Clustering of Atmospheric Time Series Data Based on Kullback Information Measures", Thomas Bengtsson

If you Google the topic, there are a host of other papers too, but this one meshes with existing state-space methods.

-Roy

On May 27, 2013, at 4:34 AM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
> [...]

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097

e-mail: Roy.Mendelssohn at noaa.gov (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.