Dear R helpers,

Suppose I have stock returns data for, say, 1500 companies over the last 4 years. Thus I have a matrix of dimension 1000 * 1500, i.e. 1500 columns representing companies and 1000 rows of their returns.

I need to find the correlation matrix of these 1500 companies. I can compute it as

cor(returns)

and expect to get a 1500 * 1500 matrix. However, the process takes a tremendous amount of time. Is there any way to expedite it? In reality, I may be dealing with as many as 5000 stocks and may simulate as many as 100000 stock returns.

Kindly guide.

Vincy

On Mon, Mar 21, 2011 at 8:34 AM, Vincy Pyne <vincy_pyne at yahoo.ca> wrote:
> cor(returns) and expect to get 1500 * 1500 matrix. However, the process takes a tremendous time. Is there any way in expediting such a process. In reality, I may be dealing with lots of even 5000 stocks and may simulate even 100000 stock returns.

How long is "tremendous time"? What platform are you on?

If you can compile R against a tuned BLAS library, stats::cor will run faster IF you do not have any missing data. If you do have missing data, you may want to try the package WGCNA (where we work with bigger correlation matrices), which implements a correlation calculation that is faster, particularly if there are few missing data. This will also run faster if you do have a tuned BLAS installed.

HTH,

Peter
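
To put a number on "tremendous time", and to see what the BLAS point in the reply above amounts to, here is a minimal sketch. It assumes simulated returns with no missing values (the data and the variable names are made up for illustration). It times cor() and compares it with the same calculation written as a single matrix product, which R hands to whatever BLAS it is linked against.

```r
# Simulated returns: 1000 observations (rows) of 1500 stocks (columns).
# Illustrative data only; there are no missing values.
set.seed(1)
returns <- matrix(rnorm(1000 * 1500, sd = 0.01), nrow = 1000, ncol = 1500)

# Baseline: time the built-in correlation.
t_cor <- system.time(c1 <- cor(returns))

# Alternative: standardise each column, then form one matrix product.
# crossprod(z) computes t(z) %*% z through the BLAS library.
n <- nrow(returns)
z <- scale(returns)   # centre each column and divide by its sd (n - 1 divisor)
t_cp <- system.time(c2 <- crossprod(z) / (n - 1))

print(t_cor)
print(t_cp)

# The two results should agree up to floating-point error.
print(all.equal(c1, c2, check.attributes = FALSE))
```

How much the matrix-product version gains depends on the BLAS in use, so the timings will differ from machine to machine. The shortcut is only valid for complete data; with missing values the pairwise handling in cor() (or in WGCNA, as suggested above) is needed instead.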

Getting the correlation of a 1000 by 1500 matrix takes about 3.5 seconds on my unimpressive Windows machine. Is that really a tremendous amount of time?

You don't say what you are using the correlation matrix for. With more stocks (1500) than observations (1000), the matrix you get will be positive semi-definite rather than positive definite, and it is common for such a matrix to cause problems for applications. Some ways of getting a positive definite matrix are explained in the blog post:

http://www.portfolioprobe.com/2011/03/07/factor-models-of-variance-in-finance/

On 21/03/2011 15:34, Vincy Pyne wrote:
> cor(returns) and expect to get 1500 * 1500 matrix. However, the process takes a tremendous time. Is there any way in expediting such a process. In reality, I may be dealing with lots of even 5000 stocks and may simulate even 100000 stock returns.

-- 
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
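
The semi-definiteness mentioned above can be checked directly. The following is a small sketch on simulated data, scaled down to 100 observations of 150 stocks so that it runs quickly (the sizes, the tolerance and the variable names are assumptions chosen for illustration). When there are more columns than rows, the sample correlation matrix has rank at most the number of rows minus one, so the remaining eigenvalues are zero up to rounding error.

```r
# Illustrative data: fewer observations (rows) than stocks (columns).
set.seed(42)
n_obs <- 100
n_stocks <- 150
returns <- matrix(rnorm(n_obs * n_stocks, sd = 0.01), nrow = n_obs)

cmat <- cor(returns)

# Eigenvalues of the symmetric correlation matrix, largest first.
ev <- eigen(cmat, symmetric = TRUE, only.values = TRUE)$values

# Count eigenvalues that are clearly positive; 1e-8 is an arbitrary tolerance.
n_positive <- sum(ev > 1e-8)

print(n_positive)   # at most n_obs - 1
print(min(ev))      # zero up to rounding error, possibly slightly negative
```

A matrix like this cannot be inverted, which is why applications that need a positive definite matrix tend to fail on it.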