While I am not a novice when it comes to statistics, this will be the first time I have used R, apart from some initial play. I have normally written my own code for statistical analysis in C++ or Fortran, for a number of reasons (in part contingent on what the boss was willing to pay for), and having been programming for a long time, there is no need to spare me the programming details. Just give me a URL for a document that explains how to do what I want to do, if there is one.

What I want to do is extract time series data from a database (say, PostgreSQL or MySQL - I routinely use both), analyse it, and put a selection of the statistical results into specific tables. For example, suppose I have daily values for several variates. One thing I might try is to fit a cubic spline to the data, use the spline to obtain estimates of rate of change and acceleration, and then see if some nonlinear function of the variates can account for a significant percentage of the variation in rate of change and acceleration, perhaps after orthogonalization if there happen to be problems with multicollinearity.

Since the data comes in regularly (daily in some cases, weekly in others), I'd want to rerun the whole process at the same interval, without prior analyses messing up a current analysis.

I know of PL/R, for PostgreSQL, but haven't figured out how to use it yet. Is this something I can orchestrate using Perl to tie things together, either using PL/R or not? If I can use Perl for this, a sample Perl script showing how to use Perl to get data into R and to retrieve statistical output from R into Perl variables would be priceless. What I have read in the preliminary documentation suggests I can, but it is short on detail and directions on where to go next. I'd like to be able to put the master script for this sort of thing into a scheduled task if possible.

Thanks,
Ted
Hi Ted, hopefully the following information gets you started:

Ted wrote:
> What I want to do is extract time series data from a database (say, PostgreSQL
> or MySQL - I routinely use both), analyse it, and put a selection of the
> statistical results into specific tables.

Check the following document: http://cran.r-project.org/doc/manuals/R-data.html (the R Data Import/Export manual). There is a section on relational databases. Please check also the available packages here: http://cran.wustl.edu/web/packages/index.html (or from any other CRAN mirror). There is, for example, a package called RMySQL which will probably help you (but there are also others: RODBC, DBI, RSQLite, ...). There is even a special interest group for databases (R-SIG-DB).

> For example, suppose I have daily values for several variates. One thing I
> might try is to fit a cubic spline to the data,

Please check also the listing of available packages (URL given above). There seem to be quite a few spline-related packages. Please note that there is a function smooth.spline included in the package 'stats' (part of the standard installation of R).

I hope this helps,
Roland
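The round trip Roland describes (query the database, fit a spline, write results back) can be sketched in a few lines of R. RSQLite with an in-memory database is used here only so the sketch is self-contained; with RMySQL you would change just the dbConnect() line. The table and column names, and the toy data, are made up for illustration:

```r
library(DBI)
library(RSQLite)

# In-memory database so the example runs anywhere; substitute
# dbConnect(RMySQL::MySQL(), dbname = ..., ...) for a real server.
con <- dbConnect(SQLite(), ":memory:")

# Stand-in for an existing table of daily observations
days <- 1:100
dbWriteTable(con, "daily_obs",
             data.frame(day = days,
                        value = sin(days / 10) + rnorm(100, sd = 0.1)))

dat <- dbGetQuery(con, "SELECT day, value FROM daily_obs ORDER BY day")

# Cross-validated cubic smoothing spline from the standard 'stats' package
fit <- smooth.spline(dat$day, dat$value)

# Append a one-row summary of the fit to a results table
res <- data.frame(run_day = max(dat$day), df = fit$df, lambda = fit$lambda)
dbWriteTable(con, "spline_results", res, append = TRUE)
stored <- dbGetQuery(con, "SELECT * FROM spline_results")

dbDisconnect(con)
```

Because each run only appends a new summary row, rerunning the script daily or weekly leaves earlier results untouched.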
On Thu, Jul 3, 2008 at 5:09 PM, Ted <r.ted.byers at gmail.com> wrote:
> While I am not a novice when it comes to statistics, this will be the first
> time I have used R, apart from some initial play. I have normally written my
> own code for statistical analysis in C++ or Fortran, for a number of reasons
> (in part contingent on what the boss was willing to pay for), and having been
> programming for a long time, there is no need to spare me the programming
> details. Just give me a URL for a document that explains how to do what I
> want to do, if there is one.
>
> What I want to do is extract time series data from a database (say, PostgreSQL
> or MySQL - I routinely use both), analyse it, and put a selection of the
> statistical results into specific tables.

For Postgres see: https://stat.ethz.ch/pipermail/r-sig-db/2008q3/000458.html

For MySQL try the RMySQL R package, or you can use the RODBC interface. R also has JDBC via the RJDBC package.

> For example, suppose I have daily values for several variates. One thing I
> might try is to fit a cubic spline to the data, use the spline to obtain
> estimates of rate of change and acceleration, and then see if some nonlinear
> function of the variates can account for a significant percentage of the
> variation in rate of change and acceleration, perhaps after orthogonalization
> if there happen to be problems with multicollinearity.

?spline

> Since the data comes in regularly (daily in some cases, weekly in others), I'd
> want to rerun the whole process at the same interval, without prior analyses
> messing up a current analysis.
>
> I know of PL/R, for PostgreSQL, but haven't figured out how to use it yet.
>
> Is this something I can orchestrate using Perl to tie things together, either
> using PL/R or not? If I can use Perl for this, a sample Perl script showing
> how to use Perl to get data into R and to retrieve statistical output from R
> into Perl variables would be priceless.
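Following up on ?spline: splinefun(), documented on the same help page, returns a function that evaluates the fitted interpolating cubic spline and, via its deriv argument, its first and second derivatives - the rate of change and acceleration Ted asked about. A minimal sketch with toy data chosen so the true derivatives are known:

```r
# Sample a parabola, whose derivatives we know exactly, at 20 points
x <- 1:20
y <- (x - 10)^2

# splinefun() returns a function; deriv = 1 and deriv = 2 give the
# spline's first and second derivatives at any point
f <- splinefun(x, y)

rate  <- f(10, deriv = 1)   # rate of change at x = 10 (0 for this parabola)
accel <- f(10, deriv = 2)   # acceleration at x = 10 (2 for this parabola)
```

For noisy data, smooth.spline() (mentioned in Roland's reply) is usually preferable to interpolation; predict(fit, x, deriv = 1) plays the same role there.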
> What I have read in the preliminary
> documentation suggests I can, but it is short on detail and directions on
> where to go next. I'd like to be able to put the master script for this sort
> of thing into a scheduled task if possible.

To run an external Perl program from within R, try ?system and ?shell; to run an R job from Perl, see ?BATCH. To directly interface with Perl: http://www.omegahat.org/RSPerl/

read.xls and xls2csv in the R gdata package use a Perl program to read an Excel spreadsheet into R (although R has other methods that do not depend on Perl, including COM and RODBC interfaces).

That being said, there is a good chance that you don't really need Perl, or if you do, that you are doing something wrong. R itself has regular expressions and substantial string-handling capability, as well as many database and other interfaces. I personally used to use Perl (prior to using R), but I virtually never use Perl any more now that I have R.

Issuing RSiteSearch("...") from within R can locate info on various R topics.
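A small illustration of the batch route mentioned above: a cron job, a Windows scheduled task, or a Perl wrapper would all invoke R non-interactively the same way, by running Rscript (or R CMD BATCH) as an external command. Here R itself plays the role of the external caller via system(), which assumes Rscript is on the PATH:

```r
# Run an R job as an external batch process and capture what it prints;
# a scheduler or Perl script would issue the same command line. In a real
# master script, -e '...' would be replaced by the path to the analysis
# script, e.g. "Rscript daily_analysis.R" (a hypothetical file name).
out <- system("Rscript -e 'cat(sum(1:10))'", intern = TRUE)

# intern = TRUE returns the job's standard output as a character vector,
# which the caller can parse - the same way a Perl wrapper would read it
total <- as.numeric(out)
```

From Perl, the equivalent would be backticks or open() on an Rscript pipe; the batch job writing its results straight back to the database (as in the RMySQL/DBI approach) avoids parsing output at all.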