Hello everyone, this is my first post. Nice to meet all of you. I am
having some trouble using R in combination with PHP and MySQL. I
would appreciate any assistance very much! This is kind of long, so if
you'd like a shorter version, let me know.
I am working on a project that takes a list of points (entered via the
web and stored in a MySQL DB) and runs an R script on them. The
user receives an e-mail when the script is complete, and can view the
output (which, ideally, will be stored in a database and formatted for
the web by PHP).
What is the best way to do this?
As of now, I have it so the user can upload the list of points and they
are stored in a database. (see below) When the user requests a job to
be run, that is, for the server to use R to process the data, an entry
is added to the jobs table.
Every x seconds, a daemon (written in PHP) checks the "jobs" table
to see whether any job is in the "processing" state. If none is,
the server is free to run a script, so the daemon chooses
a queued job to run and sets its status to "processing".
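To make the daemon's rule concrete, here is a rough sketch of the selection logic written in R. The real daemon is PHP and talks to MySQL; the data frame below is only a stand-in for the "jobs" table, the function name claim_next_job is made up, and picking the lowest id first is my assumption.

```r
# Hypothetical stand-in for the "jobs" table; the real daemon queries MySQL.
jobs <- data.frame(
  id      = c(1, 2, 3),
  script  = c("bla.R", "bla.R", "bla.R"),
  dataset = c(10, 11, 12),
  status  = c("completed", "queued", "queued"),
  stringsAsFactors = FALSE
)

# Return the jobs table with one queued job marked "processing".
# The table comes back unchanged if a job is already processing
# or if nothing is queued.
claim_next_job <- function(jobs) {
  if (any(jobs$status == "processing")) return(jobs)   # server is busy
  queued <- jobs$id[jobs$status == "queued"]
  if (length(queued) == 0) return(jobs)                # nothing to do
  # Assumption: the job with the lowest id goes first.
  jobs$status[jobs$id == min(queued)] <- "processing"
  jobs
}

jobs <- claim_next_job(jobs)
```

Calling claim_next_job a second time leaves the table alone, because one job is already in the "processing" state.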
At this point, the following information is available:
-The R script that needs to be run
-The Dataset ID
-----------
Problem #1:
How should I call R so that it runs the script, let's call it "bla.R",
on the points stored in a MySQL database? Do I have PHP create a
temporary file, call the R script with that filename as an argument,
and have R just do read.table("temp.txt")?
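Here is roughly what I imagine the temporary-file approach would look like on the R side. This is only a sketch: it assumes PHP writes the x, y and lagged columns to a tab-separated file with a header row, and that the script is started with Rscript and the file name as its only argument. The helper name read_points is made up.

```r
# bla.R, run as: Rscript bla.R temp.txt
# Assumption: PHP has written a tab-separated file with a header row
# containing the columns x, y and lagged.

# Read one dataset's points from the file at `path` into a data frame.
read_points <- function(path) {
  read.table(path, header = TRUE, sep = "\t")
}

# Arguments given after the script name on the command line.
args <- commandArgs(trailingOnly = TRUE)

if (length(args) >= 1) {
  points <- read_points(args[1])
  print(nrow(points))   # quick check that the points arrived
}
```

PHP would then only need to write the file and run the command, for example through exec().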
-----------
-----------
Problem #2:
The R script just runs linear regressions on the data. I'd like to
take only SOME of the data outputted by the "summary" function. Let's
say we have a simple linear regression on the X and Y points:
fit <- lm(X~Y)
How can I get R to output something that can be easily split apart and
stored into a DB? I want the following values:
-Residuals Min
-Residuals 1Q
-Residuals Median
-Residuals 3Q
-Residuals Max
-The Residual Standard Error
-Multiple R-Squared
-Adjusted R-Squared
-F-Statistic
-etc.
How can this be achieved?
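From what I can tell, the object returned by summary() on an lm fit is a list, so the values might be pulled out by name along these lines. This is a sketch under assumptions: the points are made up, I regress y on x here (swap the formula if the other direction is wanted), and the "name=value" output format is just one idea for something PHP could split easily.

```r
# Made-up points standing in for one dataset's x and y columns.
points <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                     y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

fit <- lm(y ~ x, data = points)
s   <- summary(fit)

# Quantiles of the residuals: Min, 1Q, Median, 3Q, Max.
res_q <- quantile(residuals(fit), names = FALSE)

out <- c(
  res_min     = res_q[1],
  res_1q      = res_q[2],
  res_median  = res_q[3],
  res_3q      = res_q[4],
  res_max     = res_q[5],
  sigma       = s$sigma,                       # residual standard error
  r_squared   = s$r.squared,                   # multiple R-squared
  adj_r_sq    = s$adj.r.squared,               # adjusted R-squared
  f_statistic = unname(s$fstatistic["value"])  # F-statistic
)

# One "name=value" pair per line, which PHP could split on "=".
cat(sprintf("%s=%.10g", names(out), out), sep = "\n")
```

PHP could capture these lines from the script's output and insert each pair into a results table.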
-------------
====Database Structure====
A list of points is called a Dataset.
We have a table called "Datasets" which simply holds all the Datasets:
DATASETS
id
title
and a table "Data" which holds all the points of all the datasets:
DATA
id
ds_id
x
y
lagged
The points of a Dataset can be found from this query: "SELECT
x,y,lagged FROM DATA WHERE ds_id=(whatever dataset)"
and a table "jobs" which holds the queue of jobs:
JOBS
id
script (which r script to run)
dataset (which dataset to use)
status (queued, processing, or completed)
======================
Thank you all so much for helping me out. I appreciate it very much
and am looking forward to figuring this out!
-Sam