Dear R-Help, I am working on a paper in an R course for large file support in R using scan(), relational databases, and XML. I have never used SQL or heirarchical document formats such as XML (except where it occurs without user interaction), and knowledge in RDBs and XML is lacking in my program. I have tried finding a working example for the novices-novice on the topic, read many postings, the r-data I/O manual several times, and descriptions of packages RODBC, DBI, XML, among others. I understand that RDBs are (assumed at least) used widely among the R community. I have not been able to put all of the pieces together, but assuming that RDB use is actually quite widespread, it should be quite easy to fill me in and/or correct my understanding where necessary. For a cross-platform solution (PC/OSX at least, or in part) my questions/problems are about what preliminary steps are needed to get an SQL or XML query "to work" in R to begin with, what the appropriate data-file formats are, and how to convert to them if starting out with data in, say, a delimited ASCII text file. Very basic examples should suffice, say, a table with 20 random observations, a grouping variable with 2 levels, and a factor with 2 levels. ## untested code set.seed(1024) write.table("junk.txt", data.frame(Subj=c(rep(1,10),rep(2,10)),block=rep(c(rep(-1,5),rep(1,5)),2), obs=rnorm(20,0,1))) Specifically, 1- what are the minimum required non R components that are needed to support SQL or XML functionality, which may or may not need to be installed? 2- what R packages need to be installed, at a minimum (also as a cross-PC/Mac solution if possible or at least as much as possible) 3- I keep seeing reference to connections of a given name "if previously setup". What kind of setup is needed outside of R, if any? 4- what steps are needed in R to then connect to a file and import a subset based on a query? 5- Do I then use standard R routines (e.g. write()) to export as a DB, or an RDB/XML specific function? Sincerely, KeithC. [U.S] 1/k^c
I'm not sure it is possible to parse an XML file in R directly. Well, I guess it's *possible*, but may not be the best way to do it. ElementTree in Python is an easy-to-use parser that you might use to first parse your XML file (or others hierarchically structured data), organize it anyway you want, and then bring those data into R for subsequent analysis. In fact, I have recently done just this. I have another statistical program that outputs data as an XML file. So, I wrote a python program that parses that XML file, pulls out the data of interest into a text file, and then I bring those data into R for analysis.> -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Keith Alan > Chamberlain > Sent: Thursday, April 10, 2008 4:14 PM > To: r-help at r-project.org > Subject: [R] Relational Databases or XML? > > Dear R-Help, > > I am working on a paper in an R course for large file support > in R using scan(), relational databases, and XML. I have > never used SQL or heirarchical document formats such as XML > (except where it occurs without user interaction), and > knowledge in RDBs and XML is lacking in my program. I have > tried finding a working example for the novices-novice on the > topic, read many postings, the r-data I/O manual several > times, and descriptions of packages RODBC, DBI, XML, among > others. I understand that RDBs are (assumed at least) used > widely among the R community. I have not been able to put all > of the pieces together, but assuming that RDB use is actually > quite widespread, it should be quite easy to fill me in > and/or correct my understanding where necessary. > > For a cross-platform solution (PC/OSX at least, or in part) > my questions/problems are about what preliminary steps are > needed to get an SQL or XML query "to work" in R to begin > with, what the appropriate data-file formats are, and how to > convert to them if starting out with data in, say, a > delimited ASCII text file. Very basic examples should > suffice, say, a table with 20 random observations, a grouping > variable with 2 levels, and a factor with 2 levels. > > ## untested code > set.seed(1024) > write.table("junk.txt", > data.frame(Subj=c(rep(1,10),rep(2,10)),block=rep(c(rep(-1,5),r > ep(1,5)),2), obs=rnorm(20,0,1))) > > Specifically, > > 1- what are the minimum required non R components that are > needed to support SQL or XML functionality, which may or may > not need to be installed? > > 2- what R packages need to be installed, at a minimum (also > as a cross-PC/Mac solution if possible or at least as much as > possible) > > 3- I keep seeing reference to connections of a given name "if > previously setup". What kind of setup is needed outside of R, if any? > > 4- what steps are needed in R to then connect to a file and > import a subset based on a query? > > 5- Do I then use standard R routines (e.g. write()) to export > as a DB, or an RDB/XML specific function? > > Sincerely, > KeithC. [U.S] > > 1/k^c > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
You may wish to try out sqldf. It allows one to manipulate data frames using sql. The real work is done by sqlite and the RSQLite interface but it sets up the database for you and all the tables and then deletes them so its as easy a typing in one line of code. Check out the home page at http://sqldf.googlecode.com . On Windows sqlite is included in the RSQLite package so you have nothing to install but an ordinary R package. For XML there is the XML R package. The ctv and Ryacas packages are two packages that use XML (for the task view database and for parsing OpenMath respectively). At least on Windows you just install the XML package and it includes everything you need. On omegahat site there is more about the R XML package. There is also a package to handle pubmed in XML in bioconductor. On Thu, Apr 10, 2008 at 4:14 PM, Keith Alan Chamberlain <Keith.Chamberlain at colorado.edu> wrote:> Dear R-Help, > > I am working on a paper in an R course for large file support in R using scan(), relational databases, and XML. I have never used SQL or heirarchical document formats such as XML (except where it occurs without user interaction), and knowledge in RDBs and XML is lacking in my program. I have tried finding a working example for the novices-novice on the topic, read many postings, the r-data I/O manual several times, and descriptions of packages RODBC, DBI, XML, among others. I understand that RDBs are (assumed at least) used widely among the R community. I have not been able to put all of the pieces together, but assuming that RDB use is actually quite widespread, it should be quite easy to fill me in and/or correct my understanding where necessary. > > For a cross-platform solution (PC/OSX at least, or in part) my questions/problems are about what preliminary steps are needed to get an SQL or XML query "to work" in R to begin with, what the appropriate data-file formats are, and how to convert to them if starting out with data in, say, a delimited ASCII text file. Very basic examples should suffice, say, a table with 20 random observations, a grouping variable with 2 levels, and a factor with 2 levels. > > ## untested code > set.seed(1024) > write.table("junk.txt", data.frame(Subj=c(rep(1,10),rep(2,10)),block=rep(c(rep(-1,5),rep(1,5)),2), obs=rnorm(20,0,1))) > > Specifically, > > 1- what are the minimum required non R components that are needed to support SQL or XML functionality, which may or may not need to be installed? > > 2- what R packages need to be installed, at a minimum (also as a cross-PC/Mac solution if possible or at least as much as possible) > > 3- I keep seeing reference to connections of a given name "if previously setup". What kind of setup is needed outside of R, if any? > > 4- what steps are needed in R to then connect to a file and import a subset based on a query? > > 5- Do I then use standard R routines (e.g. write()) to export as a DB, or an RDB/XML specific function? > > Sincerely, > KeithC. [U.S] > > 1/k^c > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >