Hi,

we have a first draft of R functions for reading and writing data to XML files, including a rather general DTD ... which borrows heavily from the data types of a certain programming language :-)

The basic idea is to create an XML standard for data exchange, together with import/export functions for as many applications as possible. Here we will need R, Matlab & Octave for our research program, but the idea is of course to create a general standard.

After looking at several other applications, we think that all the data types there can easily be represented using S constructs (i.e., arrays and lists together with attributes) ... so why make life complicated and invent something new?

Of course this only applies to the low-level representation ... the real thing will come next, when one starts defining higher-level classes. We have avoided this step so far because one needs the low-level things first to have something to play with.

A short description of the DTD and an R package with import/export functions can be found at

http://www.ci.tuwien.ac.at/~leisch/R

(Modulo some bugs) R data objects can be saved/restored without loss of information. We don't intend to cover functions or models yet.

All comments and ideas are appreciated! This is just a proposal and anything can still be changed ...

Best, Fritz PS: Almost all the work has been done by Torsten Hothorn, I'm just writing the email ;-) -- ------------------------------------------------------------------- Friedrich Leisch Institut für Statistik Tel: (+43 1) 58801 10715 Technische Universität Wien Fax: (+43 1) 58801 10798 Wiedner Hauptstraße 8-10/1071 Friedrich.Leisch@ci.tuwien.ac.at A-1040 Wien, Austria http://www.ci.tuwien.ac.at/~leisch PGP public key http://www.ci.tuwien.ac.at/~leisch/pgp.key ------------------------------------------------------------------- -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
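
As a rough illustration of the low-level idea in the proposal above (lists and vectors mapped onto XML and read back without loss), here is a minimal Python sketch. It is not the DTD from the announcement: the element names `list`, `vector` and `e`, the `type` and `name` attributes, and the functions `to_xml` and `from_xml` are all hypothetical and chosen only for this example. A Python dict stands in for a named S list and a Python list for an atomic vector.

```python
import xml.etree.ElementTree as ET


def to_xml(value, name=None):
    """Encode a nested dict/list structure as an XML element (hypothetical schema)."""
    if isinstance(value, dict):
        # A dict stands in for a named list; children are encoded recursively.
        node = ET.Element("list")
        for key, item in value.items():
            node.append(to_xml(item, name=key))
    elif isinstance(value, list):
        # A list stands in for an atomic vector of numbers or strings.
        is_text = any(isinstance(v, str) for v in value)
        node = ET.Element("vector", type="character" if is_text else "numeric")
        for v in value:
            ET.SubElement(node, "e").text = str(v)
    else:
        raise TypeError("unsupported value: %r" % (value,))
    if name is not None:
        node.set("name", name)
    return node


def from_xml(node):
    """Decode an element produced by to_xml back into dicts and lists."""
    if node.tag == "list":
        return {child.get("name"): from_xml(child) for child in node}
    if node.tag == "vector":
        if node.get("type") == "numeric":
            return [float(e.text) for e in node]
        # An empty string is parsed back as None, so map it to "".
        return [e.text or "" for e in node]
    raise ValueError("unknown element: %s" % node.tag)


data = {"x": [1.5, 2.0, 3.25], "labels": ["a", "b", ""], "fit": {"df": [3.0]}}
text = ET.tostring(to_xml(data), encoding="unicode")
restored = from_xml(ET.fromstring(text))
```

Here `restored` compares equal to `data` after the trip through the XML string. Numbers are written with `str` and read with `float`, so integers would come back as floats; a real format would need an explicit integer type.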
Great. Will take a closer look.

Did you take a look at the DTDs in Omegahat - $OMEGA_HOME/XML/DTDs? There are some things in there that might prove useful. Because of its reference-oriented style, things like connected/linked observations, etc. arise more naturally than in S-like languages.

D.

-- 
_______________________________________________________________
Duncan Temple Lang                duncan@research.bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-3217
700 Mountain Avenue, Room 2C-259  fax:    (908)582-3340
Murray Hill, NJ  07974-2070       http://cm.bell-labs.com/stat/duncan

"Languages shape the way we think, and determine what we can think about."
    Benjamin Whorf

Fritz

Could you compare/contrast this a bit with NetCDF and the UN/EDIFACT General Statistical Message (GESMES)? I know almost nothing about NetCDF, but from the little I know it seems to attempt to do a similar thing. I know a bit more about Gesmes, and there seems to be a lot of overlap with what you are proposing.

Ten years ago, given the number of governments and organizations involved, it would have been easy to say go ahead and do something that we can use sooner (in fact, we did). However, Gesmes is relatively advanced now, with several groups working on implementations of at least parts of it, and governments committed to exchanging data using it. A GNU implementation would be a real benefit to a lot of tax payers around the world, not to mention the statistics community.

Paul Gilbert

On Fri, 3 Mar 2000, Paul Gilbert wrote:

> Fritz
>
> Could you compare/contrast this a bit with NetCDF and the UN/EDIFACT General
> Statistical Message (GESMES)? I know almost nothing about NetCDF, but from the
> little I know it seems to attempt to do a similar thing. I know a bit more about
> Gesmes, and there seems to be a lot of overlap with what you are proposing.
>

I just had a short (!) look at the description of the data model and the user guide for GESMES at

http://www.ecb.int/stats/gesmes/gesmes.htm

For me it looks biased to banking problems and very dataframe/array centric.

> Ten years ago, given the number of governments and organizations involved, it
> would have been easy to say go ahead and do something that we can use sooner (in
> fact, we did).

The problem I see is that GESMES would probably be too complicated. Think of exchanging data via CORBA, which is easy (and, in our opinion, efficient) with XML structured strings.

> However, Gesmes is relatively advanced now, with several groups
> working on implementations of at least parts of it, and governments committed to
> exchanging data using it. A GNU implementation would be a real benefit to a lot
> of tax payers around the world, not to mention the statistics community.
>

Agreed, free software is always a benefit :-)

Torsten

Hi Fritz,

Interesting proposal! May I second your comment "Note: Move dimension and/or dimnames to properties list?", which I think ties in with David James's point. My feeling is that it would be better to operate at a slightly lower level and represent array-type constructions as properties of simple numeric or character vectors, i.e. very much the way that R does anyway. Presumably it would be possible to make a property a recursive structure, such as a list of dimnames? (I am not very XML-literate.)

If I might also comment on Paul Gilbert's point about existing alternative standards. Our experience with one of our commercial partners suggests that parsers for R dput objects into XML will be written, if they do not already exist. Whatever its shortcomings, XML seems to be here to stay, and I think the development of a set of R tools at this stage will allow us to attempt some degree of standardisation.

Best wishes, Jonathan.

Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html

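
The suggestion above, an array stored as a plain vector whose `dim` and `dimnames` live in a properties list, with `dimnames` itself a nested structure, can be sketched as follows. This is only an illustration in Python: the element names (`vector`, `properties`, `property`, `data`, `e`) and the functions `encode_array`, `decode_array` and `element` are assumptions made for this example and are not taken from the proposed DTD.

```python
import xml.etree.ElementTree as ET


def encode_array(values, dim, dimnames):
    """Write a flat numeric vector plus dim/dimnames properties (hypothetical schema)."""
    node = ET.Element("vector", type="numeric")
    props = ET.SubElement(node, "properties")
    # dim is a simple property: the extents separated by spaces.
    ET.SubElement(props, "property", name="dim").text = " ".join(str(n) for n in dim)
    # dimnames is a recursive property: one character vector per dimension.
    names_node = ET.SubElement(props, "property", name="dimnames")
    for names in dimnames:
        sub = ET.SubElement(names_node, "vector", type="character")
        for label in names:
            ET.SubElement(sub, "e").text = label
    data = ET.SubElement(node, "data")
    for v in values:
        ET.SubElement(data, "e").text = repr(v)
    return node


def decode_array(node):
    """Read back the flat values and the two properties."""
    props = {p.get("name"): p for p in node.find("properties")}
    dim = [int(n) for n in props["dim"].text.split()]
    dimnames = [[e.text for e in vec] for vec in props["dimnames"]]
    values = [float(e.text) for e in node.find("data")]
    return values, dim, dimnames


def element(values, dim, row, col):
    """0-based lookup in column-major order, the order R uses to store matrices."""
    return values[row + col * dim[0]]


# A 2 x 3 matrix stored column by column.
xml_text = ET.tostring(
    encode_array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3], [["a", "b"], ["x", "y", "z"]]),
    encoding="unicode",
)
values, dim, dimnames = decode_array(ET.fromstring(xml_text))
```

With this layout a reader that knows nothing about arrays can still load the data as a plain vector, while `element(values, dim, 1, 2)` picks out row "b", column "z" for a reader that honours the `dim` property.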
PG> Could you compare/contrast this a bit with NetCDF and the UN/EDIFACT
PG> General Statistical Message (GESMES).

Well, I guess I'll try to answer part of my own question. I'm groping a bit to understand the various pieces, and to what extent they overlap or are complementary, so I would especially invite anyone with more knowledge to help clear things up.

Having very little true understanding, I think of XML as a language that allows one to define a format for the data exchange, whereas Gesmes actually defines a format. I believe they both result in ASCII files. I'm not sure if this means that XML could be used to define the Gesmes message format or not.

CORBA, on the other hand, is a protocol for negotiating the exchange of a formatted message between a client and a server, so it could be used to exchange Gesmes or some other format defined by XML. CORBA is probably a replacement for the PADI protocol, based on RPC, which we use here to exchange time series data between our data servers (Fame) and R and Splus. (PADI is publicly available but not widely used.)

Torsten>I just had a short (!) look at the description of the data model and the
Torsten>user guide for GESMES at http://www.ecb.int/stats/gesmes/gesmes.htm
Torsten>For me it looks biased to banking problems and very dataframe/array
Torsten>centric.

Yes, I won't recommend it for being simple. That link seems to go to Gesmes/CB (Central Bank subset of Gesmes) which is decidedly oriented toward banking and time series data. Reference documents for other parts of Gesmes are available at

<http://forum.europa.eu.int/Public/irc/dsis/eeg6/library?l=/reference_implementation/gesmes_statistical&vm=detailed&sb=Title>

After looking at this, one might be inclined to think that trying to be too general is dangerous. However, even if Gesmes is very complicated, I'm not sure that trying to define a different general message is the answer.

Paul Gilbert