Hi,
we have a first draft of R functions reading/writing data to XML files
including a rather general DTD ... which borrows heavily from the data
types of a certain programming language :-)
The basic idea is to create an XML standard for data exchange,
together with import/export functions for as many applications as
possible. We here will need R, Matlab & Octave for our research
program, but the idea is of course to create a general standard.
After looking at several other applications, we think that all the data
types there can easily be represented using S constructs (i.e., arrays
and lists together with attributes) ... so why make life complicated
and invent something new?
Of course this only applies to the low-level representation ... the
real thing will come next, when one starts defining higher-level
classes. We have avoided this step so far because one needs the
low-level things first to have something to play with.
A short description of the DTD and an R package with import/export
functions can be found at
http://www.ci.tuwien.ac.at/~leisch/R
(Modulo some bugs) R data objects can be saved/restored without loss
of information. We don't intend to cover functions or models yet.
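
To make the low-level idea concrete, here is a minimal sketch in Python
(not the R package itself) of encoding vectors, named lists and attributes
as XML and reading them back without loss. The element names <vector>,
<list>, <attributes> and <e>, the SObj class, and the functions to_xml and
from_xml are assumptions made for illustration; they are not taken from
the DTD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class SObj:
    """Hypothetical stand-in for an S object: a value plus attributes."""
    value: object                               # list of scalars, or dict of SObj
    attrs: dict = field(default_factory=dict)   # attribute name -> SObj


def to_xml(obj, name=None):
    """Encode an SObj as an XML element (element names are illustrative)."""
    if isinstance(obj.value, dict):
        node = ET.Element("list")
        for key, child in obj.value.items():
            node.append(to_xml(child, key))
    else:
        is_char = any(isinstance(v, str) for v in obj.value)
        node = ET.Element("vector", type="character" if is_char else "numeric")
        for v in obj.value:
            ET.SubElement(node, "e").text = v if is_char else repr(v)
    if name is not None:
        node.set("name", name)
    if obj.attrs:
        holder = ET.SubElement(node, "attributes")
        for key, child in obj.attrs.items():
            holder.append(to_xml(child, key))
    return node


def from_xml(node):
    """Decode an element produced by to_xml back into an SObj."""
    attrs = {}
    holder = node.find("attributes")
    if holder is not None:
        attrs = {c.get("name"): from_xml(c) for c in holder}
    if node.tag == "list":
        value = {c.get("name"): from_xml(c) for c in node if c.tag != "attributes"}
    else:
        conv = float if node.get("type") == "numeric" else str
        # An empty string is serialized as an empty element, so text may be None.
        value = [conv(e.text or "") for e in node.findall("e")]
    return SObj(value, attrs)


# A 2x2 "matrix" is just a numeric vector carrying a dim attribute.
matrix = SObj([1.0, 2.0, 3.0, 4.0], {"dim": SObj([2.0, 2.0])})
data = SObj({"x": matrix, "label": SObj(["a", "b"])})

text = ET.tostring(to_xml(data), encoding="unicode")
assert from_xml(ET.fromstring(text)) == data    # round trip preserves everything
```

Because attributes are themselves encoded as ordinary objects, the same
two element types cover arrays, lists and their metadata.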
All comments and ideas are appreciated! This is just a proposal and
anything can still be changed ...
Best,
Fritz
PS: Almost all the work has been done by Torsten Hothorn, I'm just
writing the email ;-)
--
-------------------------------------------------------------------
Friedrich Leisch
Institut für Statistik Tel: (+43 1) 58801 10715
Technische Universität Wien Fax: (+43 1) 58801 10798
Wiedner Hauptstraße 8-10/1071 Friedrich.Leisch@ci.tuwien.ac.at
A-1040 Wien, Austria http://www.ci.tuwien.ac.at/~leisch
PGP public key http://www.ci.tuwien.ac.at/~leisch/pgp.key
-------------------------------------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Great. Will take a closer look.
Did you take a look at the DTDs in Omegahat - $OMEGA_HOME/XML/DTDs?
There are some things in there that might prove useful. Because
of its reference-oriented style, things like connected/linked observations,
etc. arise more naturally than in S-like languages.
D.
--
_______________________________________________________________
Duncan Temple Lang duncan@research.bell-labs.com
Bell Labs, Lucent Technologies office: (908)582-3217
700 Mountain Avenue, Room 2C-259 fax: (908)582-3340
Murray Hill, NJ 07974-2070
http://cm.bell-labs.com/stat/duncan
"Languages shape the way we think, and determine what
we can think about."
Benjamin Whorf
Fritz

Could you compare/contrast this a bit with NetCDF and the UN/EDIFACT General
Statistical Message (GESMES)? I know almost nothing about NetCDF, but from the
little I know it seems to attempt to do a similar thing. I know a bit more about
Gesmes, and there seems to be a lot of overlap with what you are proposing.

Ten years ago, given the number of governments and organizations involved, it
would have been easy to say go ahead and do something that we can use sooner (in
fact, we did). However, Gesmes is relatively advanced now, with several groups
working on implementations of at least parts of it, and governments committed to
exchanging data using it. A GNU implementation would be a real benefit to a lot
of taxpayers around the world, not to mention the statistics community.

Paul Gilbert
On Fri, 3 Mar 2000, Paul Gilbert wrote:

> Fritz
>
> Could you compare/contrast this a bit with NetCDF and the UN/EDIFACT General
> Statistical Message (GESMES)? I know almost nothing about NetCDF, but from the
> little I know it seems to attempt to do a similar thing. I know a bit more about
> Gesmes, and there seems to be a lot of overlap with what you are proposing.
>

I just had a short (!) look at the description of the data model and the
user guide for GESMES at http://www.ecb.int/stats/gesmes/gesmes.htm
For me it looks biased to banking problems and very dataframe/array
centric.

> Ten years ago, given the number of governments and organizations involved, it
> would have been easy to say go ahead and do something that we can use sooner (in
> fact, we did).

The problem I see is that GESMES would probably be too complicated; think
of exchanging data via CORBA, which is easy (and, in our opinion,
efficient) with XML structured strings.

> However, Gesmes is relatively advanced now, with several groups
> working on implementations of at least parts of it, and governments committed to
> exchanging data using it. A GNU implementation would be a real benefit to a lot
> of taxpayers around the world, not to mention the statistics community.
>

agree, free software is always a benefit :-)

Torsten
Hi Fritz,

Interesting proposal! I would like to second your comment "Note: Move
dimension and/or dimnames to properties list?", which I think ties in
with David James's point. My feeling is that it would be better to
operate at a slightly lower level and represent array-type constructions
as properties of simple numeric or character vectors, i.e. very much the
way that R does anyway. Presumably it would be possible to make a
property a recursive structure, such as a list of dimnames? (I am not
very XML-literate.)

I would also like to comment on Paul Gilbert's point about existing
alternative standards. Our experience with one of our commercial partners
suggests that parsers for R dput objects into XML will be written, if
they do not already exist. Whatever its shortcomings, XML seems to be
here to stay, and I think the development of a set of R tools at this
stage will allow us to attempt some degree of standardisation.

Best wishes, Jonathan.

Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html
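
As an illustration of storing an array as a plain vector with dim and
dimnames as properties, here is a minimal Python sketch. The dictionary
layout and the names offset and lookup are assumptions for this example,
not part of the proposed DTD; the column-major ordering follows the way R
stores arrays.

```python
def offset(dim, index):
    """Column-major position of a 0-based index tuple in the flat vector."""
    pos, stride = 0, 1
    for size, i in zip(dim, index):
        if not 0 <= i < size:
            raise IndexError(f"index {index} out of bounds for dim {dim}")
        pos += i * stride
        stride *= size          # the first dimension varies fastest
    return pos


# A 2x3 array kept as a simple vector; all structure lives in the properties.
array = {
    "values": [1, 2, 3, 4, 5, 6],
    "properties": {
        "dim": [2, 3],
        # dimnames is itself a list of character vectors: a recursive property
        "dimnames": [["a", "b"], ["x", "y", "z"]],
    },
}


def lookup(arr, *names):
    """Fetch one element by its dimnames, e.g. lookup(array, "b", "z")."""
    props = arr["properties"]
    index = [labels.index(n) for labels, n in zip(props["dimnames"], names)]
    return arr["values"][offset(props["dim"], index)]


print(lookup(array, "a", "y"))   # row "a", column "y"
print(lookup(array, "b", "z"))   # row "b", column "z"
```

Dropping the properties leaves an ordinary vector, so a reader that does
not understand dim can still recover the raw values.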
PG> Could you compare/contrast this a bit with NetCDF and the UN/EDIFACT
PG> General Statistical Message (GESMES)?

Well, I guess I'll try to answer part of my own question. I'm groping a bit to
understand the various pieces, and to what extent they overlap or are
complementary, so I would especially invite anyone with more knowledge to help
clear things up.

Having very little true understanding, I think of XML as a language that allows
one to define a format for the data exchange, whereas Gesmes actually defines a
format. I believe they both result in ASCII files. I'm not sure if this means
that XML could be used to define the Gesmes message format or not. CORBA, on the
other hand, is a protocol for negotiating the exchange of a formatted message
between a client and a server, so it could be used to exchange Gesmes or some
other format defined by XML. CORBA is probably a replacement for the PADI
protocol, based on RPC, which we use here to exchange time series data between
our data servers (Fame) and R and Splus. (PADI is publicly available but not
widely used.)

Torsten>I just had a short (!) look at the description of the data model and the
Torsten>user guide for GESMES at http://www.ecb.int/stats/gesmes/gesmes.htm
Torsten>For me it looks biased to banking problems and very dataframe/array
Torsten>centric.

Yes, I wouldn't recommend it for being simple. That link seems to go to Gesmes/CB
(the Central Bank subset of Gesmes), which is decidedly oriented toward banking
and time series data. Reference documents for other parts of Gesmes are
available at
<http://forum.europa.eu.int/Public/irc/dsis/eeg6/library?l=/reference_implementation/gesmes_statistical&vm=detailed&sb=Title>

After looking at this, one might be inclined to think that trying to be too
general is dangerous. However, even if Gesmes is very complicated, I'm not sure
that trying to define a different general message is the answer.

Paul Gilbert