Hi there,

I'm trying to rewrite some Java code in R. It deals with reading XML files, and I started with the XML package. In Java, I had a very useful method that returned a node given:

    name of the node
    index of appearance
    start point: global (false) / local (true)

So I could do something like this:

    setCurrentChildNode("data", 0);
    getValueOfElement("val",1,true);
    --> gives 45

    setCurrentChildNode("data", 1);
    getValueOfElement("val",1,true);
    --> gives 11

    getValueOfElement("val",1,false);
    --> gives 45

    <root>
      <data loc="1">
        <val i="t1"> 22 </val>
        <val i="t2"> 45 </val>
      </data>
      <data loc="2">
        <val i="t1"> 44 </val>
        <val i="t2"> 11 </val>
      </data>
    </root>

Now I'd like to do something like this in R. Most important would be to retrieve a node just by its name, not by its whole path. How can this be done?

Can anybody help me with this issue?

Antje
Hi Antje,

The XML package gives you a variety of ways to parse an XML document and manipulate it in R. Perhaps the approach that best matches the Java style you outline is to use XPath to access nodes. To do this, parse the document with

    doc = xmlTreeParse("filename.xml", useInternalNodes = TRUE)

and then access the elements of interest with XPath queries. For example, to get the value of the second <val> element within each <data> element, use

    xpathApply(doc, "//data", function(n) xmlValue(n[[2]]))

To get the first <val> node in the first <data>, you could use

    doc[ "//data/val" ] [[1]]

or

    doc[[ "//data[1]/val[1]" ]]

(Note that the indexing/subsetting is being done in different languages: in R in the first form, in XPath in the second.)

Being able to access a node by just its name is convenient, but it may not be adequate: you may pick up too many matching nodes. XPath lets you keep queries simple when that is adequate and add more explicit constraints on the path when more specificity is necessary. And XPath is a widespread standard mechanism for XML rather than something specific to R or Java.

HTH,
D.

Antje wrote:
> Hi there,
>
> I try to rewrite some Java-code with R. It deals with reading XML files.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
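For readers following along, here is a minimal sketch of how the two lookups from the question might be written as XPath queries. It assumes the XML package is installed; `xml_text` is simply the sample document from the question inlined as a string (an assumption made so the example is self-contained), and `trimws()` is used because the sample values are padded with spaces.

```r
library(XML)

# The sample document from the question, inlined as a string (assumption:
# in practice you would pass a file name instead).
xml_text <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'

doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# "Local" lookup: the second <val> inside each <data> element.
local_vals <- xpathSApply(doc, "//data/val[2]", xmlValue)

# "Global" lookup: the second <val> in document order, whatever its parent.
# The parentheses matter: //val[2] would mean "second <val> of each parent".
global_val <- xpathSApply(doc, "(//val)[2]", xmlValue)

print(trimws(local_vals))  # "45" "11"
print(trimws(global_val))  # "45"
```

Note that XPath positions are one-based, so the Java call with index 1 corresponds to `[2]` here.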
Well, I'm not sure how it's done in R, but here's a way to do it in plain Excel.

http://decisionstats.com/2008/parsing-xml-files-easily/

Parsing XML files easily

To parse an XML (or KML or PMML) file easily without using any complicated software, here is a piece of code that fits right into your Excel sheet. Just import this file using Excel, and then use the function getElement after pasting the XML code into one cell.

xml-getelement

It simply reads the XML/KML code as a text string. Paste all the XML code into one cell and pass the start and finish markers (for example, start = <constraints> and finish = </constraints> to get the value of constraints in the XML code). Then read the value into another cell using the getElement function.

Here's the code if you ever need it. Just paste it into the VB editor of Excel to create the getElement function (if it is not there already), or simply import the file from the link above.

    ' The Attribute line belongs to the exported module file; leave it out
    ' when pasting the code directly into the VB editor.
    Attribute VB_Name = "Module1"

    ' Returns the text between the first occurrence of start and the
    ' next occurrence of finish in xml.
    Public Function getElement(xml As String, start As String, finish As String)
        Dim i As Long, j As Long
        For i = 1 To Len(xml)
            If Mid(xml, i, Len(start)) = start Then
                For j = i + Len(start) To Len(xml)
                    If Mid(xml, j, Len(finish)) = finish Then
                        getElement = Mid(xml, i + Len(start), j - i - Len(start))
                        Exit Function
                    End If
                Next j
            End If
        Next i
    End Function

On Sun, Sep 7, 2008 at 1:52 PM, Antje <niederlein-rstat at yahoo.de> wrote:
> Hi there,
>
> I try to rewrite some Java-code with R. It deals with reading XML files.
> [...]

-- 
Regards,
Ajay Ohri
http://tinyurl.com/liajayohri
On 7 September 2008 at 10:22, Antje wrote:
| I try to rewrite some Java-code with R. It deals with reading XML files. I
[...]
| Now, I'd like to do something like this in R. Most important would be to
| retrieve a node just by its name, not by the whole path. How is it possible?
|
| Can anybody help me with this issue?

Have you looked at the "XML" package for R?

Dirk

-- 
Three out of two people have difficulties with fractions.
Thanks a lot to Gabor and Duncan!

I didn't know that XPath is a standard. I'll take a deeper look at it to understand it better.

Oh, I think I understand a bit more now:

    xpathApply(doc, "//val", function(n) xmlValue(n))

would search globally for all nodes named "val" and return their values :-)

So that's exactly what I was looking for: not having to care about the exact location of a node. I think in my case it should be okay to look up nodes just by their names.

Thanks again!

@ Ajay: Sorry, but I was looking for a solution with R.

@ Dirk: I had already used the XML package, but I didn't know it offered ways to access the data the way I was used to.

Antje
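As a rough illustration of looking nodes up by name only, the Java method from the original question could be approximated in R as below. This is a sketch under stated assumptions, not part of the XML package: `get_value_of_element` and its `context` argument are hypothetical names chosen to mirror the Java calls, the index is kept zero-based as in the Java example, and the sample document is inlined as a string so the block is self-contained.

```r
library(XML)

# Sample document from the question, inlined as a string (assumption).
xml_text <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'

doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# Hypothetical helper mirroring the Java getValueOfElement(name, index, local).
# index is zero-based (as in the Java calls); XPath positions are one-based.
# context = NULL searches the whole document ("global");
# otherwise only the descendants of the given node are searched ("local").
get_value_of_element <- function(doc, name, index, context = NULL) {
  if (is.null(context)) {
    nodes <- getNodeSet(doc, sprintf("(//%s)[%d]", name, index + 1L))
  } else {
    nodes <- getNodeSet(context, sprintf("(.//%s)[%d]", name, index + 1L))
  }
  if (length(nodes) == 0) return(NA_character_)  # no such node
  trimws(xmlValue(nodes[[1]]))
}

# Plays the role of setCurrentChildNode("data", i) in the Java code.
data_nodes <- getNodeSet(doc, "//data")

print(get_value_of_element(doc, "val", 1, context = data_nodes[[1]]))  # "45"
print(get_value_of_element(doc, "val", 1, context = data_nodes[[2]]))  # "11"
print(get_value_of_element(doc, "val", 1))                             # "45"
```

The helper returns `NA` when no node matches, which is one possible choice; raising an error instead may suit stricter code better.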