Hi there,
I am trying to rewrite some Java code in R. It deals with reading XML files. I
started with the XML package. In Java, I had a very useful method which gave me
a node by using:
  name of the node
  index of appearance
  start point: global (false) / local (true)
So I could do something like this:
  setCurrentChildNode("data", 0);
  getValueOfElement("val",1,true);
  --> gives 45

  setCurrentChildNode("data", 1);
  getValueOfElement("val",1,true);
  --> gives 11

  getValueOfElement("val",1,false);
  --> gives 45

<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>
Now I'd like to do something like this in R. Most important would be to
retrieve a node just by its name, not by the whole path. How can this be done?
Can anybody help me with this issue?
Antje
Hi Antje
Well, the XML package gives you a variety of ways to parse
an XML document and manipulate it in R.
Perhaps the approach that best matches the Java-style you
outline is to use XPath to access nodes.
To do this, you use
  doc = xmlTreeParse("filename.xml", useInternalNodes = TRUE)
and then access the elements of interest with XPath queries, e.g.
to get the value of the second <val> element within each <data>
element, use
  xpathApply(doc, "//data", function(n) xmlValue(n[[2]]))
To get the first <val> node in the first <data> you could use
  doc[ "//data/val" ] [[1]]
or
  doc[[ "//data[1]/val[1]" ]]
(Note that the indexing is being done in different languages: by R's
[[1]] in the first form and by the XPath [1] predicates in the second.)
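
A minimal self-contained sketch of the calls above, assuming the XML package is installed. As an assumption made only for this example, the sample document from the original post is supplied as a string via asText = TRUE instead of being read from filename.xml, and the variable names are illustrative.

```r
library(XML)

# Sample document from the original post, supplied as text instead of a file
txt <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'

doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

# Second child of each <data> node; here the index 2 is applied by R
second_vals <- xpathApply(doc, "//data", function(n) xmlValue(n[[2]]))

# First <val> in the first <data>, with the index applied by R ...
first_r <- doc["//data/val"][[1]]
# ... and the same node with the index applied inside the XPath expression
first_xpath <- doc[["//data[1]/val[1]"]]

# The sample values carry surrounding spaces, so trim before converting
print(as.numeric(trimws(unlist(second_vals))))
print(as.numeric(trimws(xmlValue(first_r))))
print(as.numeric(trimws(xmlValue(first_xpath))))
```

Both ways of selecting the first <val> should give the same node; which one reads better is largely a matter of taste.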
Being able to access a node by just its name is convenient,
but it may not be adequate. You may pick up too many matching nodes.
So XPath is a powerful way to keep things simple when that is
adequate and to add more explicit constraints on the path when more
specificity is necessary. And XPath is a widespread standard
mechanism for XML rather than specific to R or Java.
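
To illustrate the trade-off between a name-only query and a more constrained one, here is a small sketch on the sample document from the original post. It assumes the XML package is installed; passing the document as a string and the choice of attribute predicates are assumptions made for illustration.

```r
library(XML)

txt <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'

doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

# By name only: every <val> anywhere in the document, in document order
all_vals <- xpathSApply(doc, "//val", xmlValue)

# With constraints: only the <val> with i="t2" inside the <data> with loc="2"
one_val <- xpathSApply(doc, "//data[@loc='2']/val[@i='t2']", xmlValue)

print(as.numeric(trimws(all_vals)))  # four values: 22 45 44 11
print(as.numeric(trimws(one_val)))   # a single value: 11
```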
HTH,
D.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Well, I'm not sure how it's done in R, but here's a way to do it in plain Excel.
http://decisionstats.com/2008/parsing-xml-files-easily/
Parsing XML files easily
To parse an XML (or KML or PMML) file easily without using any
complicated software, here is a piece of code that fits right into your
Excel sheet.
Just import this file using Excel, and then use the function
getElement after pasting the XML code into one cell.
xml-getelement
It simply reads the XML/KML code as a text string. Just
paste all the XML code into one cell and pass the start and end markers
(for example start=<constraints> and end=</constraints>) to get the
value of constraints in the XML code.
Simply read the value into another cell using the getElement function.
Here's the code if you ever need it. Just paste it into the VB editor of
Excel to create the getElement function (if it is not there already), or
simply import the file in the link above.
Attribute VB_Name = "Module1"
' Returns the text between the first occurrence of start and the next occurrence of finish.
Public Function getElement(xml As String, start As String, finish As String) As String
    Dim i As Long, j As Long
    For i = 1 To Len(xml)
        If Mid(xml, i, Len(start)) = start Then
            For j = i + Len(start) To Len(xml)
                If Mid(xml, j, Len(finish)) = finish Then
                    getElement = Mid(xml, i + Len(start), j - i - Len(start))
                    Exit Function
                End If
            Next j
        End If
    Next i
End Function
--
Regards,
Ajay Ohri
http://tinyurl.com/liajayohri
On 7 September 2008 at 10:22, Antje wrote:
| I try to rewrite some Java-code with R. It deals with reading XML files. I
[...]
| Now, I'd like to do something like this in R. Most important would be to
| retrieve a node just by its name, not by the whole path. How is it possible?
|
| Can anybody help me with this issue?

Have you looked at the "XML" package for R ?

Dirk

--
Three out of two people have difficulties with fractions.
Thanks a lot to Gabor and Duncan!

I didn't know that XPath is a standard. I'll take a deeper look at it to
understand it better.

Oh, I guess I understand a bit more:

  xpathApply(doc, "//val", function(n) xmlValue(n))

would search globally for all nodes named "val" and return their values :-)
So that's exactly what I was looking for: not caring about the exact
location of a node. I think in my case it should be okay to search for
nodes just by their names.

Thanks again!

@ Ajay: Sorry, but I was looking for a solution with R.
@ Dirk: I had already used the XML package but didn't know how to access
the data in the way I was used to.

Antje
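
For reference, the Java-style lookup from the original post (name, 0-based index, global or local start point) could be emulated roughly as follows. This is only a sketch under stated assumptions: the helper name get_value_of_element and its arguments are hypothetical and not part of the XML package, the sample document is passed as a string, and the local/global flag is replaced by passing either a node or the whole document as the search context.

```r
library(XML)

txt <- '<root>
  <data loc="1">
    <val i="t1"> 22 </val>
    <val i="t2"> 45 </val>
  </data>
  <data loc="2">
    <val i="t1"> 44 </val>
    <val i="t2"> 11 </val>
  </data>
</root>'

doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)

# Hypothetical helper: trimmed value of the element called `name` at the
# given 0-based `index`. `context` is either the document (global search)
# or a single node (local search below that node).
get_value_of_element <- function(context, name, index) {
  # "//name" searches the whole document, ".//name" only below `context`
  prefix <- if (inherits(context, "XMLInternalDocument")) "//" else ".//"
  nodes <- getNodeSet(context, paste0(prefix, name))
  if (index + 1 > length(nodes)) return(NA_character_)
  # The Java index in the original post is 0-based, R lists are 1-based
  trimws(xmlValue(nodes[[index + 1]]))
}

# Plays the role of setCurrentChildNode("data", i) in the Java code
data_nodes <- getNodeSet(doc, "//data")

local_first  <- get_value_of_element(data_nodes[[1]], "val", 1)
local_second <- get_value_of_element(data_nodes[[2]], "val", 1)
global_val   <- get_value_of_element(doc, "val", 1)

print(c(local_first, local_second, global_val))
```

The relative path ".//val" matters for the local case: a path starting with "//" is evaluated against the whole document even when a node is given as the context.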