thr3ads.net - R help - [R] XML - get node by name [Sep 2008]

Antje

2008-Sep-07 08:22 UTC

[R] XML - get node by name

Hi there,

I am trying to rewrite some Java code in R. It deals with reading XML files. I
started with the XML package. In Java, I had a very useful method which gave me
a node based on:

name of the node
index of appearance
start point: global (false) / local (true)

So, I could do something like this.

setCurrentChildNode("data", 0);
getValueOfElement("val",1,true);
--> gives 45

setCurrentChildNode("data", 1);
getValueOfElement("val",1,true);
--> gives 11

getValueOfElement("val",1,false);
--> gives 45

<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>

Now, I'd like to do something like this in R. Most important would be to 
retrieve a node just by its name, not by the whole path. How is it possible?

Can anybody help me with this issue?

Antje

Duncan Temple Lang

2008-Sep-07 14:17 UTC

[R] XML - get node by name

Hi Antje

Well, the XML package gives you a variety of ways to parse
an XML document and manipulate it in R.
Perhaps the approach that best matches the Java-style you
outline is to use XPath to access nodes.
To do this, you use
  doc = xmlTreeParse("filename.xml", useInternalNodes = TRUE)

and then access the elements of interest with XPath queries, e.g.
to get the value of the second <val> element within each <data>
element, use

  xpathApply(doc, "//data", function(n) xmlValue(n[[2]]))

To get the first <val> node in the first <data> you could use

  doc[ "//data/val" ] [[1]]

or

  doc[[ "//data[1]/val[1]" ]]


(Note the indexing/subsetting is being done in different languages.)
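As a rough illustration of that note, here is a minimal sketch that contrasts indexing on the R side with indexing inside the XPath expression. It assumes the XML package is installed and parses the sample document from the question from a string (the variable names are only for illustration). Values are converted with as.numeric() because the text content carries surrounding spaces.

```r
library(XML)

# Sample document from the original question, parsed from a string.
xml_text <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(xml_text, asText = TRUE)

# R-side indexing: XPath returns all four <val> nodes, then R picks the first.
all_vals   <- getNodeSet(doc, "//data/val")
first_by_r <- as.numeric(xmlValue(all_vals[[1]]))

# XPath-side indexing: val[1] means "first <val> child of its parent",
# so this matches one node per <data> element.
first_per_data <- as.numeric(xpathSApply(doc, "//data/val[1]", xmlValue))

# Parenthesising the path applies the index to the combined node set.
first_overall <- as.numeric(xpathSApply(doc, "(//data/val)[1]", xmlValue))

print(first_by_r)      # single value
print(first_per_data)  # one value per <data>
print(first_overall)   # single value
```

The point to take away is that a predicate such as [1] inside the path is evaluated per parent node, while [[1]] in R is applied once to the whole result list.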


Being able to access a node by just its name is convenient,
but it may not be adequate: you may pick up too many matching nodes.
XPath lets you use a simple name-only query when that is adequate
and add more explicit constraints on the path when more
specificity is necessary.  And XPath is a widespread standard
mechanism for XML rather than specific to R or Java.
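To make the trade-off concrete, here is a small sketch (again assuming the XML package and the sample document from the question, parsed from a string) that starts with a name-only query and then narrows it with attribute predicates. The particular predicates are just examples chosen to fit the sample data.

```r
library(XML)

xml_text <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(xml_text, asText = TRUE)

# Name only: every <val> anywhere in the document.
by_name <- as.numeric(xpathSApply(doc, "//val", xmlValue))

# Constrain the parent: only <val> nodes inside the <data> with loc="2".
in_second_data <- as.numeric(xpathSApply(doc, "//data[@loc='2']/val", xmlValue))

# Constrain the node itself: only <val> nodes whose attribute i is "t2".
only_t2 <- as.numeric(xpathSApply(doc, "//val[@i='t2']", xmlValue))

print(by_name)
print(in_second_data)
print(only_t2)
```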

HTH,

  D.



Ajay ohri

2008-Sep-07 15:38 UTC

[R] XML - get node by name

Well, I'm not sure how it's done in R, but here's a way to do it in plain Excel.
http://decisionstats.com/2008/parsing-xml-files-easily/

Parsing XML files easily

To parse an XML (or KML or PMML) file easily without using any
complicated software, here is a piece of code that fits right in your
Excel sheet.

Just import this file using Excel, paste the XML code into one cell,
and then use the function getElement.

xml-getelement

It simply reads the XML/KML code as a text string. Paste all the XML
code into one cell and pass start and end markers (for example
start=<constraints> and end=</constraints>) to get the value of
constraints in the XML code.

Simply read the value into another cell using the getElement function.

Here's the code if you ever need it. Just paste it into the VB editor of
Excel to create the getElement function (if not there already) or
simply import the file in the link above.

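Attribute VB_Name = "Module1"
' Returns the text between the first occurrence of start and the next
' occurrence of finish in xml; returns an empty value if either is missing.
Public Function getElement(xml As String, start As String, finish As String)
  ' Scan for the opening marker.
  For i = 1 To Len(xml)
    If Mid(xml, i, Len(start)) = start Then
      ' Scan forward from the end of the opening marker for the closing one.
      For j = i + Len(start) To Len(xml)
        If Mid(xml, j, Len(finish)) = finish Then
          getElement = Mid(xml, i + Len(start), j - i - Len(start))
          Exit Function
        End If
      Next j
    End If
  Next i
End Function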


--
Regards,

Ajay Ohri
http://tinyurl.com/liajayohri

Dirk Eddelbuettel

2008-Sep-07 15:42 UTC

[R] XML - get node by name

On 7 September 2008 at 10:22, Antje wrote:
| I try to rewrite some Java-code with R. It deals with reading XML files. I 
[...]
| Now, I'd like to do something like this in R. Most important would be to 
| retrieve a node just by its name, not by the whole path. How is it possible?
| 
| Can anybody help me with this issue?

Have you looked at the "XML" package for R ?

Dirk

-- 
Three out of two people have difficulties with fractions.

Antje

2008-Sep-07 18:56 UTC

[R] XML - get node by name

Thanks a lot to Gabor and Duncan!

I didn't know that XPath is a standard. I'll give it a deeper look to
better understand it.

Oh, I guess I understand a bit more

xpathApply(doc, "//val", function(n) xmlValue(n))

would search globally for all nodes named "val" and return their values :-)
So that's exactly what I was looking for: not caring about the exact
location of a node.
I think in my case it should be okay to search for nodes just by their names.
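For reference, the global/local lookup from the original question could be wrapped roughly as follows. This is only a sketch: get_value_of_element() is a hypothetical helper (not part of the XML package), it keeps the 0-based index of the Java calls, and it assumes the XML package and the sample document parsed from a string.

```r
library(XML)

xml_text <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper mirroring getValueOfElement(name, index, local).
# index is 0-based; context = NULL means "global", a node means "local".
get_value_of_element <- function(doc, name, index, context = NULL) {
  nodes <- if (is.null(context)) {
    getNodeSet(doc, paste0("//", name))       # search the whole document
  } else {
    getNodeSet(context, paste0(".//", name))  # search below the context node
  }
  as.numeric(xmlValue(nodes[[index + 1]]))
}

# Stand-in for setCurrentChildNode("data", k): pick the k-th <data> node.
data_nodes <- getNodeSet(doc, "//data")

local_first  <- get_value_of_element(doc, "val", 1, context = data_nodes[[1]])
local_second <- get_value_of_element(doc, "val", 1, context = data_nodes[[2]])
global_val   <- get_value_of_element(doc, "val", 1)

print(c(local_first, local_second, global_val))
```

Note the leading "." in the local query: a path starting with "//" searches from the document root even when it is evaluated on a node, whereas ".//" restricts the search to descendants of the context node.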

Thanks again!

@ Ajay: Sorry, but I was looking for a solution with R
@ Dirk: I already used the XML package but didn't know the possibilities to 
access data as I was used to.


