Hi all,
This is my first post to this mailing list. I hope that somebody can
help me parse an XML file.
I found this website http://www.omegahat.org/RSXML/gettingStarted.html,
but unfortunately my XML file is not as simple as the one in the example.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://werdis.dwd.de/css/UNIDART/climateTimeseriesOrderByStation.xsl" type="text/xsl"?>
<data xmlns="http://www.unidart.eu/xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.unidart.eu/xsd
        http://werdis.dwd.de/conf/timeseriesExchangeType.xsd">
  <stationname value="Aachen">
    <v date="2011-04-01" qualityLevel="high" latitude="50.7839"
       longitude="6.0947" altitude="202" unitA="m" geoQualityLevel="certain"
       unitV="degree C">14.1</v>
    <v date="2011-04-02">17.6</v>
    <v date="2011-04-03">11.5</v>
    <v date="2011-04-04">10.0</v>
    <v date="2011-04-05" qualityLevel="low">9.6</v>
    <v date="2011-04-06">16.0</v>
  </stationname>
  <stationname value="Ahaus">
    <v date="2011-04-01" qualityLevel="high" latitude="52.0828"
       longitude="6.9417" altitude="45.5" unitA="m" geoQualityLevel="certain"
       unitV="degree C">12.5</v>
    <v date="2011-04-02">15.9</v>
    <v date="2011-04-03">12.0</v>
    <v date="2011-04-04">10.1</v>
    <v date="2011-04-05">8.8</v>
    <v date="2011-04-06">13.5</v>
  </stationname>
</data>
I would like to get a table in R like this:

stationname  date        value
Aachen       2011-04-01  14.1
Aachen       2011-04-02  17.6
.
.
.
Ahaus        2011-04-06  13.5

I tried to do this:
doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))
tmp = xmlSApply(doc, function(x) xmlSApply(x, xmlValue))
but the station name was not extracted, because "Aachen" is an attribute
(value) of <stationname> rather than its text content.
Could anyone help?
Thanks,
kai.
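A minimal sketch of why the first attempt loses the station name, assuming the XML package used in this thread: xmlValue() returns an element's text content, while "Aachen" lives in the value attribute of <stationname>, which xmlGetAttr() reads instead. The inlined txt string is a hypothetical, trimmed excerpt of the file above so the snippet runs on its own.

```r
library(XML)

# Hypothetical, trimmed excerpt of the file above, inlined so the snippet is self-contained
txt <- '<data xmlns="http://www.unidart.eu/xsd">
  <stationname value="Aachen">
    <v date="2011-04-01">14.1</v>
    <v date="2011-04-02">17.6</v>
  </stationname>
</data>'

doc <- xmlRoot(xmlTreeParse(txt, asText = TRUE))
stn <- doc[[1]]                            # the first <stationname> element

print(xmlGetAttr(stn, "value"))            # the attribute that xmlValue() never sees
print(xmlSApply(stn, xmlGetAttr, "date"))  # "date" attribute of each <v> child
print(xmlSApply(stn, xmlValue))            # text content of each <v> child
```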
On Wed, Jun 29, 2011 at 8:17 AM, Kai Serschmarn <serschmarn at googlemail.com> wrote:
> I tried to do this:
>
> doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))
> tmp = xmlSApply(doc, function(x) xmlSApply(x, xmlValue))

You can loop over the doc to get to <stationname> elements, then loop
over that list to get <v> elements.
Then extract the node values and attributes with some assorted selectors:

dumpData <- function(doc){
  for(i in 1:length(doc)){
    stns = doc[[i]]
    for (j in 1:length(stns)){
      cat(stns$attributes['value'], stns[[j]][[1]]$value,
          stns[[j]]$attributes['date'], "\n")
    }
  }
}

Run that on your doc to see it printed out. Save to a data frame if
that's what you need.

This is not the perfect way to do it, since if you have other (non
<stationname> or <v>) elements it'll try and handle those too, and
fail. There's probably a way of looping over all <stationname>
elements, but XML makes me feel sick when I try and remember how to
parse it in R at this time of the morning. It's probably in the docs,
but this should get you started.

Barry
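On Barry's point about looping over all <stationname> elements: one option, sketched here assuming the XML package's XPath support via getNodeSet(), is to select only the <v> children of <stationname> elements, so any other elements are ignored. Because the file declares a default namespace (xmlns="http://www.unidart.eu/xsd"), the XPath expression needs a prefix bound to that URI; the prefix d is an arbitrary, hypothetical choice, and txt is again a trimmed excerpt of the example file.

```r
library(XML)

# Hypothetical two-station excerpt of the file, inlined so the sketch is self-contained
txt <- '<data xmlns="http://www.unidart.eu/xsd">
  <stationname value="Aachen">
    <v date="2011-04-01">14.1</v>
    <v date="2011-04-02">17.6</v>
  </stationname>
  <stationname value="Ahaus">
    <v date="2011-04-06">13.5</v>
  </stationname>
</data>'
doc <- xmlParse(txt, asText = TRUE)

# The file uses a default namespace, so bind a (hypothetical) prefix "d" to it for XPath
ns <- c(d = "http://www.unidart.eu/xsd")

# Every <v> that is a direct child of a <stationname>; other elements are skipped
vnodes <- getNodeSet(doc, "//d:stationname/d:v", namespaces = ns)

df <- data.frame(
  stationname = sapply(vnodes, function(v) xmlGetAttr(xmlParent(v), "value")),
  date        = sapply(vnodes, xmlGetAttr, "date"),
  value       = as.numeric(sapply(vnodes, xmlValue)),
  stringsAsFactors = FALSE
)
print(df)
```

For the real file, the parse step would become xmlParse("de.dwd.klis.TADM.xml"); the rest stays the same.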
Thank you Barry, that works fine.
Sorry for the stupid question, but I couldn't manage to get a
data frame out of this.
This is what I was doing:
library(XML)
doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))
dumpData <- function(doc){
  for(i in 1:length(doc)){
    stns = doc[[i]]
    for (j in 1:length(stns)){
      cat(stns$attributes['value'], stns[[j]][[1]]$value,
          stns[[j]]$attributes['date'], "\n")
    }
  }
}
dumpData(doc)
Thanks for your help,
kai.
> On 29.06.2011, at 11:06, Barry Rowlingson wrote:
>
>> Run that on your doc to see it printed out. Save to a data frame if
>> that's what you need.
>>
>> This is not the perfect way to do it, since if you have other (non
>> <stationname> or <v>) elements it'll try and handle those too, and
>> fail. There's probably a way of looping over all <stationname>
>> elements, but XML makes me feel sick when I try and remember how to
>> parse it in R at this time of the morning. It's probably in the docs,
>> but this should get you started.
>>
>> Barry
>
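To get a data frame out of dumpData(), one approach (a sketch, not the only way) is to keep the same double loop over the tree from xmlTreeParse() but append each field to a vector instead of printing it with cat(), then call data.frame() once at the end. The function name dumpDataFrame() and the inlined txt excerpt are hypothetical; the xmlName() checks also address the caveat about other, non-<stationname>/<v> elements.

```r
library(XML)

# Hypothetical excerpt of the file; with the real file use
# doc <- xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml")) instead
txt <- '<data xmlns="http://www.unidart.eu/xsd">
  <stationname value="Aachen">
    <v date="2011-04-01">14.1</v>
    <v date="2011-04-02">17.6</v>
  </stationname>
  <stationname value="Ahaus">
    <v date="2011-04-06">13.5</v>
  </stationname>
</data>'
doc <- xmlRoot(xmlTreeParse(txt, asText = TRUE))

# Same double loop as dumpData(), but the fields are collected into vectors
# and the data frame is built once at the end
dumpDataFrame <- function(doc) {
  stationname <- character(0)
  date <- character(0)
  value <- numeric(0)
  for (stns in xmlChildren(doc)) {
    if (xmlName(stns) != "stationname") next   # skip unexpected elements
    for (v in xmlChildren(stns)) {
      if (xmlName(v) != "v") next
      stationname <- c(stationname, xmlGetAttr(stns, "value"))
      date <- c(date, xmlGetAttr(v, "date"))
      value <- c(value, as.numeric(xmlValue(v)))
    }
  }
  data.frame(stationname = stationname, date = date, value = value,
             stringsAsFactors = FALSE)
}

df <- dumpDataFrame(doc)
print(df)
```

Growing vectors with c() is fine for a file of this size; for very large files, preallocating the vectors or selecting the nodes with XPath avoids the repeated copying.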