Hi all,

this is my first post to this mailing list. I hope that somebody can help me parse an XML file.
I found this website, http://www.omegahat.org/RSXML/gettingStarted.html, but unfortunately my XML file is not as simple as the one in the example.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://werdis.dwd.de/css/UNIDART/climateTimeseriesOrderByStation.xsl" type="text/xsl"?>
<data xmlns="http://www.unidart.eu/xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.unidart.eu/xsd http://werdis.dwd.de/conf/timeseriesExchangeType.xsd">
<stationname value="Aachen">
  <v date="2011-04-01" qualityLevel="high" latitude="50.7839" longitude="6.0947" altitude="202" unitA="m" geoQualityLevel="certain" unitV="degree C">14.1</v>
  <v date="2011-04-02">17.6</v>
  <v date="2011-04-03">11.5</v>
  <v date="2011-04-04">10.0</v>
  <v date="2011-04-05" qualityLevel="low">9.6</v>
  <v date="2011-04-06">16.0</v>
</stationname>
<stationname value="Ahaus">
  <v date="2011-04-01" qualityLevel="high" latitude="52.0828" longitude="6.9417" altitude="45.5" unitA="m" geoQualityLevel="certain" unitV="degree C">12.5</v>
  <v date="2011-04-02">15.9</v>
  <v date="2011-04-03">12.0</v>
  <v date="2011-04-04">10.1</v>
  <v date="2011-04-05">8.8</v>
  <v date="2011-04-06">13.5</v>
</stationname>
</data>

I would like to get a table in R like this:

stationname     date            value
Aachen          2011-04-01      14.1
Aachen          2011-04-02      17.6
.
.
.
Ahaus           2011-04-06      13.5

I tried to do this:

doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))
tmp = xmlSApply(doc, function(x) xmlSApply(x, xmlValue))

but the station name was not parsed, because "Aachen" is an attribute of <stationname> rather than its content.

Could anyone give me some help?

Thanks,
kai.

On Wed, Jun 29, 2011 at 8:17 AM, Kai Serschmarn <serschmarn at googlemail.com> wrote:
> I tried to do this:
>
> doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))
> tmp = xmlSApply(doc, function(x) xmlSApply(x, xmlValue))

You can loop over the doc to get to the <stationname> elements, then loop over each of those to get its <v> elements.
Then extract the node values and attributes with some assorted selectors:

dumpData <- function(doc){
  for(i in 1:length(doc)){
    # each child of the root is expected to be a <stationname> element
    stns = doc[[i]]
    for (j in 1:length(stns)){
      # station name attribute, text of the <v> element, date attribute
      cat(stns$attributes['value'], stns[[j]][[1]]$value, stns[[j]]$attributes['date'], "\n")
    }
  }
}

Run that on your doc to see it printed out. Save to a data frame if that's what you need.

This is not the perfect way to do it, since if you have other (non <stationname> or <v>) elements it'll try to handle those too, and fail. There's probably a way of looping over just the <stationname> elements, but XML makes me feel sick when I try to remember how to parse it in R at this time of the morning. It's probably in the docs, but this should get you started.

Barry
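
The caveat about other elements can be handled by asking for children by tag name. The following is only a sketch, not Barry's code: it assumes the XML package's xmlElementsByTagName(), xmlGetAttr() and xmlValue() accessors, uses a cut-down inline copy of the file, and adds a hypothetical <note> element purely to show that it gets skipped. The function name printStations is made up for illustration.

```r
library(XML)

# Cut-down sample modelled on the file in the thread; <note> is a
# hypothetical extra element, added only to show that it is ignored.
xml_text <- '<data xmlns="http://www.unidart.eu/xsd">
  <note>not a station</note>
  <stationname value="Aachen">
    <v date="2011-04-01">14.1</v>
    <v date="2011-04-02">17.6</v>
  </stationname>
  <stationname value="Ahaus">
    <v date="2011-04-01">12.5</v>
  </stationname>
</data>'

doc <- xmlRoot(xmlTreeParse(xml_text, asText = TRUE))

# Hypothetical helper: prints station, value and date for every <v>
# that sits directly under a <stationname> child of the root.
printStations <- function(doc) {
  for (stn in xmlElementsByTagName(doc, "stationname")) {
    for (v in xmlElementsByTagName(stn, "v")) {
      cat(xmlGetAttr(stn, "value"), xmlValue(v), xmlGetAttr(v, "date"), "\n")
    }
  }
}

printStations(doc)
```

Selecting by tag name keeps the two-level loop but no longer depends on every child of the root being a station.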

Thank you Barry, that works fine.
Sorry for the stupid questions... however, I couldn't manage to get a data frame out of this.
This is what I did:

doc = xmlRoot(xmlTreeParse("de.dwd.klis.TADM.xml"))

dumpData <- function(doc){
  for(i in 1:length(doc)){
    stns = doc[[i]]
    for (j in 1:length(stns)){
      cat(stns$attributes['value'], stns[[j]][[1]]$value, stns[[j]]$attributes['date'], "\n")
    }
  }
}

dumpData(doc)

Thanks for your help,
kai

On 29.06.2011 at 11:06, Barry Rowlingson wrote:

> Run that on your doc to see it printed out. Save to a data frame if
> that's what you need.
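
dumpData() only prints with cat(), so nothing is returned that could be stored. One possible way to collect the same three fields into a data frame is sketched below. It is an assumption-laden illustration rather than an answer from the thread: it parses with xmlParse() so that XPath can be used, binds the file's default namespace to a made-up prefix d, and works on a cut-down inline copy of the data. The variable names (xml_text, ns, v_nodes, stations) are hypothetical.

```r
library(XML)

# Cut-down inline copy of the file from the original post.
xml_text <- '<data xmlns="http://www.unidart.eu/xsd">
  <stationname value="Aachen">
    <v date="2011-04-01" qualityLevel="high">14.1</v>
    <v date="2011-04-02">17.6</v>
  </stationname>
  <stationname value="Ahaus">
    <v date="2011-04-01" qualityLevel="high">12.5</v>
    <v date="2011-04-02">15.9</v>
  </stationname>
</data>'

# Parse to internal nodes so that XPath queries are available.
doc <- xmlParse(xml_text, asText = TRUE)

# The file declares a default namespace, so XPath needs a prefix for it;
# "d" is an arbitrary choice.
ns <- c(d = "http://www.unidart.eu/xsd")

# Every <v> that is a child of a <stationname>.
v_nodes <- getNodeSet(doc, "//d:stationname/d:v", namespaces = ns)

# One row per <v>: the station name comes from the parent's "value"
# attribute, the date from the node's own attribute, the value from its text.
stations <- data.frame(
  stationname = sapply(v_nodes, function(v) xmlGetAttr(xmlParent(v), "value")),
  date        = as.Date(sapply(v_nodes, xmlGetAttr, "date")),
  value       = as.numeric(sapply(v_nodes, xmlValue)),
  stringsAsFactors = FALSE
)

print(stations)
```

For the real file, replacing the inline text with xmlParse("de.dwd.klis.TADM.xml") should give one row per <v> element, provided the file uses the same namespace as the example.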