Hi All,

I have an XML file like:

<city id="2643743" name="London">
  <coord lon="-0.13" lat="51.51"/>
  <country>GB</country>
  <sun rise="2017-01-30T07:40:36" set="2017-01-30T16:47:56"/>
</city>
<temperature value="280.15" min="278.15" max="281.15" unit="kelvin"/>
<humidity value="81" unit="%"/>
<pressure value="1012" unit="hPa"/>
<wind>
  <speed value="4.6" name="Gentle Breeze"/>
  <gusts/>
  <direction value="90" code="E" name="East"/>
</wind>
<clouds value="90" name="overcast clouds"/>
<visibility value="10000"/>
<precipitation mode="no"/>
<weather number="701" value="mist" icon="50d"/>
<lastupdate value="2017-01-30T15:50:00"/>
</current>

I want to create a data frame out of this XML, but xmlToDataFrame() is not working.

The attributes are dynamic: for example, the precipitation node could have both value and mode attributes if there is precipitation in a city.

My basic issue now is to read the XML attributes of different nodes and convert them into a data frame. I have searched many forums but could not find any help on this.

For starters, please suggest a solution to parse the city node and its corresponding id, name, lat, lon, etc.

I know I am asking a lot; thanks for reading and cheers! :)

--
Regards
Archit
Hi,

There might be an easy solution out there already, but I suspect that you will need to parse the XML yourself. The example below uses package xml2, not XML, but you could do this with either. The example simply shows how to get values out of the XML hierarchy. Once you have the attributes you want in hand, you can assemble the elements into a data frame (or a tibble from package tibble).

By the way, I had to prepend your example with '<current>'.

Cheers,
Ben

### START

library(tidyverse)
library(xml2)

# Same document as in the question, with the opening <current> tag added.
txt <- paste0(
  '<current>',
  '<city id="2643743" name="London">',
  '<coord lon="-0.13" lat="51.51"/>',
  '<country>GB</country>',
  '<sun rise="2017-01-30T07:40:36" set="2017-01-30T16:47:56"/>',
  '</city>',
  '<temperature value="280.15" min="278.15" max="281.15" unit="kelvin"/>',
  '<humidity value="81" unit="%"/>',
  '<pressure value="1012" unit="hPa"/>',
  '<wind>',
  '<speed value="4.6" name="Gentle Breeze"/>',
  '<gusts/>',
  '<direction value="90" code="E" name="East"/>',
  '</wind>',
  '<clouds value="90" name="overcast clouds"/>',
  '<visibility value="10000"/>',
  '<precipitation mode="no"/>',
  '<weather number="701" value="mist" icon="50d"/>',
  '<lastupdate value="2017-01-30T15:50:00"/>',
  '</current>'
)

x <- read_xml(txt)

# Named character vector of all attributes on <wind><speed>
windspeed <- x %>%
  xml_find_first("wind/speed") %>%
  xml_attrs()

# Named character vector of all attributes on <wind><direction>
winddir <- x %>%
  xml_find_first("wind/direction") %>%
  xml_attrs()

windspeed
#           value            name
#           "4.6" "Gentle Breeze"

winddir
#  value   code   name
#   "90"    "E" "East"

### END

> On Apr 27, 2017, at 6:08 AM, Archit Soni <soni.archit1989 at gmail.com> wrote:
>
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org
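
For the city node specifically (id, name, lon, lat, country), a minimal sketch along the same lines might look like the following. It assumes package xml2 is installed and uses a shortened copy of the XML from the question; the object names city, coord and city_df are only illustrative, not anything from the thread.

```r
library(xml2)

# Shortened copy of the example document; only the <city> part is needed here.
txt <- '<current>
  <city id="2643743" name="London">
    <coord lon="-0.13" lat="51.51"/>
    <country>GB</country>
    <sun rise="2017-01-30T07:40:36" set="2017-01-30T16:47:56"/>
  </city>
</current>'

x <- read_xml(txt)

# Locate the <city> node, then its <coord> child relative to it.
city  <- xml_find_first(x, "city")
coord <- xml_find_first(city, "coord")

# xml_attr() pulls a single named attribute; xml_text() pulls element text.
city_df <- data.frame(
  id      = xml_attr(city, "id"),
  name    = xml_attr(city, "name"),
  lon     = as.numeric(xml_attr(coord, "lon")),
  lat     = as.numeric(xml_attr(coord, "lat")),
  country = xml_text(xml_find_first(city, "country")),
  stringsAsFactors = FALSE
)

city_df
```

With one file per city, the same steps could be wrapped in a function and the resulting one-row data frames combined with rbind().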
Thanks Ben, got it working. I just need help with one more thing.

If I have a node like:

<precipitation mode="no"/>

and for some other city it comes as:

<precipitation unit="3h" value="0.0925" type="rain"/>

how can I make my code handle this dynamically? I am sorry to ask such novice questions, but it would be extremely helpful if you could help me with this.

This is the code I have now:

ppt <- (x %>% xml_find_all("precipitation") %>% xml_attrs())

I would like the resulting data set to always have the three columns below. If mode is "no", the values should be NA; if the attributes are populated, they should be kept as they are.

Unit   Value    Type
NA     NA       NA
3h     0.0925   rain

Thanks again and in advance!
Archit

On Thu, Apr 27, 2017 at 6:27 PM, Ben Tupper <btupper at bigelow.org> wrote:
> [...]

--
Regards
Archit
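
One possible way to get the fixed three-column layout asked for above is to request each attribute by name with xml_attr(), which returns NA when a node does not carry that attribute. The sketch below is only an illustration under assumptions: the two small documents stand in for two cities, and precip_row() and wanted are hypothetical names, not part of xml2.

```r
library(xml2)

# Two stand-in documents, one per city, using the two <precipitation>
# forms quoted in the question.
docs <- list(
  read_xml('<current><precipitation mode="no"/></current>'),
  read_xml('<current><precipitation unit="3h" value="0.0925" type="rain"/></current>')
)

# Attributes that should always appear as columns, present or not.
wanted <- c("unit", "value", "type")

# Hypothetical helper: builds a one-row data frame for one document.
precip_row <- function(doc) {
  node <- xml_find_first(doc, "precipitation")
  # xml_attr() gives NA for an attribute the node does not have.
  vals <- vapply(wanted, function(a) xml_attr(node, a), character(1))
  as.data.frame(as.list(vals), stringsAsFactors = FALSE)
}

# Stack the per-city rows and convert the numeric column.
ppt <- do.call(rbind, lapply(docs, precip_row))
ppt$value <- as.numeric(ppt$value)

ppt
```

The first row should come out as all NA and the second as 3h, 0.0925, rain. Because the column names are fixed up front, attributes that only some cities have (such as mode) are simply ignored unless they are added to wanted.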