Dear All,
I am struggling to parse the XML file you can find at

https://www.dropbox.com/s/i4ld5qa26hwrhj7/account.xml?dl=0

Essentially, I would like to convert it to a data.frame so that I can
manipulate it in R and find all the attributes of an account for
which unrealizedPNL goes above a threshold.
I stored that file as account.xml and, looking here and there on the
web, I put together the following script


#####################################################################
library(XML)

xmlfile=xmlParse("account.xml")

class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
xmltop = xmlRoot(xmlfile) #gives content of root
class(xmltop) #"XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
xmlName(xmltop) #name of the root node
xmlSize(xmltop) #how many children the root node has
xmlName(xmltop[[1]]) #name of the root's first child

# have a look at the content of the first child entry
xmltop[[1]]
# have a look at the content of the 2nd child entry
xmltop[[2]]
# inspect the children of the root's first child
number <- xmlSize(xmltop[[1]]) #number of nodes in the first child
name <- xmlSApply(xmltop[[1]], xmlName) #name(s)
attribute <- xmlSApply(xmltop[[1]], xmlAttrs) #attribute(s)
size <- xmlSApply(xmltop[[1]], xmlSize) #size


values <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
#####################################################################

which is leading me nowhere.
Any suggestion is appreciated.
Cheers

Lorenzo

Not sure exactly what you want since you did not show an expected output,
but this will extract the attributes from AccVal in the structure:

> #####################################################################
> library(XML)
>
> xmlfile=xmlParse("/temp/account.xml")
>
> class(xmlfile) #"XMLInternalDocument" "XMLAbstractDocument"
[1] "XMLInternalDocument" "XMLAbstractDocument"
> xmltop = xmlRoot(xmlfile) #gives content of root
>
> ##### try this ##############
>
> accts <- sapply(getNodeSet(xmltop, "//AccVal"), xmlAttrs)
>
> # create data.frame
> accts_df <- as.data.frame(t(accts), stringsAsFactors = FALSE)
> str(accts_df)
'data.frame':   364 obs. of  4 variables:
 $ key        : chr  "AccountCode" "AccountReady" "AccountType" "AccruedCash" ...
 $ val        : chr  "DU108063" "true" "CORPORATION" "0" ...
 $ currency   : chr  "" "" "" "AUD" ...
 $ accountName: chr  "DU108063" "DU108063" "DU108063" "DU108063" ...
> head(accts_df)
           key         val currency accountName
1  AccountCode    DU108063             DU108063
2 AccountReady        true             DU108063
3  AccountType CORPORATION             DU108063
4  AccruedCash           0      AUD    DU108063
5  AccruedCash           0     BASE    DU108063
6  AccruedCash           0      CAD    DU108063
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Oct 11, 2015 at 3:10 PM, Lorenzo Isella <lorenzo.isella at gmail.com>
wrote:

> Dear All,
> I am struggling with the parsing of the xml file you can find at
>
> https://www.dropbox.com/s/i4ld5qa26hwrhj7/account.xml?dl=0
>
> Essentially, I would like to be able to convert it to a data.frame to
> manipulate it in R and detect all the attributes of an account for
> which unrealizedPNL goes above a threshold.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Dear Jim,
Thanks for your reply.
What you did is exactly what I need: I now have a data frame with the
relevant data and I can take it from there.
Regards

Lorenzo

On Sun, Oct 11, 2015 at 03:54:10PM -0400, jim holtman wrote:
>Not sure exactly what you want since you did not show an expected output,
>but this will extract the attributes from AccVal in the structure:
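
For readers who land on this thread later, here is a minimal sketch of the remaining step from the original question: keeping only the rows whose value exceeds a threshold. It is not taken from the posts above. The inline XML is a made-up miniature of the file, and the root element name "Account", the key spelling "UnrealizedPnL", the numeric values, and the threshold of 300 are all assumptions for illustration. Only the AccVal element and its four attributes (key, val, currency, accountName) come from the output shown in the thread. The sketch uses lapply() plus rbind() instead of sapply() plus t(); both give one row per AccVal node when every node carries the same attributes.

```r
library(XML)

# Hypothetical miniature of account.xml. The root name and the key
# "UnrealizedPnL" are assumptions; check names(table(accts_df$key))
# on the real file to find the actual spelling.
xml_text <- paste0(
  '<Account>',
  '<AccVal key="AccountCode" val="DU108063" currency="" accountName="DU108063"/>',
  '<AccVal key="UnrealizedPnL" val="1500.25" currency="USD" accountName="DU108063"/>',
  '<AccVal key="UnrealizedPnL" val="-20.00" currency="CAD" accountName="DU108063"/>',
  '<AccVal key="UnrealizedPnL" val="310.00" currency="AUD" accountName="DU108063"/>',
  '</Account>'
)

doc <- xmlParse(xml_text, asText = TRUE)

# One named character vector of attributes per AccVal node
attrs <- lapply(getNodeSet(doc, "//AccVal"), xmlAttrs)

# Stack the vectors into a character matrix, then into a data frame
accts_df <- as.data.frame(do.call(rbind, attrs), stringsAsFactors = FALSE)

# Keep the unrealized P&L rows and convert their values to numbers
pnl <- accts_df[accts_df$key == "UnrealizedPnL", ]
pnl$val <- as.numeric(pnl$val)

# Hypothetical threshold
threshold <- 300
above <- pnl[pnl$val > threshold, ]
print(above)
```

The conversion with as.numeric() matters because xmlAttrs() returns every attribute as a character string, so comparing val to a number without converting it would compare strings rather than numbers.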