search for: getnodeset

Displaying 20 results from an estimated 42 matches for "getnodeset".

2010 Aug 30
4
getNodeSet - what am I doing wrong?
Hi, Why is the following returning a node set of length 0: > library(XML) > test <- xmlTreeParse( > "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE) > getNodeSet(test,"//modifications_row") Thanks for any hint. Joh
2011 May 30
1
Need help reading website info with XML package and XPath
...f "doc" that shows and highlights all the information I'm interested in (note that either url that's highlighted in doc is fine). http://r.789695.n4.nabble.com/file/n3561075/relevant-section-of-doc.pdf relevant-section-of-doc.pdf I'm guessing my xpath statements are wrong or getNodeSet needs something else to get to information contained in a bubble on a webpage. Any suggestions or ideas would be GREATLY appreciated. library(XML) url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb" doc <- htmlTreeParse(url, useInternalNode=TRUE, isURL=TRUE...
2012 May 11
0
Using xpathapply or getnodeset to get text between two distinct tags
...;Ses=1') #Scrape the page with the links doc<-scrape(url=hansard, parse=TRUE, follow=TRUE) #Not sure what exactly this does, but it is necessary doc<-doc[[1]] #Get the xmlRoot directory doc<- xmlRoot(doc) #Get nodes that contain only the links to each day's transcripts links<- getNodeSet(doc, "//a[@class='PublicationCalendarLink']/@href") links<-matrix(links) #Paste those href links to the root URL links<-apply(links, 1, function(x) paste('http://www.parl.gc.ca', x, sep='')) #Inspect links[1] #Scrape text from first URL in 'links' one...
2010 Sep 08
1
XML getNodeSet syntax for PUBMED XML export
I am looking for the syntax to capture XML tags marked with /DescriptorName MajorTopicYN="Y"/ , but the combination of the internal space (between "Name" and "Major") and the embedded quote marks is defeating me. I can get all the "DescriptorName" tags, but these include both MajorTopicYN = "Y" and "N" variants. Any suggestions?
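One way to approach the question above is an XPath attribute predicate. The sketch below is only an illustration: the inline document, the `MeshHeading` wrapper elements and the descriptor values are invented, not taken from the original export. Single quotes inside the XPath string avoid clashing with R's double-quoted string.

```r
library(XML)

# Hypothetical fragment mimicking a PubMed export; the contents are invented.
xml_text <- paste0(
  "<MeshHeadingList>",
  "<MeshHeading><DescriptorName MajorTopicYN=\"Y\">Neoplasms</DescriptorName></MeshHeading>",
  "<MeshHeading><DescriptorName MajorTopicYN=\"N\">Humans</DescriptorName></MeshHeading>",
  "</MeshHeadingList>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# [@attr='value'] keeps only the nodes whose attribute equals the value.
major <- getNodeSet(doc, "//DescriptorName[@MajorTopicYN='Y']")
major_terms <- sapply(major, xmlValue)
print(major_terms)
```

The space in the tag as written in the file separates the element name from its attribute, so it does not appear in the XPath expression at all.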
2012 May 28
1
Rcurl, postForm()
...ML) library(RCurl) library(scrapeR) library(RHTMLForms) #Set URL bus<-c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx') #Scrape URL orig<-getURLContent(url=bus) #Parse doc doc<-htmlParse(orig[[1]], asText=TRUE) #Get The forms forms<-getNodeSet(doc, "//form") forms[[1]] #These are the input nodes getNodeSet(forms[[1]], ".//input") #These are the select nodes getNodeSet(forms[[1]], ".//select") ********************************* Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Stree...
2011 Jul 05
2
Stuck ...can't get sapply and xmlTreeParse working
...d=X1-ZWz1bup03e49vv_5kvb6&address=",x, sep="") ############## problem line is next ################################# zdoc <-xmlTreeParse(url.zill, useInternalNode=TRUE, isURL=TRUE) ############# problem line above ################################## f$zpid <- sapply(getNodeSet(zdoc, "//result/zpid"), xmlValue) f$zest.low <-sapply(getNodeSet(zdoc, "//valuationRange/low"), xmlValue) f$zest <- sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue) rm(zdoc) return(f) } j <-sapply(new.add, FUN=hm) print(zest) -- View this mess...
2008 Sep 08
1
another XML package question
Hi there, does anybody know how to return the xmlPath from a node? For example, at several locations in the xml file, I have nodes with the same name and I'd like to process only the nodes from a certain path. Any idea? Antje
2011 Apr 06
1
Treatment of xml-stylesheet processing instructions in XML module
Hello again, Another stumble here that is defeating me. I try: a<-readLines(url("http://feeds.feedburner.com/grokin")) t<-XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE, asText=TRUE) elem<- XML::getNodeSet(XML::xmlRoot(t),"/rss/channel/item")[[1]] And I get: Start tag expected, '<' not found Error: 1: Start tag expected, '<' not found When I modify the second line in "a" to remove the following (just leaving the <rss> tag with its attributes), I do no...
2012 Dec 28
0
How to apply XPath query on XML nodes separately?
...e whole document, *not* just those of the currently queried parent. I know, this is because I prefix my XPath Query with // and apparently any given XMLNode "knows" of its whole document, but I seem not to be able to find a proper solution. So, my question is: How do I restrict a call of getNodeSet to just an XMLNode and not the whole document it was retrieved from? I use the XML and RCurl packages. The document I speak of is downloaded from uniprot.org, a protein knowledge server well known to biologists. The lamentably somewhat lengthy code follows: library(XML) library(RCurl) getEntries...
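A common way to restrict the query is a relative XPath that starts with `.`, as in the `".//input"` calls quoted in an earlier result. The sketch below assumes an invented two-entry document in place of the UniProt download; the element names are placeholders.

```r
library(XML)

# Invented two-entry document standing in for the real download.
doc <- xmlParse(
  "<root><entry><name>A</name></entry><entry><name>B</name><name>C</name></entry></root>",
  asText = TRUE
)
entries <- getNodeSet(doc, "//entry")

# "//name" searches the whole document even when called on a single entry;
# ".//name" starts from the node passed as the first argument.
names_per_entry <- lapply(
  entries,
  function(e) sapply(getNodeSet(e, ".//name"), xmlValue)
)
print(names_per_entry)
```

Each list element then holds only the names found under the corresponding `entry` node.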
2012 Aug 10
3
Parsing large XML documents in R - how to optimize the speed?
...f the XML package when parsing the xml tree; -vectorizing the parsing (i.e., replacing loops like "for(node in group.of.nodes) {...}" by "sapply(group.of.nodes, function(node){...})") I gained another 5 seconds by making small changes to the functions used (like replacing 'getNodeSet' by 'xmlElementsByTagName' when I don't need to navigate to the children nodes). Now I am blocked at around 35 seconds and I would still like to cut this time by a factor of 5, but I have no clue what to do to achieve this gain. I'll try to expose as briefly as possible the relevant stru...
2011 Jul 10
1
Help with tryCatch
Having a hard time understanding the help files for tryCatch. Looking for a little help with the following statement which sits inside a for loop zest[i] <- tryCatch(sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue), error=function() zest[i] <-"NA") zest is a numeric vector If the sapply statement evaluates to an error, I'd like to set the value of zest[i] to NA and continue with the loop. Suggestions ? -- View this message in context: http:...
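The handler given as `error=` is called with the condition object, so it has to accept one argument, and whatever it returns becomes the value of `tryCatch`; no assignment inside the handler is needed. A minimal sketch, assuming a hypothetical `risky()` function in place of the `sapply(getNodeSet(...), xmlValue)` call:

```r
# Hypothetical stand-in for the call that may fail inside the loop.
risky <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

inputs <- c(4, -1, 9)
out <- numeric(length(inputs))

for (i in seq_along(inputs)) {
  # On error the handler's return value (NA) is assigned and the loop continues.
  out[i] <- tryCatch(risky(inputs[i]), error = function(e) NA_real_)
}
print(out)
```

Returning `NA_real_` rather than the string `"NA"` keeps the result vector numeric.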
2010 Mar 18
1
Do colClasses in readHTMLTable (XML Package) work?
Hi, I can't get the colClasses option to work in the readHTMLTable function of the XML package. Here's a code fragment: require("XML") doc <- "http://www.nber.org/cycles/cyclesmain.html" table <- getNodeSet(htmlParse(doc),"//table") [[2]] # The main table is the second one because it's embedded in the page table. xt <- readHTMLTable( table, header = c("peak","trough","contraction","expansion","tr...
2012 Apr 16
1
grep and XML
...le: https://raw.github.com/currencybot/open-exchange-rates/master/latest.json This is the code that I'm working with: library(RCurl) library(XML) txt<-getURL("https://raw.github.com/currencybot/open-exchange-rates/master/latest.json") txt<-htmlParse(txt, asText=TRUE) txt<- getNodeSet(txt, '//p') So, I can get the node properly, but then, if I try something like this: grep(c('USD'), txt) I get: integer(0) Can anyone suggest a way forward? Yours, Simon Kiss ********************************* Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 G...
2012 Jun 06
1
Process XML files
...um of users! I was successful in processing files using R's XML library. Thank you, Rxperts! I know there are libraries like XML and SPXML available in S-Plus. Could anyone please share examples of reading an xml file and saving the contents in a data frame? Are there Splus equivalents of "getNodeSet", "xmlSApply" and "xmlValue"? Thanks so much! Santosh
2012 May 17
1
using XML package to read RSS
Hi, I'm trying to use the XML package to read an RSS feed. To get started, I was trying to use this post as an example: http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/ I can replicate the beginning section of the post, but when I try to use another RSS feed I have an issue. The RSS feed I would like to use is: > URL <-
2012 Feb 10
1
Bug with memory allocation when loading Rdata files iteratively?
...ventually). It just seems like removing the object via |rm()| and firing |gc()| do not have any effect, so the memory consumption of each loaded R object cumulates until there's no more memory left :-/ Possibly, this is also related to XML package functionality (mainly |htmlTreeParse| and |getNodeSet|), but I also experience the described behavior when simply iteratively loading and removing Rdata files. I've put together a little example that illustrates the memory ballooning mentioned above which you can find here: http://stackoverflow.com/questions/9220849/significant-memory-issue-in...
2011 Feb 24
1
Objects must be passed as an argument or generated in the function, right?
...? What am I missing here? Thanks Zheng Jk parseXmlEntryNodeSet <- function(psimi25file, psimi25source, verbose=TRUE) { psimi25Doc <- xmlTreeParse(psimi25file, useInternalNodes = TRUE) psimi25NS <- getDefaultNamespace(psimi25Doc) namespaces <- c(ns = psimi25NS) entry <- getNodeSet(psimi25Doc, "/ns:entrySet/ns:entry", namespaces) if(verbose) statusDisplay(paste(length(entry),"Entries found\n",sep=" ")) entryCount <- length(nodes) entryList <- list() for(i in 1:entryCount) { entryList[[i]] <- parseXmlEntryNode(doc=psimi2...
2011 Aug 29
1
reading tables from multiple HTML pages
...go about keeping the loop running so I can parse the rest? **************************************************** library(XML) url_root<-"http://www.szrd.gov.cn/viewcommondbfc.do?id=" for(i in 700:750){ url = paste(url_root, i, sep="") doc = htmlParse(url) tableNodes = getNodeSet(doc, "//table") tbl = readHTMLTable(tableNodes[[3]]) } **************************************************** Steve Oliver Department of Political Science University of California at San Diego 9500 Gilman Dr. La Jolla, CA 92092 -- View this message in context: http://r.789695.n4.nabble.c...
2008 Jul 02
1
Removing or overwriting an XML node
...nd now imagine I want to change <first>Duncan</first> into e.g. <initials>D.</initials>. How to do that? I am able to add my node: library(XML) x <- xmlTreeParse("duncan.xml", useInternalNodes = TRUE) # find parent, add as last child: name <- getNodeSet(x, "//name")[[1]] newXMLNode("initials", "D.", parent=name) first <- getNodeSet(x, "//first")[[1]] # wanted: # deleteXMLNode(name) # or # replaceXMLNode("initials", "D.", replace=first) cat(saveXML(x)) free(x) As...
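Current versions of the XML package document `replaceNodes()` and `removeNodes()` for internal nodes, which appear to cover what is asked for here. The sketch below is an assumption-laden illustration: the inline document is invented in place of `duncan.xml`, and whether these functions were available in the 2008 release is not something the post confirms.

```r
library(XML)

# Invented stand-in for duncan.xml.
x <- xmlParse("<person><name><first>Duncan</first></name></person>", asText = TRUE)

first <- getNodeSet(x, "//first")[[1]]

# Swap the <first> node for a newly created <initials> node in place.
replaceNodes(first, newXMLNode("initials", "D."))

cat(saveXML(x))
```

To delete a node without a replacement, `removeNodes(first)` would be the counterpart call.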
2008 Nov 05
2
How to extract following data
Hi everyone, I have this kind of raw dataset : - <Temp diffgr:id="Temp14" msdata:rowOrder="13"> <Date>2005-01-17T00:00:00+05:30</Date> <SecurityID>10149</SecurityID> <PriceClose>1288.40002</PriceClose> </Temp> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">