Displaying 20 results from an estimated 42 matches for "getnodeset".
2010 Aug 30
4
getNodeSet - what am I doing wrong?
Hi,
Why is the following returning a node set of length 0:
> library(XML)
> test <- xmlTreeParse(
> "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE)
> getNodeSet(test,"//modifications_row")
Thanks for any hint.
Joh
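One common cause of an empty result from getNodeSet is a default namespace on the root element; whether that applies to the file in this post is an assumption. A minimal sketch, using a made-up inline document and a made-up namespace URI rather than the real file:

```r
library(XML)

# Hypothetical stand-in for the real file: a tiny document whose root
# declares a default namespace (the URI below is invented for illustration).
txt <- '<unimod xmlns="http://example.org/ns">
  <modifications>
    <modifications_row id="1"/>
    <modifications_row id="2"/>
  </modifications>
</unimod>'
doc <- xmlParse(txt, asText = TRUE)

# Unprefixed names in XPath match only elements in no namespace,
# so this query does not see the namespaced rows.
plain <- getNodeSet(doc, "//modifications_row")

# Bind the namespace URI to a prefix ("u" is arbitrary) and use it in the path.
ns <- c(u = "http://example.org/ns")
rows <- getNodeSet(doc, "//u:modifications_row", namespaces = ns)

length(plain)
length(rows)
```

For a real document, the URI to bind would be whatever the root element's xmlns attribute declares.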
2011 May 30
1
Need help reading website info with XML package and XPath
...f "doc" that shows and
highlights all the information I'm interested in (note that either url
that's highlighted in doc is fine).
http://r.789695.n4.nabble.com/file/n3561075/relevant-section-of-doc.pdf
relevant-section-of-doc.pdf
I'm guessing my xpath statements are wrong or getNodeSet needs something
else to get to information contained in a bubble on a webpage. Any
suggestions or ideas would be GREATLY appreciated.
library(XML)
url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb"
doc <- htmlTreeParse(url, useInternalNode=TRUE, isURL=TRUE...
2012 May 11
0
Using xpathapply or getnodeset to get text between two distinct tags
...;Ses=1')
#Scrape the page with the links
doc<-scrape(url=hansard, parse=TRUE, follow=TRUE)
#Not sure what exactly this does, but it is necessary
doc<-doc[[1]]
#Get the xmlRoot directory
doc<- xmlRoot(doc)
#Get nodes that contain only the links to each day's transcripts
links<- getNodeSet(doc, "//a[@class='PublicationCalendarLink']/@href")
links<-matrix(links)
#Paste those href links to the root URL
links<-apply(links, 1, function(x) paste('http://www.parl.gc.ca', x, sep=''))
#Inspect
links[1]
#Scrape text from first URL in 'links'
one...
2010 Sep 08
1
XML getNodeSet syntax for PUBMED XML export
I am looking for the syntax to capture XML tags marked with
/DescriptorName MajorTopicYN="Y"/ , but the combination of the internal
space (between "Name" and "Major") and the embedded quote marks are
defeating me. I can get all the "DescriptorName" tags, but these include
both MajorTopicYN = "Y" and "N" variants. Any suggestions?
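An XPath attribute predicate is the usual way to filter on an attribute value. A minimal sketch, assuming the elements look like the tag described in the post; the fragment below is made up and is not real PubMed output:

```r
library(XML)

# Invented fragment in the shape the post describes.
txt <- '<MeshHeadingList>
  <MeshHeading><DescriptorName MajorTopicYN="Y">Asthma</DescriptorName></MeshHeading>
  <MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
</MeshHeadingList>'
doc <- xmlParse(txt, asText = TRUE)

# [@attr='value'] keeps only elements whose attribute has that value.
# Single quotes inside the double-quoted R string avoid any escaping.
major <- getNodeSet(doc, "//DescriptorName[@MajorTopicYN='Y']")
major_terms <- sapply(major, xmlValue)
```

The space in the tag is just the separator between the element name and its attribute, so it never appears in the XPath expression.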
2012 May 28
1
Rcurl, postForm()
...ML)
library(RCurl)
library(scrapeR)
library(RHTMLForms)
#Set URL
bus<-c('http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx')
#Scrape URL
orig<-getURLContent(url=bus)
#Parse doc
doc<-htmlParse(orig[[1]], asText=TRUE)
#Get The forms
forms<-getNodeSet(doc, "//form")
forms[[1]]
#These are the input nodes
getNodeSet(forms[[1]], ".//input")
#These are the select nodes
getNodeSet(forms[[1]], ".//select")
*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Stree...
2011 Jul 05
2
Stuck ...can't get sapply and xmlTreeParse working
...d=X1-ZWz1bup03e49vv_5kvb6&address=",x,
sep="")
############## problem line is next #################################
zdoc <-xmlTreeParse(url.zill, useInternalNode=TRUE, isURL=TRUE)
############# problem line above ##################################
f$zpid <- sapply(getNodeSet(zdoc, "//result/zpid"), xmlValue)
f$zest.low <-sapply(getNodeSet(zdoc, "//valuationRange/low"), xmlValue)
f$zest <- sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue)
rm(zdoc)
return(f)
}
j <-sapply(new.add, FUN=hm)
print(zest)
2008 Sep 08
1
another XML package question
Hi there,
does anybody know how to return the xmlPath from a node?
For example, at several location in the xml file, I have nodes with the same
name and I'd like to process only the nodes from a certain path.
Any idea?
Antje
2011 Apr 06
1
Treatment of xml-stylesheet processing instructions in XML module
Hello again,
Another stumble here that is defeating me.
I try:
a<-readLines(url("http://feeds.feedburner.com/grokin"))
t<-XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE,
asText=TRUE)
elem<- XML::getNodeSet(XML::xmlRoot(t),"/rss/channel/item")[[1]]
And I get:
Start tag expected, '<' not found
Error: 1: Start tag expected, '<' not found
When I modify the second line in "a" to remove the following (just
leaving the <rss> tag with its attributes), I do no...
2012 Dec 28
0
How to apply XPath query on XML nodes separately?
...e whole document, *not*
just those of the currently queried parent.
I know, this is because I prefix my XPath Query with // and apparently any
given XMLNode "knows" of its whole document,
but I seem not to be able to find a proper solution.
So, my question is:
How do I restrict a call of getNodeSet to just an XMLNode and not the whole
document it was retrieved from?
I use the XML and RCurl packages. The document I speak of is downloaded
from uniprot.org, a protein knowledge server well known to biologists.
The lamentably somewhat lengthy code follows:
library(XML)
library(RCurl)
getEntries...
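A path that starts with // is evaluated from the document root even when getNodeSet is called on a single node; starting the path with .// keeps the search below that node. A minimal sketch on a made-up document (the element names are illustrative, and a real UniProt file may also need a namespace prefix):

```r
library(XML)

# Invented document with two parent nodes.
txt <- '<uniprot>
  <entry><accession>P1</accession><accession>P2</accession></entry>
  <entry><accession>Q1</accession></entry>
</uniprot>'
doc <- xmlParse(txt, asText = TRUE)
entries <- getNodeSet(doc, "//entry")

# "//accession" searches the whole document, regardless of the context node.
all_acc <- getNodeSet(entries[[2]], "//accession")

# ".//accession" searches only the descendants of the context node.
own_acc <- getNodeSet(entries[[2]], ".//accession")

length(all_acc)
sapply(own_acc, xmlValue)
```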
2012 Aug 10
3
Parsing large XML documents in R - how to optimize the speed?
...f the XML package when parsing
the xml tree;
-vectorizing the parsing (i.e., replacing loops like "for(node in
group.of.nodes) {...}" by "sapply(group.of.nodes, function(node){...})")
I gained another 5 seconds by making small changes to the functions used
(like replacing 'getNodeSet' by 'xmlElementsByTagName' when I don't need to
navigate to the children nodes).
Now I am blocked at around 35 seconds and I would still like to cut this
time by 5x, but I have no clue what to do to achieve this gain. I'll try
to expose as briefly as possible the relevant stru...
2011 Jul 10
1
Help with tryCatch
Having a hard time understanding the help files for tryCatch. Looking for a
little help with the following statement which sits inside a for loop
zest[i] <- tryCatch(sapply(getNodeSet(zdoc, "//zestimate/amount"),
xmlValue), error=function() zest[i] <-"NA")
zest is a numeric vector
If the sapply statement evaluates to an error, I'd like to set the value of
zest[i] to NA and continue with the loop.
Suggestions ?
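R calls an error handler with the condition object, so a handler declared as function() fails with an unused-argument error; the handler's return value then becomes the value of the whole tryCatch() call, so no assignment is needed inside it. A minimal sketch, with a hypothetical get_amount() standing in for the sapply(getNodeSet(...)) step:

```r
# Hypothetical stand-in for the parsing step: it fails for some inputs.
get_amount <- function(x) {
  if (x < 0) stop("no zestimate found")
  x * 1000
}

inputs <- c(1, -1, 3)
zest <- numeric(length(inputs))

for (i in seq_along(inputs)) {
  # The handler takes the condition as its argument; returning NA_real_
  # makes tryCatch() evaluate to NA, and the loop simply continues.
  zest[i] <- tryCatch(get_amount(inputs[i]), error = function(e) NA_real_)
}

zest
```

Returning NA_real_ rather than the string "NA" keeps zest numeric.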
2010 Mar 18
1
Do colClasses in readHTMLTable (XML Package) work?
Hi,
I can't get the colClasses option to work in the readHTMLTable function
of the XML package. Here's a code fragment:
require("XML")
doc <- "http://www.nber.org/cycles/cyclesmain.html"
table <- getNodeSet(htmlParse(doc),"//table") [[2]] # The main table is the second one because it's embedded in the page table.
xt <- readHTMLTable(
table,
header =
c("peak","trough","contraction","expansion","tr...
2012 Apr 16
1
grep and XML
...le:
https://raw.github.com/currencybot/open-exchange-rates/master/latest.json
This is the code that I'm working with:
library(RCurl)
library(XML)
txt<-getURL("https://raw.github.com/currencybot/open-exchange-rates/master/latest.json")
txt<-htmlParse(txt, asText=TRUE)
txt<- getNodeSet(txt, '//p')
So, I can get the node properly, but then, if I try something like this:
grep(c('USD'), txt)
I get:
integer(0)
Can anyone suggest a way forward?
Yours, Simon Kiss
*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 G...
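grep() works on character vectors, so one option is to extract the text of each node with xmlValue before searching. A minimal sketch on a made-up page; the JSON-like string below is invented and is not the real file's content:

```r
library(XML)

# Invented HTML wrapper around a JSON-like string.
txt <- '<html><body><p>{"base": "USD", "rates": {"EUR": 0.76}}</p></body></html>'
doc <- htmlParse(txt, asText = TRUE)
nodes <- getNodeSet(doc, "//p")

# Pull the text out of each node, then search the resulting character vector.
values <- sapply(nodes, xmlValue)
hits <- grep("USD", values, fixed = TRUE)

hits
```

Since the source is JSON rather than HTML, a JSON parser would probably be the more direct route; the sketch only addresses the grep() step.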
2012 Jun 06
1
Process XML files
...um
of users! I was successful in processing files using R's XML library. Thank
you, Rxperts!
I know there are libraries like XML and SPXML available in S-Plus. Could
anyone please share examples of reading an xml file and save the contents
in a data frame?
Are there Splus equivalents of "getNodeSet", "xmlSApply" and "xmlValue"?
Thanks so much!
Santosh
2012 May 17
1
using XML package to read RSS
Hi,
I'm trying to use the XML package to read an RSS feed. To get
started, I was trying to use this post as an example:
http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/
I can replicate the beginning section of the post, but when I try to
use another RSS feed I have an issue. The RSS feed I would like to
use is:
> URL <-
2012 Feb 10
1
Bug with memory allocation when loading Rdata files iteratively?
...ventually).
It just seems like removing the object via |rm()| and firing |gc()| do
not have any effect, so the memory consumption of each loaded R object
cumulates until there's no more memory left :-/
Possibly, this is also related to XML package functionality (mainly
|htmlTreeParse| and |getNodeSet|), but I also experience the described
behavior when simply iteratively loading and removing Rdata files.
I've put together a little example that illustrates the memory
ballooning mentioned above which you can find here:
http://stackoverflow.com/questions/9220849/significant-memory-issue-in...
2011 Feb 24
1
Objects must be passed as an argument or generated in the function, right?
...? What am I missing here?
Thanks
Zheng Jk
parseXmlEntryNodeSet <- function(psimi25file, psimi25source, verbose=TRUE) {
psimi25Doc <- xmlTreeParse(psimi25file, useInternalNodes = TRUE)
psimi25NS <- getDefaultNamespace(psimi25Doc)
namespaces <- c(ns = psimi25NS)
entry <- getNodeSet(psimi25Doc, "/ns:entrySet/ns:entry", namespaces)
if(verbose)
statusDisplay(paste(length(entry),"Entries found\n",sep=" "))
entryCount <- length(nodes)
entryList <- list()
for(i in 1:entryCount) {
entryList[[i]] <- parseXmlEntryNode(doc=psimi2...
2011 Aug 29
1
reading tables from multiple HTML pages
...go about keeping the loop running so I can parse the
rest?
****************************************************
library(XML)
url_root<-"http://www.szrd.gov.cn/viewcommondbfc.do?id="
for(i in 700:750){
url = paste(url_root, i, sep="")
doc = htmlParse(url)
tableNodes = getNodeSet(doc, "//table")
tbl = readHTMLTable(tableNodes[[3]])
}
****************************************************
Steve Oliver
Department of Political Science
University of California at San Diego
9500 Gilman Dr.
La Jolla, CA 92092
2008 Jul 02
1
Removing or overwriting an XML node
...nd now imagine I want to change <first>Duncan</first>
into e.g. <initials>D.</initials>. How to do that?
I am able to add my node:
library(XML)
x <- xmlTreeParse("duncan.xml", useInternalNodes = TRUE)
# find parent, add as last child:
name <- getNodeSet(x, "//name")[[1]]
newXMLNode("initials", "D.", parent=name)
first <- getNodeSet(x, "//first")[[1]]
# wanted:
# deleteXMLNode(name)
# or
# replaceXMLNode("initials", "D.", replace=first)
cat(saveXML(x))
free(x)
As...
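Current versions of the XML package provide replaceNodes() and removeNodes() for internal nodes, which cover the two operations the post wishes for; whether they were available when the post was written is not known. A minimal sketch on an inline stand-in for duncan.xml (the document content here is made up):

```r
library(XML)

# Invented stand-in for the file in the post.
txt <- '<name><first>Duncan</first><last>Temple Lang</last></name>'
x <- xmlParse(txt, asText = TRUE)

first <- getNodeSet(x, "//first")[[1]]

# Build a free-standing node, then swap it in where <first> was.
initials <- newXMLNode("initials", "D.")
replaceNodes(first, initials)

# removeNodes() drops nodes outright, here every <last> element.
removeNodes(getNodeSet(x, "//last"))

cat(saveXML(xmlRoot(x)), "\n")
```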
2008 Nov 05
2
How to extract following data
Hi everyone,
I have this kind of raw dataset :
- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
<Date>2005-01-17T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.40002</PriceClose>
</Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">