similar to: Need help reading website info with XML package and XPath

Displaying 20 results from an estimated 500 matches similar to: "Need help reading website info with XML package and XPath"

2011 May 28
1
newbie xml parsing question
I am trying to read some data off the zillow site. Newbie to xml, html, parsing and the xml package. I've been able to load the web page I'm interested in with the following code, but I'm not sure of the next step to get the information I'm interested in into R: library(XML) url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb" doc <-
2011 Jul 05
2
Stuck ...can't get sapply and xmlTreeParse working
Can't seem to get the code below working. It gets stuck on line 24 inside the function hm; comments show the line in question. The function hm is called by sapply and is at the bottom of the code. Other stuff above line 24 works correctly including the first couple of lines of the function hm. Should I be using a different apply function or am I doing something wrong with xmlTreeParse ?
2011 Jul 10
1
Help with tryCatch
Having a hard time understanding the help files for tryCatch. Looking for a little help with the following statement, which sits inside a for loop: zest[i] <- tryCatch(sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue), error=function() zest[i] <-"NA") zest is a numeric vector. If the sapply statement evaluates to an error, I'd like to set the value of zest[i]
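A minimal sketch of the pattern this question describes, offered as an illustration rather than as the thread's answer. In base R, a handler passed to tryCatch must accept one argument (the condition object), and whatever the handler returns becomes the value of the whole tryCatch call, so no assignment is needed inside the handler. The helper get_amount and the sample inputs are hypothetical stand-ins for the getNodeSet/xmlValue call, and NA_real_ is used instead of the string "NA" on the assumption that zest should stay numeric.

```r
# Hypothetical stand-in for
#   sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue)
# It returns a number, or signals an error when nothing usable is found.
get_amount <- function(x) {
  if (is.na(x)) stop("no zestimate found")
  as.numeric(x)
}

inputs <- c("250000", NA, "310500")   # made-up sample data
zest <- numeric(length(inputs))

for (i in seq_along(inputs)) {
  zest[i] <- tryCatch(
    get_amount(inputs[i]),
    # The handler takes the condition object `e`; its return value
    # is what tryCatch() returns, and that is what gets assigned.
    error = function(e) NA_real_
  )
}
```

After the loop, zest holds a numeric value for each input that succeeded and NA for the one that failed, and the loop itself is not interrupted by the error.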
2012 Aug 10
3
Parsing large XML documents in R - how to optimize the speed?
Hello everyone, I would like to parse very large xml files from MS/MS experiments and create R objects from their content. (By very large, I mean going up to 5-10Gb, although I am using a 'small' 40M file to test my code.) My first attempt at parsing the 40M file, using the XML package, took more than 2200 seconds and left me quite disappointed. I managed to cut that down to around 40
2012 Jun 06
1
Process XML files
Hello experts, Sorry for posting the S-Plus related question here. I have not found any solution yet after some attempts and hence am sending it to a wider spectrum of users! I was successful in processing files using R's XML library. Thank you, Rxperts! I know there are libraries like XML and SPXML available in S-Plus. Could anyone please share examples of reading an xml file and save the
2007 Sep 01
2
Importing huge XML-Files
Dear all, for my diploma thesis I have to import huge XML files into R for statistical processing - huge means a size of about 33 MB. I'm using the XML package version 1.9. Since reading the complete file into R via xmlTreeParse doesn't work or is too slow, I'm trying to use xmlEventParse, but I got completely stuck. I have many different types of nodes + <configuration>
2008 Jun 25
0
Memory allocation failed: Copying Node
The following code fails with a "Memory allocation failed: Copying Node" error after parsing n thousand files. I have included the main code (below) and functions (after the main code). I am not sure which lines are causing the node copying that results in the memory failure. Please advise. #Beginning of Code for(i in 1:nrow(newFile)) { if(i%%3000 == 0) gc()
2008 Nov 05
2
How to extract following data
Hi everyone, I have this kind of raw dataset : - <Temp diffgr:id="Temp14" msdata:rowOrder="13"> <Date>2005-01-17T00:00:00+05:30</Date> <SecurityID>10149</SecurityID> <PriceClose>1288.40002</PriceClose> </Temp> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
2012 Dec 28
0
How to apply XPath query on XML nodes separately?
Dear R experts, I am trying to extract certain child nodes from an XML document and construct a table in which the parent node names are the columns and the child id values, joined in a list, are the cell content. If I first apply an XPath query to extract all of the above parent nodes, then iterate over those nodes and again apply an XPath query to select their child nodes, I get *ALL* matching child nodes
2012 May 28
1
RCurl, postForm()
Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic
2009 Dec 03
3
Scraping a web page
I would like to be able to submit a list of URLs of various webpages and extract the "content" i.e. not the mark-up of those pages. I can find plenty of examples in the XML library of extracting links from pages but I cannot seem to find a way to extract the text. Any help would be greatly appreciated - I will not know the structure of the URLs I would submit in advance. Any
2008 Jul 02
1
Removing or overwriting an XML node
Hi, I have an existing XML document on disk, which I'd like to use as a template, and exchange a subnode with my own newly created subtree: <?xml version="1.0"?> <Duncan> <name a="1" b="xyz"> <first>Duncan</first> <last>Temple Lang</last> </name> </Duncan> created by e.g. ? library(XML)
2010 Aug 30
4
getNodeSet - what am I doing wrong?
Hi, Why is the following returning a nodeset of length 0: > library(XML) > test <- xmlTreeParse( > "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE) > getNodeSet(test,"//modifications_row") Thanks for any hint. Joh
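One common cause of an empty node set, which may or may not be what is happening in this thread, is a default namespace on the document: XPath matches un-prefixed names only against elements in no namespace. The sketch below uses a small inline document with a made-up namespace URI (http://example.org/ns) and an arbitrary prefix x; neither is taken from the unimod file.

```r
library(XML)

# Inline stand-in document; the namespace URI is hypothetical.
txt <- paste0(
  '<root xmlns="http://example.org/ns">',
  '<modifications_row id="1"/>',
  '<modifications_row id="2"/>',
  '</root>'
)
doc <- xmlParse(txt, asText = TRUE)

# Un-prefixed query: the elements live in the default namespace,
# so this is not expected to match them.
plain <- getNodeSet(doc, "//modifications_row")

# Bind a prefix of our choosing ("x") to the namespace URI and use it
# in the XPath expression.
with_ns <- getNodeSet(
  doc, "//x:modifications_row",
  namespaces = c(x = "http://example.org/ns")
)
length(with_ns)
```

For a real file, xmlNamespaceDefinitions(doc) can be used to look up the URI to bind the prefix to.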
2009 Sep 03
1
encoding problem using xml package
Dear list, I tried to read an xml file using the xml package. Unfortunately, some encoding problems occur. E.g. German umlauts are not read correctly. I assume that this occurs due to (internal?) conversion to utf-8. To illustrate the problem, I have written two xml files. File Test 1 ----------- <?xml version="1.0" encoding="ISO-8859-1"?> <Daten> <ITEM>
2011 Apr 06
1
Treatment of xml-stylesheet processing instructions in XML module
Hello again, Another stumble here that is defeating me. I try: a<-readLines(url("http://feeds.feedburner.com/grokin")) t<-XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE, asText=TRUE) elem<- XML::getNodeSet(XML::xmlRoot(t),"/rss/channel/item")[[1]] And I get: Start tag expected, '<' not found Error: 1: Start tag expected, '<' not
2007 May 14
1
XML parsing under R / Extracting nodes’ values
Hi, I have an XML file which contains among other nodes : ===myXMLfile.xml=== (...) <nbRelations>2</nbRelations> <nbActors>2</nbActors> (...) <nbRuns>5</nbRuns> <nbStep>2000</nbStep> (...) ===End file=== I need to extract those values and to make them R variables such as: nbRelations = 2 nbActors = 2 nbRuns = 5 nbSteps = 2000 I read the help and have
2012 Nov 01
1
How to parse xml with same key name ?
Hi, I need to parse an xml where the key names are the same but the values are different. <root> <test> Some dummy text </test> <node id="1">one</node> <node id="2">two</node> <node id="3">three</node> </root> When I use the xmlGetAttr() function I always get one as the value. How
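A minimal sketch of one way to read every same-named node rather than only the first, assuming the XML package and the sample document quoted in the question (retyped inline here). It applies a function to each node matched by an XPath expression; this is an illustration, not necessarily the approach the thread settled on.

```r
library(XML)

# The sample document from the question, as an inline string.
txt <- paste0(
  '<root><test>Some dummy text</test>',
  '<node id="1">one</node>',
  '<node id="2">two</node>',
  '<node id="3">three</node></root>'
)
doc <- xmlParse(txt, asText = TRUE)

# Apply a function to every <node> element matched by the XPath query.
ids    <- xpathSApply(doc, "//node", xmlGetAttr, "id")  # attribute values
values <- xpathSApply(doc, "//node", xmlValue)          # text content

# Combine into one table, one row per <node>.
result <- data.frame(id = ids, value = values, stringsAsFactors = FALSE)
```

Indexing a parent node by name, as in root[["node"]], returns only the first matching child, which would explain always seeing the first value; whether that is what the poster did is an assumption.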
2012 Dec 20
4
Memory filling up while looping
Hey, I have a double loop like this: chunk <- list(1:10, 11:20, 21:30) for(k in 1:length(chunk)){ print(chunk[k]) DummyCatcher <- NULL for(i in chunk[k]){ print("i load something") dummy <- 1 print("i do something") dummy <- dummy + 1 print("i do put it together") DummyCatcher = rbind(DummyCatcher, dummy) } print("i save a chunk
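Two things stand out in the quoted loop, sketched below in base R. First, chunk[k] is a list of length one, so the inner loop runs once per chunk; chunk[[k]] extracts the index vector itself. Second, growing an object with rbind inside a loop copies it on every iteration; collecting the pieces in a preallocated list and binding once avoids that. Since the snippet is truncated, it is only an assumption that either point is behind the memory growth the poster saw. The names results, idx and pieces are illustrative.

```r
chunk <- list(1:10, 11:20, 21:30)
results <- vector("list", length(chunk))    # one slot per chunk

for (k in seq_along(chunk)) {
  idx <- chunk[[k]]                         # [[ ]] extracts the integer vector
  pieces <- vector("list", length(idx))     # preallocate instead of growing

  for (j in seq_along(idx)) {
    dummy <- idx[j] + 1                     # placeholder for "load" and "do something"
    pieces[[j]] <- dummy
  }

  # Bind once per chunk rather than once per iteration.
  results[[k]] <- do.call(rbind, pieces)
}
```

Each element of results is then a one-column matrix with one row per index in the corresponding chunk, ready to be saved before the next chunk is processed.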
2012 Oct 26
1
Parsing very large xml datafiles with SAX: How to profile <anonymous> functions?
Hello everyone, I'm trying to parse a very large XML file using SAX with the XML package (i.e., mainly the xmlEventParse function). This function takes as an argument a list of other functions (handlers) that will be called to handle particular xml nodes. When I use Rprof(), all the handler functions are lumped together under the <anonymous> label, and I get something like this:
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding. I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,