thr3ads.net - similar to: "Need help reading website info with XML package and XPath"

Displaying 20 results from an estimated 500 matches similar to: "Need help reading website info with XML package and XPath"

newbie xml parsing question

2011 May 28

I am trying to read some data off the zillow site. Newbie to xml, html, parsing and the xml package. I've been able to load the web page I'm interested in with the following code, but I'm not sure of the next step to get the information I'm interested in into R: library(XML) url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb" doc <-doc <-

Stuck ...can't get sapply and xmlTreeParse working

2011 Jul 05

Can't seem to get the code below working. It gets stuck on line 24 inside the function hm; comments show the line in question. The function hm is called by sapply and is at the bottom of the code. Other stuff above line 24 works correctly including the first couple of lines of the function hm. Should I be using a different apply function or am I doing something wrong with xmlTreeParse ?

Help with tryCatch

2011 Jul 10

Having a hard time understanding the help files for tryCatch. Looking for a little help with the following statement, which sits inside a for loop: zest[i] <- tryCatch(sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue), error=function() zest[i] <-"NA") where zest is a numeric vector. If the sapply statement evaluates to an error, I'd like to set the value of zest[i]
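A minimal sketch of the fallback pattern the question asks about, using base R only. The function `get_amount` and the vector `inputs` are hypothetical stand-ins for the XML lookup, which is not reproduced here. Two points it illustrates: an `error` handler has to accept the condition object as an argument, and the value the handler returns becomes the value of `tryCatch()`, so the assignment belongs outside the handler.

```r
# Hypothetical stand-in for the XML lookup: it fails for some inputs.
get_amount <- function(x) {
  if (is.na(x)) stop("no zestimate node found")
  x * 2
}

inputs <- c(10, NA, 30)
zest <- numeric(length(inputs))

for (i in seq_along(inputs)) {
  # The handler takes the condition object (here `e`); whatever it
  # returns is what tryCatch() returns, so it is assigned to zest[i].
  zest[i] <- tryCatch(
    get_amount(inputs[i]),
    error = function(e) NA_real_
  )
}
```

Returning `NA_real_` rather than the string "NA" keeps `zest` numeric.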

Parsing large XML documents in R - how to optimize the speed?

2012 Aug 10

Hello everyone, I would like to parse very large xml files from MS/MS experiments and create R objects from their content. (By very large, I mean going up to 5-10Gb, although I am using a 'small' 40M file to test my code.) My first attempt at parsing the 40M file, using the XML package, took more than 2200 seconds and left me quite disappointed. I managed to cut that down to around 40

Process XML files

2012 Jun 06

Hello experts, Sorry for posting the S-Plus related question here. I have not found any solution yet after some attempts and hence am sending it to a wider spectrum of users! I was successful in processing files using R's XML library. Thank you, Rxperts! I know there are libraries like XML and SPXML available in S-Plus. Could anyone please share examples of reading an xml file and save the

Importing huge XML-Files

2007 Sep 01

Dear all, for my diploma thesis I have to import huge XML files into R for statistical processing; huge means a size of about 33 MB. I'm using the XML package, version 1.9. Since reading the complete file into R via xmlTreeParse doesn't work or is too slow, I'm trying to use xmlEventParse, but I got completely stuck. I have many different types of nodes + <configuration>

Memory allocation failed: Copying Node

2008 Jun 25

The following code fails with a "Memory allocation failed: Copying Node" error after parsing n thousand files. I have included the main code (below) and functions (after the main code). I am not sure which lines are causing the node copying that results in the memory failure. Please advise. #Beginning of Code for(i in 1:nrow(newFile)) { if(i%%3000 == 0) gc()

How to extract following data

2008 Nov 05

Hi everyone, I have this kind of raw dataset : - <Temp diffgr:id="Temp14" msdata:rowOrder="13"> <Date>2005-01-17T00:00:00+05:30</Date> <SecurityID>10149</SecurityID> <PriceClose>1288.40002</PriceClose> </Temp> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">

How to apply XPath query on XML nodes separately?

2012 Dec 28

Dear R experts, I am trying to extract certain child nodes from an XML document and construct a table in which the parent node names are the columns and the child id values, joined in a list, are the cell content. If I first apply an XPath query to extract all of the above parent nodes, then iterate over those nodes and again apply an XPath query to select their child nodes, I get *ALL* matching child nodes

Rcurl, postForm()

2012 May 28

Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic

Scraping a web page

2009 Dec 03

I would like to be able to submit a list of URLs of various webpages and extract the "content" i.e. not the mark-up of those pages. I can find plenty of examples in the XML library of extracting links from pages but I cannot seem to find a way to extract the text. Any help would be greatly appreciated - I will not know the structure of the URLs I would submit in advance. Any

Removing or overwriting an XML node

2008 Jul 02

Hi, I have an existing XML document on disk, which I'd like to use as a template, and exchange a subnode with my own newly created subtree: <?xml version="1.0"?> <Duncan> <name a="1" b="xyz"> <first>Duncan</first> <last>Temple Lang</last> </name> </Duncan> created by e.g. ? library(XML)

getNodeSet - what am I doing wrong?

2010 Aug 30

Hi, Why is the following returning a node set of length 0: > library(XML) > test <- xmlTreeParse( > "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE) > getNodeSet(test,"//modifications_row") Thanks for any hint. Joh
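One common cause of an empty node set is a default namespace on the document: unprefixed names in an XPath expression only match elements in no namespace. Whether that applies to the file in the question is an assumption, since the file is not reproduced here. The sketch below uses a small made-up document and a made-up namespace URI (`http://example.org/ns`) to show the effect with the XML package.

```r
library(XML)

# Hypothetical document; the default namespace URI is invented for illustration.
xml_text <- '<tables xmlns="http://example.org/ns">
  <modifications_row id="1"/>
  <modifications_row id="2"/>
</tables>'
doc <- xmlParse(xml_text, asText = TRUE)

# Unprefixed query: the elements live in the default namespace,
# so the plain element name does not match them.
plain <- getNodeSet(doc, "//modifications_row")

# Bind an arbitrary prefix (here "x") to the namespace URI and use it in the path.
prefixed <- getNodeSet(doc, "//x:modifications_row",
                       namespaces = c(x = "http://example.org/ns"))
```

Comparing `length(plain)` with `length(prefixed)` shows whether the namespace is what hides the nodes.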

encoding problem using xml package

2009 Sep 03

Dear list, I tried to read an xml file using the xml package. Unfortunately, some encoding problems occur. E.g. German umlauts are not read correctly. I assume that this occurs due to (internal?) conversion to utf-8. To illustrate the problem, I have written two xml files. File Test 1 ----------- <?xml version="1.0" encoding="ISO-8859-1"?> <Daten> <ITEM>

Treatment of xml-stylesheet processing instructions in XML module

2011 Apr 06

Hello again, Another stumble here that is defeating me. I try: a<-readLines(url("http://feeds.feedburner.com/grokin")) t<-XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE, asText=TRUE) elem<- XML::getNodeSet(XML::xmlRoot(t),"/rss/channel/item")[[1]] And I get: Start tag expected, '<' not found Error: 1: Start tag expected, '<' not

XML parsing under R / Extracting nodes’ values

2007 May 14

Hi, I have an XML file which contains among other nodes : ===myXMLfile.xml=== (...) <nbRelations>2</nbRelations> <nbActors>2</nbActors> (...) <nbRuns>5</nbRuns> <nbStep>2000</nbStep> (...) ===End file=== I need to extract those values and to make them R variables such as: nbRelations = 2 nbActors = 2 nbRuns = 5 nbSteps = 2000 I read the help and have

How to parse xml with same key name ?

2012 Nov 01

Hi, I need to parse an xml where the key names are the same but the values are different. <root> <test> Some dummy text </test> <node id="1">one</node> <node id="2">two</node> <node id="3">three</node> </root> When I use the xmlGetAttr() function I always get one as value. How
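A minimal sketch of one way to collect every repeated element, assuming the XML package and the sample document from the question pasted in as a string. Rather than calling `xmlGetAttr()` on a single node, it applies the accessor to each node matched by an XPath query via `xpathSApply()`. The variable names are illustrative only.

```r
library(XML)

# The sample document from the question, supplied as text.
xml_text <- '<root>
  <test>Some dummy text</test>
  <node id="1">one</node>
  <node id="2">two</node>
  <node id="3">three</node>
</root>'
doc <- xmlParse(xml_text, asText = TRUE)

# Apply the accessor to every <node> element, not just the first one.
ids    <- xpathSApply(doc, "//node", xmlGetAttr, "id")
values <- xpathSApply(doc, "//node", xmlValue)

# Combine attribute and text content into one table.
result <- data.frame(id = ids, value = values, stringsAsFactors = FALSE)
```

Both vectors come back in document order, so row k of `result` pairs the k-th id with the k-th text value.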

Memory filling up while looping

2012 Dec 20

Hey, I have a double loop like this: chunk <- list(1:10, 11:20, 21:30) for(k in 1:length(chunk)){ print(chunk[k]) DummyCatcher <- NULL for(i in chunk[k]){ print("i load something") dummy <- 1 print("i do something") dummy <- dummy + 1 print("i do put it together") DummyCatcher = rbind(DummyCatcher, dummy) } print("i save a chunk
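A hedged sketch of the same loop shape in base R, with two changes that may or may not be what the truncated question needs. First, `chunk[k]` is a list of length one, so `for(i in chunk[k])` runs once with `i` bound to the whole vector; `chunk[[k]]` yields the elements themselves. Second, results are collected in a preallocated list and bound once per chunk instead of growing an object with `rbind()` on every iteration. The per-item work (`i + 1`) and the `results` container are placeholders.

```r
chunk <- list(1:10, 11:20, 21:30)
results <- vector("list", length(chunk))

for (k in seq_along(chunk)) {
  items <- chunk[[k]]                      # [[ ]] extracts the vector itself
  rows <- vector("list", length(items))    # preallocate one slot per item

  for (j in seq_along(items)) {
    i <- items[j]
    # Placeholder for "load something" / "do something".
    rows[[j]] <- data.frame(id = i, value = i + 1)
  }

  # Bind once per chunk instead of once per iteration.
  results[[k]] <- do.call(rbind, rows)
}
```

In a real run, each `results[[k]]` could be written to disk and dropped before the next chunk starts, which is presumably what the "save a chunk" step in the question is for.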

Parsing very large xml datafiles with SAX: How to profile <anonymous> functions?

2012 Oct 26

Hello everyone, I'm trying to parse a very large XML file using SAX with the XML package (i.e., mainly the xmlEventParse function). This function takes as an argument a list of other functions (handlers) that will be called to handle particular xml nodes. When I use Rprof(), all the handler functions are lumped together under the <anonymous> label, and I get something like this:

XML and RCurl: problem with encoding (htmlTreeParse)

2009 Dec 31

Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding. I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,
