Displaying 20 results from an estimated 100 matches similar to: "another XML package question"

2010 Mar 31
3
[PATCH] Remove v2v-snapshot
We now always copy a guest during conversion, meaning this tool is no longer required. --- Build.PL | 2 +- MANIFEST | 2 - lib/Sys/VirtV2V.pm | 9 +- lib/Sys/VirtV2V/Connection.pm | 1 - po/POTFILES.in | 1 - snapshot/run-snapshot-locally | 43 -- snapshot/v2v-snapshot.pl | 931
2010 Aug 30
4
getNodeSet - what am I doing wrong?
Hi, Why is the following returning a node set of length 0: > library(XML) > test <- xmlTreeParse( > "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE) > getNodeSet(test,"//modifications_row") Thanks for any hint. Joh
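A common cause of an empty node set here is a default namespace on the document, in which case the unqualified XPath //modifications_row matches nothing. A minimal sketch of two usual workarounds (the namespace URI is read from the document rather than hard-coded; that this is the actual cause for unimod_tables.xml is an assumption):

    library(XML)
    test <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml",
                         useInternalNodes = TRUE)
    # Option 1: side-step namespaces entirely with local-name()
    getNodeSet(test, "//*[local-name()='modifications_row']")
    # Option 2: register the document's default namespace under a prefix
    # (assumes the default namespace is the first one declared on the root)
    ns <- xmlNamespaceDefinitions(xmlRoot(test), simplify = TRUE)
    getNodeSet(test, "//u:modifications_row", namespaces = c(u = ns[[1]]))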
2011 May 30
1
Need help reading website info with XML package and XPath
Hi, I'm looking for help extracting some information from the Zillow website. I'd like to do this for the general case where I manually change the address by modifying the url (see code below). With the url containing the address, I'd like to be able to extract the same information each time. The specific information I'd like to be able to extract includes the homedetails url, price
2008 Jun 10
1
Parse XML
Could someone provide a link or examples of parsing an XML document in R? A few specific questions below: For instance I can retrieve specific nodes using this: node <- xpathApply(xml, "//" %+% xtag, xmlValue) 1) I want to be able to retrieve the parent node of this node, how can I do this? getParentNode() does not seem to cut it. 2) How can I retrieve children nodes for a particular
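For reference, the XML package's own accessors for this are xmlParent() and xmlChildren(), which work on internal nodes; a minimal sketch with placeholder file and tag names:

    library(XML)
    doc   <- xmlParse("example.xml")        # placeholder file name
    nodes <- getNodeSet(doc, "//item")      # '//item' is a placeholder XPath
    xmlParent(nodes[[1]])                   # 1) the parent of the first match
    xmlChildren(nodes[[1]])                 # 2) its child nodes, as a list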
2011 Jul 05
2
Stuck ...can't get sapply and xmlTreeParse working
Can't seem to get the code below working. It gets stuck on line 24 inside the function hm; comments show the line in question. The function hm is called by sapply and is at the bottom of the code. Other stuff above line 24 works correctly, including the first couple of lines of the function hm. Should I be using a different apply function, or am I doing something wrong with xmlTreeParse?
2011 Apr 06
1
Treatment of xml-stylesheet processing instructions in XML module
Hello again, Another stumble here that is defeating me. I try: a<-readLines(url("http://feeds.feedburner.com/grokin")) t<-XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE, asText=TRUE) elem<- XML::getNodeSet(XML::xmlRoot(t),"/rss/channel/item")[[1]] And I get: Start tag expected, '<' not found Error: 1: Start tag expected, '<' not
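One common cause of "Start tag expected" here is handing asText = TRUE a multi-element character vector from readLines(); collapsing the lines into one string first, or letting the parser fetch the URL itself, usually avoids it. A minimal sketch under that assumption (the feed is assumed to still be a plain RSS 2.0 document):

    library(XML)
    a <- readLines(url("http://feeds.feedburner.com/grokin"))
    t <- xmlTreeParse(paste(a, collapse = "\n"), asText = TRUE,
                      ignoreBlanks = TRUE, useInternalNodes = TRUE)
    getNodeSet(xmlRoot(t), "/rss/channel/item")[[1]]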
2012 May 28
1
Rcurl, postForm()
Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic
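For reference, a heavily hedged sketch of the general RCurl pattern for a page like this: scrape the form's hidden ASP.NET state fields (__VIEWSTATE and friends) first, then post them back together with the search fields. The visible field name below is a placeholder and would have to be read from the actual form:

    library(RCurl)
    library(XML)
    url <- "http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx"
    doc <- htmlParse(getURL(url))
    # collect the hidden ASP.NET state fields that must be echoed back
    hidden <- xpathSApply(doc, "//input[@type='hidden']", xmlGetAttr, "value", "")
    names(hidden) <- xpathSApply(doc, "//input[@type='hidden']", xmlGetAttr, "name", "")
    result <- postForm(url,
                       .params = c(hidden, searchTerm = "bakery"),  # 'searchTerm' is a placeholder name
                       style = "POST")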
2012 May 17
1
using XML package to read RSS
Hi, I'm trying to use the XML package to read an RSS feed. To get started, I was trying to use this post as an example: http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/ I can replicate the beginning section of the post, but when I try to use another RSS feed I have an issue. The RSS feed I would like to use is: > URL <-
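A minimal sketch of the usual pattern with the XML package (the feed URL is a placeholder, and the fields pulled out are the standard RSS item children):

    library(XML)
    doc   <- xmlParse("http://example.org/feed.xml")    # placeholder feed URL
    items <- getNodeSet(doc, "//item")
    rss   <- data.frame(
      title = sapply(items, function(x) xmlValue(x[["title"]])),
      link  = sapply(items, function(x) xmlValue(x[["link"]])),
      stringsAsFactors = FALSE)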
2012 Aug 10
3
Parsing large XML documents in R - how to optimize the speed?
Hello everyone, I would like to parse very large XML files from MS/MS experiments and create R objects from their content. (By very large, I mean going up to 5-10 GB, although I am using a 'small' 40 MB file to test my code.) My first attempt at parsing the 40 MB file, using the XML package, took more than 2200 seconds and left me quite disappointed. I managed to cut that down to around 40
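For files in the multi-gigabyte range the usual advice is not to build the whole tree at all but to stream it with xmlEventParse(), optionally using its branches argument so that only one record's subtree is materialised at a time. A minimal sketch (file and element names are placeholders for the MS/MS data):

    library(XML)
    ids <- character(0)
    handleSpectrum <- function(node) {      # called once per complete <spectrum> subtree
      ids <<- c(ids, xmlGetAttr(node, "id", default = NA_character_))
    }
    xmlEventParse("experiment.mzML",        # placeholder file name
                  handlers = list(),
                  branches = list(spectrum = handleSpectrum))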
2008 Nov 05
2
How to extract following data
Hi everyone, I have this kind of raw dataset: <Temp diffgr:id="Temp14" msdata:rowOrder="13"> <Date>2005-01-17T00:00:00+05:30</Date> <SecurityID>10149</SecurityID> <PriceClose>1288.40002</PriceClose> </Temp> <Temp diffgr:id="Temp15" msdata:rowOrder="14">
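Assuming those <Temp> records sit inside a well-formed document (the diffgr:/msdata: prefixes are declared further up in a typical ADO.NET DiffGram), a sketch that side-steps the namespaces with local-name() and builds a data frame:

    library(XML)
    doc  <- xmlParse("prices.xml")          # placeholder file name
    rows <- getNodeSet(doc, "//*[local-name()='Temp']")
    out  <- data.frame(
      Date       = sapply(rows, function(x) xmlValue(x[["Date"]])),
      SecurityID = sapply(rows, function(x) xmlValue(x[["SecurityID"]])),
      PriceClose = as.numeric(sapply(rows, function(x) xmlValue(x[["PriceClose"]]))),
      stringsAsFactors = FALSE)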
2010 Sep 08
1
XML getNodeSet syntax for PUBMED XML export
I am looking for the syntax to capture XML tags marked with /DescriptorName MajorTopicYN="Y"/ , but the combination of the internal space (between "Name" and "Major") and the embedded quote marks is defeating me. I can get all the "DescriptorName" tags, but these include both MajorTopicYN = "Y" and "N" variants. Any suggestions?
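The attribute test can go straight into an XPath predicate; single quotes inside the double-quoted R string avoid the quoting clash. A minimal sketch, assuming a plain PubMed XML export file:

    library(XML)
    doc    <- xmlParse("pubmed_export.xml")     # placeholder file name
    majors <- getNodeSet(doc, "//DescriptorName[@MajorTopicYN='Y']")
    sapply(majors, xmlValue)                    # the major-topic descriptor terms only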
2007 Sep 01
2
Importing huge XML-Files
Dear all, for my diploma thesis I have to import huge XML files into R for statistical processing - huge meaning a size of about 33 MB. I'm using the XML package version 1.9. Since reading the complete file into R via xmlTreeParse doesn't work or is too slow, I'm trying to use xmlEventParse, but I got completely stuck. I have many different types of nodes + <configuration>
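xmlEventParse() never builds a tree: you supply handler functions and accumulate what you need as the parser streams past each node. A minimal skeleton (the element name is a placeholder for the poster's <configuration> nodes):

    library(XML)
    values   <- character(0)
    inTarget <- FALSE
    xmlEventParse("huge.xml",               # placeholder file name
      handlers = list(
        startElement = function(name, attrs, ...) {
          if (name == "configuration") inTarget <<- TRUE
        },
        text = function(content, ...) {
          if (inTarget && nzchar(content)) values <<- c(values, content)
        },
        endElement = function(name, ...) {
          if (name == "configuration") inTarget <<- FALSE
        }))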
2009 Sep 23
3
retrieve certain part from html
Dear All, Can someone please guide me on how to extract a certain part from a long HTML string? e.g. "<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a
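Rather than working on the raw string, it is usually easier to parse the fragment as HTML and pull out the href attributes; a minimal sketch on the snippet above:

    library(XML)
    html <- "<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td>"
    doc  <- htmlParse(html, asText = TRUE)
    xpathSApply(doc, "//a", xmlGetAttr, "href")
    # -> "2005-01.html" "2006-01.html"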
2009 Jul 01
3
is there a way to extract data from web pages through some R function?
I deal with a huge amount of Biology data stored in different databases. The databases belonging to the Bioconductor organization can be accessed through Bioconductor packages. Unfortunately, some useful data is stored in databases like, for instance, miRDB, miRecords, etc ... which offer just an interactive HTML interface. See for instance http://mirdb.org/cgi-bin/search.cgi,
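When a database offers only an HTML front end, the usual fallback is to submit its search form with RCurl and parse the returned page; a heavily hedged sketch (the form field names for search.cgi are assumptions and would have to be checked against the actual page source):

    library(RCurl)
    library(XML)
    # 'searchBox' and 'searchType' are placeholder field names
    page   <- postForm("http://mirdb.org/cgi-bin/search.cgi",
                       searchBox = "hsa-miR-21", searchType = "miRNA")
    tables <- readHTMLTable(htmlParse(page))    # result tables as data frames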
2012 May 14
3
Scraping a web page.
Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact all I want is the (exactly) 7-digit number before ".html". The good news is that as far as I can tell the <a>
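One way, sketched here with a placeholder file name: parse each page source, collect the hrefs, keep the /en/Ships/ ones, and extract the 7-digit number with a regular expression:

    library(XML)
    doc   <- htmlParse("page1.html")        # placeholder: one saved page source
    hrefs <- xpathSApply(doc, "//a", xmlGetAttr, "href", "")
    ships <- grep("^/en/Ships/.+\\.html$", hrefs, value = TRUE)
    unique(sub(".*-([0-9]{7})\\.html$", "\\1", ships))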
2010 Mar 18
1
Do colClasses in readHTMLTable (XML Package) work?
Hi, I can't get the colClasses option to work in the readHTMLTable function of the XML package. Here's a code fragment: require("XML") doc <- "http://www.nber.org/cycles/cyclesmain.html" table <- getNodeSet(htmlParse(doc),"//table")[[2]] # The main table is the second one because it's embedded in the page table. xt
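For reference, colClasses is documented as a per-column vector in readHTMLTable(); a minimal sketch of how it is meant to be used (the column count for the NBER table is an assumption, and whether the option is honoured may depend on the XML package version, which is what the question is about):

    library(XML)
    doc <- "http://www.nber.org/cycles/cyclesmain.html"
    xt  <- readHTMLTable(doc, which = 2,                    # second table, as in the post
                         colClasses = rep("character", 7))  # 7 columns is an assumption
    str(xt)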
2011 Jul 10
1
Help with tryCatch
Having a hard time understanding the help files for tryCatch. Looking for a little help with the following statement which sits inside a for loop: zest[i] <- tryCatch(sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue), error=function() zest[i] <- "NA") zest is a numeric vector. If the sapply statement evaluates to an error, I'd like to set the value of zest[i]
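The handler passed to error= must accept the condition object, and whatever it returns becomes the value of the whole tryCatch() call, so there is no need to assign inside the handler. A sketch of that pattern (zdoc and zest come from the poster's loop; the as.numeric() wrapper is an addition since zest is numeric):

    library(XML)
    zest[i] <- tryCatch(
      as.numeric(sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue)),
      error = function(e) NA)    # the handler's return value is used on error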
2012 Apr 16
1
grep and XML
Hi all: I struggle a lot scraping web data. I still haven't got a handle on the XML package. I'd like to get particular exchange rates from this table: https://raw.github.com/currencybot/open-exchange-rates/master/latest.json This is the code that I'm working with: library(RCurl) library(XML)
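The linked file is JSON rather than XML or an HTML table, so a JSON parser is the more natural tool; a minimal sketch using RJSONIO (the "rates" field is how that feed is laid out, and the currency codes picked out are just examples):

    library(RCurl)
    library(RJSONIO)
    txt   <- getURL("https://raw.github.com/currencybot/open-exchange-rates/master/latest.json")
    rates <- fromJSON(txt)$rates
    rates[c("EUR", "GBP", "JPY")]    # pick out particular exchange rates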
2012 Jun 06
1
Process XML files
Hello experts, Sorry for posting the S-PLUS related question here... I have not found any solution yet after some attempts and hence, sending it to a wider spectrum of users! I was successful in processing files using R's XML library. Thank you, Rxperts! I know there are libraries like XML and SPXML available in S-PLUS. Could anyone please share examples of reading an XML file and saving the
2012 Feb 10
1
Bug with memory allocation when loading Rdata files iteratively?
Dear list, when iterating over a set of Rdata files that are loaded, analyzed and then removed from memory again, I experience a *significant* increase in an R process' memory consumption (killing the process eventually). It just seems like removing the object via rm() and firing gc() do not have any effect, so the memory consumption of each loaded R object accumulates until
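One pattern that keeps the footprint flat, sketched here with placeholder paths, is to load() each file into its own temporary environment so nothing accumulates in the global workspace, and to drop that environment before collecting:

    files <- list.files("rdata_dir", pattern = "\\.Rdata$", full.names = TRUE)  # placeholder directory
    for (f in files) {
      env <- new.env()
      load(f, envir = env)             # objects exist only inside 'env'
      ## ... analyze the objects in 'env' here, e.g. get(ls(env)[1], envir = env) ...
      rm(env)
      gc()                             # release the memory before the next file
    }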