thr3ads.net - similar to: "Parsing large XML documents in R

Displaying 20 results from an estimated 1000 matches similar to: "Parsing large XML documents in R - how to optimize the speed?"

Analyzing Publications from Pubmed via XML

2007 Dec 14

I would like to track in which journals articles about a particular disease are being published. Creating a PubMed search is trivial. The search provides data, but obviously not as an R data frame. I can get the search to export the data as an XML feed, and the XML package seems to be able to read it. xmlTreeParse("

encoding problem using xml package

2009 Sep 03

Dear list, I tried to read an XML file using the XML package. Unfortunately, some encoding problems occur. E.g. German umlauts are not read correctly. I assume that this occurs due to (internal?) conversion to UTF-8. To illustrate the problem, I have written two XML files. File Test 1 ----------- <?xml version="1.0" encoding="ISO-8859-1"?> <Daten> <ITEM>

Package XML: Parse Garmin *.tcx file problems

2011 Mar 30

I'm struggling with package XML to parse a Garmin file (named *.tcx). I wonder if its form is incomplete, but I am reluctant to paste even a shortened version. The output below shows I can get nodes, but an attempt to get the value of a single node comes up empty (even though there is data there). One question: Has anybody succeeded in parsing Garmin .tcx (XML) files? Thanks! Michael

Problem with biomaRt::getSequence.

2013 May 07

Hi, I could run this code some days ago, but I can't run it now. Problem 1: Output is ok ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl) utr5 = getSequence(chromosome=3, start=185514033, end=185535839, type="entrezgene",seqType="5utr", mart=ensembl) Output : 5utr

How to filter xml value in R?

2012 Nov 14

Hi, I have one xml file. <Class> <Node1 code ="1"> First node </Node1> <Node2 code ="1"> Second node </Node2> <Node3 code ="1"> Third node </Node3> <Node1 code ="2"> Fourth node </Node1> </Class> for (i in 1:xmlSize()) { print(Class[i]) # how can i filter Node1 ? } by
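
One possible way to approach the filtering question in the snippet above is an XPath query rather than a loop. The following is only a sketch under assumptions: it uses the XML package, inlines a whitespace-trimmed copy of the posted document as a string, and the variable names (xml_text, node1_values, node1_codes) are made up for illustration.

```r
library(XML)

# Sample document modelled on the one in the question (whitespace trimmed).
xml_text <- '<Class>
  <Node1 code="1">First node</Node1>
  <Node2 code="1">Second node</Node2>
  <Node3 code="1">Third node</Node3>
  <Node1 code="2">Fourth node</Node1>
</Class>'

# Parse from a string into an internal (C-level) document.
doc <- xmlParse(xml_text, asText = TRUE)

# Select only the Node1 children of Class and extract their text content.
node1_values <- xpathSApply(doc, "/Class/Node1", xmlValue)

# The "code" attribute of the same nodes.
node1_codes <- xpathSApply(doc, "/Class/Node1", xmlGetAttr, "code")

print(node1_values)
print(node1_codes)
```

An XPath predicate such as "/Class/Node1[@code='2']" would narrow the selection further by attribute value.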

splitting very long character string

2006 Nov 01

Hello, I have a very long character array (>500k characters) that I need to split by '\n', resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but it still takes several minutes to split this string. The massive string is the return value of a call to xmlElementsByTagName from the XML library and looks like this: ... 12345

More than one loop??

2010 Jan 17

Hello everyone, how do I use more than one loop in R? I have the following problem, to be solved with a method of three loops. Can you help me please? The data is attached to this message. The data is composed of two parts, cleaved (denoted by "cleaved") and non-cleaved (denoted by "noncleaved"). - To access the ith peptide, you can use X$Peptide[i] - To access the ith label,

Importing huge XML-Files

2007 Sep 01

Dear all, for my diploma thesis I have to import huge XML files into R for statistical processing - huge means a size of about 33 MB. I'm using the XML package, version 1.9. Since reading the complete file into R via xmlTreeParse doesn't work or is too slow, I'm trying to use xmlEventParse, but I got completely stuck. I have many different types of nodes + <configuration>
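
For readers unfamiliar with the event-driven approach mentioned in the snippet above, here is a minimal sketch of xmlEventParse from the XML package. It is illustrative only: the tiny inline document, the counts environment, and the handler body are assumptions made for this example, not the poster's data or code. The handler is called once per opening tag, so no tree is built in memory.

```r
library(XML)

# Hypothetical stand-in for a large file; a real call would pass a file path.
xml_text <- '<root><configuration id="a"/><item>1</item><item>2</item></root>'

# An environment used as a mutable tally of element names.
counts <- new.env()

handlers <- list(
  # Called for every opening tag; "..." absorbs any extra arguments.
  startElement = function(name, attrs, ...) {
    old <- if (exists(name, envir = counts, inherits = FALSE)) {
      get(name, envir = counts)
    } else {
      0L
    }
    assign(name, old + 1L, envir = counts)
  }
)

# asText = TRUE because the input here is a string, not a file name.
invisible(xmlEventParse(xml_text, handlers = handlers, asText = TRUE))

# Inspect how often each element name occurred.
print(mget(ls(counts), envir = counts))
```

Whether this is faster than xmlTreeParse for a given file depends on how much work the R-level handlers do per event; the main benefit is usually lower memory use.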

read xml

2010 Apr 16

Hi, I am trying to read selected fields from an XML file with R using the XML package. So far I have learned the basics of this package by going through the manual, examples, tutorial, and so on (www.omegahat.org/RSXML). The problem is that I get stuck when it comes to more complex XML files. I am a novice in R and XML, and was wondering if someone could help me out here.

Conflict command getSequence {biomaRt} and getSequence {seqinr} !!

2013 Feb 08

Hi! I am facing a problem with the "getSequence" command. When only the biomaRt package is loaded, the following example works well: >mart <- useMart("ensembl",dataset="hsapiens_gene_ensembl") >seq = getSequence(id="BRCA1", type="hgnc_symbol", seqType="peptide", mart = mart) show(seq) But when I load seqinr, I get a problem

legend and values do not match in ggplot

2017 Aug 04

I have the following code for a ggplot. The legend entries shown in the plot do not match the values specified in the code given below. Your help is highly appreciated. Greg library(ggplot2) p <- ggplot(a,aes(x=NO_BMI_FI_beta ,y=FI_beta ,color= Super.Pathway))+ theme_bw() +theme(panel.border=element_blank()) + geom_point(size=3) p2<-p+scale_color_manual(name="Super.Pathway",

retrieve certain part from html

2009 Sep 23

Dear All, can someone please guide me on how to extract a certain part from a long HTML string? E.g. "<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a

XML and RCurl: problem with encoding (htmlTreeParse)

2009 Dec 31

Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding. I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,

Read HTML table

2007 Nov 18

You can use htmlTreeParse and xpathApply from the XML library. Something like: xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td", xmlValue) should do it. Gamma wrote: > > Anyone care to explain how to read an HTML table? It's streaming data > (updated every second) and I am looking for a suitable function. > > The imported html

How to parse XML

2008 May 02

I would like to learn how to parse a mixed text/xml document I downloaded from the sec.gov website (see example below). I would like to parse this to get the value for each xml tag and then access it within R, but I don't know much about xml so I don't even know where to start debugging the errors I am getting in this example code. Can anyone help me get started? Thanks, Roger ftp

XML - get node by name

2008 Sep 07

Hi there, I am trying to rewrite some Java code in R. It deals with reading XML files. I started with the XML package. In Java, I had a very useful method which gave me a node by using: the name of the node; the index of appearance; the start point: global (false) / local (true). So, I could do something like this. setCurrentChildNode("data", 0); getValueOfElement("val",1,true); -->

Creating a Data Frame from an XML

2013 Jan 22

Hello, I'm attempting to read information from an XML file into a data frame in R using the "XML" package. I am unable to get the data into a data frame as I would like. I have some sample code below. *XML Code:* Header... Data I want in a data frame: <data> <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" /> <row
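
For attribute-only rows like the ones in the snippet above, one common pattern with the XML package is to collect each node's attributes and bind them into a data frame. The sketch below is an assumption-laden illustration: the first row is taken from the snippet, the second row (FORD) is invented so the example has more than one record, and the variable names are hypothetical.

```r
library(XML)

# First <row> is from the question; the second is made up for illustration.
xml_text <- '<data>
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" />
  <row BRAND="FORD" NUM="2" YEAR="2000" VALUE="12000" />
</data>'

doc <- xmlParse(xml_text, asText = TRUE)

# One named character vector of attributes per <row> node.
rows <- getNodeSet(doc, "//row")
attrs <- lapply(rows, xmlAttrs)

# Stack the vectors into a character matrix, then convert to a data frame.
df <- as.data.frame(do.call(rbind, attrs), stringsAsFactors = FALSE)

# Attributes arrive as strings, so numeric columns need explicit conversion.
df$NUM <- as.integer(df$NUM)
df$YEAR <- as.integer(df$YEAR)
df$VALUE <- as.numeric(df$VALUE)

print(df)
```

This assumes every row carries the same attributes in the same order; rows with missing attributes would need to be aligned by name before binding.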

XML parameters to Column Headers for importing into a dataset

2008 Jun 12

Dear List, do you know any way I can convert XML parameters into column headers? My data is in a CSV file with each row containing data in XML form, with multiple parameters ( <param1> data_val1 </param1> , <param2> data_val2 </param2> ). I want to convert it so that each row corresponds to one record and each parameter becomes a different column. param1

Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?

2011 Oct 26

Greetings, I am trying to get all of the text from a web page as if I "selected all" on the page, pasted into a text file, and then read in the text file with read.csv(). # this is the actual page I'm trying to acquire text from: web.pg <- readLines("http://www.airweb.org/?page=574") # then parsed in hopes of an easier structure to work with: web.pg <-

Scrap java scripts and styles from an html document

2011 Mar 29

Hi, I am working on developing a web crawler in R and I need some help with removing JavaScript and style sheets from the HTML document of a web page. I tried using the XML package, specifically the function xpathApply: library(XML) txt = xpathApply(html,"//body//text()[not(ancestor::script)][not(ancestor::style)]", xmlValue) The output comes out as text lines, without any html
