similar to: Parsing large XML documents in R - how to optimize the speed?

Displaying 20 results from an estimated 1000 matches similar to: "Parsing large XML documents in R - how to optimize the speed?"

2007 Dec 14
6
Analyzing Publications from Pubmed via XML
I would like to track in which journals articles about a particular disease are being published. Creating a pubmed search is trivial. The search provides data but obviously not as an R dataframe. I can get the search to export the data as an xml feed and the xml package seems to be able to read it. xmlTreeParse("
2009 Sep 03
1
encoding problem using xml package
Dear list, I tried to read an XML file using the XML package. Unfortunately, some encoding problems occur. E.g. German umlauts are not read correctly. I assume that this occurs due to (internal?) conversion to UTF-8. To illustrate the problem, I have written two XML files. File Test 1 ----------- <?xml version="1.0" encoding="ISO-8859-1"?> <Daten> <ITEM>
2011 Mar 30
1
Package XML: Parse Garmin *.tcx file problems
I'm struggling with package XML to parse a Garmin file (named *.tcx). I wonder if its form is incomplete, but I am understandably reluctant to paste even a shortened version. The output below shows I can get nodes, but an attempt at the value of a single node comes up empty (even though there is data there). One question: Has anybody succeeded in parsing Garmin .tcx (xml) files? Thanks! Michael
2013 May 07
1
Problem with biomaRt::getSequence.
Hi, I could run this code some days ago, but I can't run it now. Problem 1: Output is ok ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl) utr5 = getSequence(chromosome=3, start=185514033, end=185535839, type="entrezgene",seqType="5utr", mart=ensembl) Output: 5utr
2012 Nov 14
2
How to filter xml value in R?
Hi, I have one xml file. <Class> <Node1 code ="1"> First node </Node1> <Node2 code ="1"> Second node </Node2> <Node3 code ="1"> Third node </Node3> <Node1 code ="2"> Fourth node </Node1> </Class> for (i in 1:xmlSize()) { print(Class[i]) # how can i filter Node1 ? } by
2006 Nov 01
4
splitting very long character string
Hello, I have a very long character string (>500k characters) that needs to be split by '\n', resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but it still takes several minutes to split this string. The massive string is the return value of a call to xmlElementsByTagName from the XML library and looks like this: ... 12345
2010 Jan 17
6
More than one loop??
Hello everyone, how do I use more than one loop in R? I have the following problem to be solved with a method of three loops, can you help me please? The data is attached with this message. The data is composed of two parts, cleaved (denoted by "cleaved") and non-cleaved (denoted by "noncleaved"). - to access the ith peptide, you can use X$Peptide[i] - to access the ith label,
2007 Sep 01
2
Importing huge XML-Files
Dear all, for my diploma thesis I have to import huge XML files into R for statistical processing. Huge means a size of about 33 MB. I'm using the XML package, version 1.9. Since reading the complete file into R via xmlTreeParse doesn't work or is too slow, I'm trying to use xmlEventParse, but I got completely stuck. I have many different types of nodes + <configuration>
2010 Apr 16
0
read xml
Hi, I am trying to read selected fields from an XML file with R using the XML package. So far I have learned the basics of this package by going through the manual, examples, tutorial, and so on (www.omegahat.org/RSXML). The problem is that I get stuck when it comes to more complex XML files. I am a novice in R and XML, and was wondering if someone could help me out here.
2013 Feb 08
1
Conflict command getSequence {biomaRt} and getSequence {seqinr} !!
Hi! I am facing a problem with the "getSequence" command. When only the biomaRt package is loaded, the following example works well: >mart <- useMart("ensembl",dataset="hsapiens_gene_ensembl") >seq = getSequence(id="BRCA1", type="hgnc_symbol", seqType="peptide", mart = mart) show(seq) but when I load seqinr, I get a problem
2017 Aug 04
1
legend and values do not match in ggplot
I have the following code for a ggplot. The legend shown in the plot does not match the values specified in the code given below. Your help is highly appreciated. Greg library(ggplot2) p <- ggplot(a,aes(x=NO_BMI_FI_beta ,y=FI_beta ,color= Super.Pathway))+ theme_bw() +theme(panel.border=element_blank()) + geom_point(size=3) p2<-p+scale_color_manual(name="Super.Pathway",
2009 Sep 23
3
retrieve certain part from html
Dear All, Can someone please guide me on how to extract a certain part from a long HTML string? e.g. "<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding. I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,
2007 Nov 18
4
Read HTML table
You can use htmlTreeParse and xpathApply from the XML package. Something like: xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td", xmlValue) should do it. Gamma wrote: > > anyone care to explain how to read a html table, it's streaming data > (updated every second) and i am looking for a suitable function. > > The imported html
2008 May 02
1
How to parse XML
I would like to learn how to parse a mixed text/xml document I downloaded from the sec.gov website (see example below). I would like to parse this to get the value for each xml tag and then access it within R, but I don't know much about xml so I don't even know where to start debugging the errors I am getting in this example code. Can anyone help me get started? Thanks, Roger ftp
2008 Sep 07
4
XML - get node by name
Hi there, I am trying to rewrite some Java code in R. It deals with reading XML files. I started with the XML package. In Java, I had a very useful method which gave me a node by using: name of the node; index of appearance; start point: global (false) / local (true). So, I could do something like this. setCurrentChildNode("data", 0); getValueOfElement("val",1,true); -->
2013 Jan 22
2
Creating a Data Frame from an XML
Hello, I'm attempting to read information from an XML into a data frame in R using the "XML" package. I am unable to get the data into a data frame as I would like. I have some sample code below. *XML Code:* Header... Data I want in a data frame: <data> <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000" /> <row
2008 Jun 12
1
XML parameters to Column Headers for importing into a dataset
Dear List, Do you know any way I can convert XML parameters into column headers? My data is in a csv file with each row containing an XML form of data, and multiple parameters ( <param1> data_val1 </param1> , <param2> data_val2 </param2> ). I want to convert it so that each row corresponds to one record and each parameter becomes a different column. param1
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
Greetings, I am trying to get all of the text from a web page as if I "selected all" on the page, pasted into a text file, and then read in the text file with read.csv(). # this is the actual page I'm trying to acquire text from: web.pg <- readLines("http://www.airweb.org/?page=574") # then parsed in hopes of an easier structure to work with: web.pg <-
2011 Mar 29
2
Scrap java scripts and styles from an html document
Hi, I am working on developing a web crawler in R and I need some help with removing JavaScript and style sheets from the HTML document of a web page. I tried using the XML package, specifically the function xpathApply: library(XML) txt = xpathApply(html,"//body//text()[not(ancestor::script)][not(ancestor::style)]", xmlValue) The output comes out as text lines, without any html