similar to: Re ad HTML table

Displaying 20 results from an estimated 1000 matches similar to: "Re ad HTML table"

2008 Oct 06
3
Extracting text from html code using the RCurl package.
Dear R-help, I want to download the text from a web page, but what I end up with is the HTML code. Is there some option that I am missing in the RCurl package? Or is there another way to achieve this? This is the code I am using: > library(RCurl) > my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help' > html.file <- getURI(my.url, ssl.verifyhost = FALSE,
2008 Dec 31
1
Chinese characters encoding problem with XML
XML is a good tool for reading data from the web within R, but I wonder how to get the encoding right. library(XML) url <- 'http://www.szitic.com/docc/jz-lmzq.html' xml <- htmlTreeParse(url, useInternal=TRUE) q <- "//tbody/tr/td" dat <- unlist(xpathApply(xml, q, xmlValue)) df <- as.data.frame(t(matrix(dat, 4))) dt<-as.character(df[15,1]) The first column of df
2007 Dec 14
6
Analyzing Publications from Pubmed via XML
I would like to track in which journals articles about a particular disease are being published. Creating a pubmed search is trivial. The search provides data but obviously not as an R dataframe. I can get the search to export the data as an xml feed and the xml package seems to be able to read it. xmlTreeParse("
2011 Oct 26
1
Webscraping - How to Scrape Out Text Into R As If Copied & Pasted From Webpage?
Greetings, I am trying to get all of the text from a web page as if I "selected all" on the page, pasted into a text file, and then read in the text file with read.csv(). # this is the actual page I'm trying to acquire text from: web.pg <- readLines("http://www.airweb.org/?page=574") # then parsed in hopes of an easier structure to work with: web.pg <-
2011 Mar 30
1
Package XML: Parse Garmin *.tcx file problems
I'm struggling with package XML to parse a Garmin file (named *.tcx). I wonder if its form is incomplete, but I am reluctant to paste even a shortened version. The output below shows I can get nodes, but an attempt at getting the value of a single node comes up empty (even though there is data there). One question: Has anybody succeeded in parsing Garmin .tcx (xml) files? Thanks! Michael
2009 Sep 03
1
encoding problem using xml package
Dear list, I tried to read an XML file using the XML package. Unfortunately, some encoding problems occur. E.g. German umlauts are not read correctly. I assume that this occurs due to (internal?) conversion to UTF-8. To illustrate the problem, I have written two XML files. File Test 1 ----------- <?xml version="1.0" encoding="ISO-8859-1"?> <Daten> <ITEM>
2009 Dec 31
3
XML and RCurl: problem with encoding (htmlTreeParse)
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding: I'm not able to get the encoding right in the htmlTreeParse command. See below > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConnection(site)); close(tc) > txt <- htmlTreeParse(txt,
2016 Jan 18
3
Extracting data from a website
Good afternoon, I want to extract data from a website that relates the week to the score obtained by a player. Right now I can get the node in which the week is related to the score obtained, but I am not able to extract that information into a two-column table (week, score), taking into account that there may be weeks in which the player did not score (in the example,
2007 Sep 01
2
Importing huge XML-Files
Dear all, for my diploma thesis I have to import huge XML files into R for statistical processing - huge means a size of about 33 MB. I'm using the XML package version 1.9. Since reading the complete file into R via xmlTreeParse doesn't work or is too slow, I'm trying to use xmlEventParse, but I got completely stuck. I have many different types of nodes + <configuration>
2012 Feb 29
2
Using a FOR LOOP to name objects
Hello, I am trying to use a for loop to name objects in each iteration. As in the following example (which doesn't work quite well) my_list<-c("A","B","C","D","E","F") for(i in c(1:length(my_list))){ url<- "http://finance.yahoo.com" doc = htmlTreeParse(url, useInternalNodes = T) tab_nodes = xpathApply(doc,
2012 Aug 10
3
Parsing large XML documents in R - how to optimize the speed?
Hello everyone, I would like to parse very large xml files from MS/MS experiments and create R objects from their content. (By very large, I mean going up to 5-10Gb, although I am using a 'small' 40M file to test my code.) My first attempt at parsing the 40M file, using the XML package, took more than 2200 seconds and left me quite disappointed. I managed to cut that down to around 40
2011 Mar 29
2
Scrap java scripts and styles from an html document
Hi, I am working on developing a web crawler in R and I need some help with removing JavaScript and style sheets from the HTML document of a web page. I tried using the XML package, hence the function xpathApply library(XML) txt = xpathApply(html,"//body//text()[not(ancestor::script)][not(ancestor::style)]", xmlValue) The output comes out as text lines, without any html
2008 Jun 10
1
Parse XML
Could someone provide a link or examples of parsing an XML document in R? A few specific questions below: For instance, I can retrieve specific nodes using this: node <- xpathApply(xml, "//" %+% xtag, xmlValue) 1) I want to be able to retrieve the parent node for this node, how can I do this? getParentNode() does not seem to cut it. 2) How can I retrieve children nodes for a particular
2011 May 30
1
Need help reading website info with XML package and XPath
Hi, I'm looking for help extracting some information from the Zillow website. I'd like to do this for the general case where I manually change the address by modifying the url (see code below). With the url containing the address, I'd like to be able to extract the same information each time. The specific information I'd like to be able to extract includes the homedetails url, price
2008 Apr 12
1
Extracting a data.frame from HTML code
Dear all, I'd like to use R to read in data from the web. I need some help finding an efficient way to strip the HTML tags and reformat the data as a data.frame to analyze in R. I'm currently using readLines() to read in the HTML code and then grep() to isolate the block of HTML code I want from each page, but this may not be the best approach. A short example: x1 <- readLines("
2016 Jan 19
2
Extracting data from a website
Thank you both very much!!!! Both solutions worked for me. Regards. On 18 January 2016 at 18:35, Carlos Ortega <cof en qualityexcellence.es> wrote: > Hello, > > But you almost have it... you have a couple of steps left and that's it... > > You simply have to convert "puntos_nodo" to either a data.frame, although > it will be full of things that you do not
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |
2008 May 02
1
How to parse XML
I would like to learn how to parse a mixed text/xml document I downloaded from the sec.gov website (see example below). I would like to parse this to get the value for each xml tag and then access it within R, but I don't know much about xml so I don't even know where to start debugging the errors I am getting in this example code. Can anyone help me get started? Thanks, Roger ftp
2008 Jun 12
1
XML parameters to Column Headers for importing into a dataset
Dear List, Do you know any way I can convert XML parameters into column headers? My data is in a csv file with each row containing an XML form of data, and multiple parameters ( <param1> data_val1 </param1> , <param2> data_val2 </param2> ). I want to convert it so each row corresponds to one record and each parameter becomes a different column. param1
2012 Apr 21
1
how to write html output (webscraped using RCurl package) into file?
I want the information shown on the web page "http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html" to be written to a .txt file as is (I don't want any HTML tags). I am using the "RCurl" package >marathi<-htmlTreeParse("http://scop.berkeley.edu/astral/pdbstyle/?id=d1fjgc2&output=html") >marathi
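
Several of the threads above ask how to get the visible text of a web page, without tags, scripts or styles, into R or into a plain text file. The following is only a minimal sketch of one possible approach with the XML package, not the answer given in any of those threads. It reuses the XPath expression quoted in the "Scrap java scripts and styles" post. The inline `html` string and the names `doc`, `txt` and `out_file` are assumptions made up for illustration so that the example runs offline.

```r
# Minimal sketch: pull the visible text out of an HTML page and save it
# as plain text. Requires the XML package.
library(XML)

# Stand-in page (hypothetical, not taken from any thread) so that the
# example does not depend on a live URL.
html <- paste0(
  "<html><head><style>p { color: red; }</style></head><body>",
  "<p>Hello</p><script>var x = 1;</script><p>world</p>",
  "</body></html>"
)

# Parse the string as HTML into an internal document.
doc <- htmlParse(html, asText = TRUE)

# Select every text node under <body> that is not inside <script> or
# <style>, and return each node's text as a character vector.
txt <- xpathSApply(
  doc,
  "//body//text()[not(ancestor::script)][not(ancestor::style)]",
  xmlValue
)

# Drop surrounding whitespace and empty fragments.
txt <- trimws(txt)
txt <- txt[nzchar(txt)]

# Write one text fragment per line to a plain text file.
out_file <- tempfile(fileext = ".txt")
writeLines(txt, out_file)
```

For a real page, the `html` string would presumably come from something like `getURL()` in RCurl, as in the posts above. Encoding problems of the kind described in several threads may still need separate handling.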