similar to: My First Attempt at Screen Scraping with R

Displaying 20 results from an estimated 4000 matches similar to: "My First Attempt at Screen Scraping with R"

2013 Feb 28
0
Scraping data from website---Error in htmlParse: error in creating parser
I'm trying to scrape football projections from accuscore.com for the different positions (right now the projections are set to zeros, but that will change). I can get the QB projections, but I can't get the projections for any of the other positions (e.g., RB). How can I get the RB projections? I'm not sure what the actual website for the RB and other projections is. When I go to
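A minimal sketch of the usual pattern for this kind of task, assuming the RB page's address can be located (finding it was the open question here, so the URL below is a placeholder): fetch the source with RCurl, parse it, and let readHTMLTable() find the projection tables.

library(RCurl)
library(XML)

url <- "http://accuscore.com/placeholder-rb-page"    # hypothetical URL
raw <- getURL(url, followlocation = TRUE)            # fetch the page source
doc <- htmlParse(raw, asText = TRUE)                 # asText: we pass source, not a URL
tabs <- readHTMLTable(doc, stringsAsFactors = FALSE)
str(tabs)                                            # inspect which table holds the projections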
2018 May 23
0
Using R htmlParse() for manipulating URLs to access multiple pages
I am trying to scrape a manual from the web. For privacy reasons, I cannot write the exact URL here; anyway, the structure is as follows: https://home.lala.com/bibi/blabla/chapter_i_organization/101_contracts/whatever/,DanaInfo=intranet.lala.com+ https://home.lala.com/bibi/blabla/chapter_i_organization/125_bills/,DanaInfo=intranet.lala.com+
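A sketch under the structure shown above: build each chapter URL with paste0() and parse it. Since libxml cannot fetch https itself, the source is downloaded with RCurl first; the chapter paths are the anonymized ones from the post.

library(RCurl)
library(XML)

base     <- "https://home.lala.com/bibi/blabla/"
suffix   <- ",DanaInfo=intranet.lala.com+"
chapters <- c("chapter_i_organization/101_contracts/whatever/",
              "chapter_i_organization/125_bills/")

docs <- lapply(chapters, function(ch) {
  u <- paste0(base, ch, suffix)
  htmlParse(getURL(u), asText = TRUE)   # RCurl handles the https transfer
})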
2012 May 14
3
Scraping a web page.
Folks, I want to scrape a series of web-page sources for strings like the following: "/en/Ships/A-8605507.html" "/en/Ships/Aalborg-8122830.html" which appear in an href inside an <a> tag inside a <div> tag inside a table. In fact all I want is the (exactly) 7-digit number before ".html". The good news is that as far as I can tell the <a>
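One way to do exactly this with the XML package: collect every href, then keep the seven digits immediately before ".html". The tiny inline document stands in for a real page source.

library(XML)

src <- '<div><a href="/en/Ships/A-8605507.html">A</a>
        <a href="/en/Ships/Aalborg-8122830.html">Aalborg</a></div>'
doc   <- htmlParse(src, asText = TRUE)
hrefs <- xpathSApply(doc, "//a/@href")
ids   <- regmatches(hrefs, regexpr("[0-9]{7}(?=\\.html)", hrefs, perl = TRUE))
ids
# [1] "8605507" "8122830"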
2011 Aug 29
1
reading tables from multiple HTML pages
Hi, I'm a beginner to R and was having some problems scraping data from tables in HTML using the XML package. I have included some code below. I am trying to loop through a series of HTML pages, each of which contains a single table from which I want to scrape data. However, some of the pages are blank - and so it throws an error message when it gets to htmlParse(). The loop then closes out and I
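A sketch of the standard fix, with placeholder URLs: wrap the parse in tryCatch() so a blank or malformed page yields NULL and the loop moves on instead of aborting.

library(XML)

urls <- sprintf("http://example.com/page%d.html", 1:10)   # placeholders

tables <- vector("list", length(urls))
for (i in seq_along(urls)) {
  doc <- tryCatch(htmlParse(urls[i]), error = function(e) NULL)
  if (is.null(doc)) next                                  # blank page: skip it
  tables[[i]] <- readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
}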
2012 May 28
1
RCurl, postForm()
Dear colleagues, Could I get some assistance using postForm() to scrape the business names and addresses at this website: http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx I've read through (http://www.omegahat.org/RCurl/RCurlJSS.pdf) and scoured the web for tutorials, but I can't crack it. I'm aware that this is probably a pretty basic
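A hedged outline of the usual postForm() dance for an ASP.NET page like this one: read the hidden state fields out of the form, then post them back along with the search term. The search-field name below is a guess; the real names must be read from the page's HTML.

library(RCurl)
library(XML)

url <- "http://www.brantford.ca/business/LocalBusinessCommunity/Pages/BusinessDirectorySearch.aspx"
doc <- htmlParse(getURL(url), asText = TRUE)

viewstate <- xpathSApply(doc, "//input[@id='__VIEWSTATE']/@value")
resp <- postForm(url,
                 "__VIEWSTATE"   = viewstate,
                 "SearchTextBox" = "restaurant",   # hypothetical field name
                 style = "POST")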
2011 Dec 15
1
Reordering a numeric variable
I'm running a linear model in R using the car package. I have a variable education, which I have recoded and regrouped to my wishes. However, R seems to place each element of that variable in alphabetical order. When I am running the model, don't I need the model order from lowest to highest to make an inference that a one unit change in one variable produced a one unit change in
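The usual answer is to set the factor levels explicitly rather than accept the alphabetical default; a minimal illustration with made-up categories:

educ <- c("high school", "college", "graduate", "high school")
# set the substantive order; the first level becomes the reference
# category in lm()/glm()
educ <- factor(educ, levels = c("high school", "college", "graduate"))
levels(educ)
# [1] "high school" "college"     "graduate"

Adding ordered = TRUE would instead give polynomial contrasts in the model, which changes the interpretation of the coefficients.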
2006 Jan 27
1
Caching from screen scraping
Hi all, I need to do some screen scraping from my Rails app. Given an Ethernet (MAC) address, I scrape results from an internal web page that returns location and hostname. How can I cache the result from that screen scraping so as to be polite to the scrapee? I would like to expire the results daily. In Perl, I would use Cache::File. Can I use Rails caching for this? What's the best
2009 Feb 18
1
R as a web scraping tool using RCurl
Hi List, I am trying to leverage my knowledge of R for tasks that may not make R the best choice. I wish to automate a web scraping task, which requires a multi-step procedure: 1) log in to a website 2) go to a particular page 3) from the drop-down menu, click on a particular link 4) from the tabulated data presented, choose relevant information based on a
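A sketch of that multi-step flow with RCurl, assuming a form-based login; the URLs and field names are hypothetical. The key point is to reuse one curl handle so the session cookie from step 1 carries through the later requests.

library(RCurl)
library(XML)

curl <- getCurlHandle(cookiefile = "",        # "" turns on in-memory cookies
                      followlocation = TRUE)

# Step 1: log in (field names are hypothetical)
postForm("https://example.com/login",
         username = "me", password = "secret",
         curl = curl, style = "POST")

# Steps 2-3: fetch the page behind the link, on the same session
page <- getURL("https://example.com/reports/table", curl = curl)

# Step 4: pull the tabulated data
tab <- readHTMLTable(htmlParse(page, asText = TRUE), stringsAsFactors = FALSE)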
2018 Jan 18
0
Web scraping different levels of a website
I am web scraping a page at http://catalog.ihsn.org/index.php/catalog#_r=&collection=&country=&dtype=&from=1890&page=1&ps=100&sid=&sk=&sort_by=nation&sort_order=&to=2017&topic=&view=s&vk= From this URL, I have built up a dataframe through the following code: dflist <- map(.x = 1:417, .f = function(x) { Sys.sleep(5) url <-
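The snippet breaks off at url <-; a plausible completion, pasting the page number into the query string shown above. One caveat: everything after "#" in that URL is a client-side fragment the server never sees, so a real scrape of this catalog may need the site's underlying search endpoint instead.

library(purrr)
library(rvest)

dflist <- map(.x = 1:417, .f = function(x) {
  Sys.sleep(5)                     # be polite between requests
  url <- paste0("http://catalog.ihsn.org/index.php/catalog",
                "#_r=&collection=&country=&dtype=&from=1890&page=", x,
                "&ps=100&sid=&sk=&sort_by=nation&sort_order=&to=2017",
                "&topic=&view=s&vk=")
  read_html(url)                   # parse; field extraction would follow
})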
2012 Apr 16
1
grep and XML
Hi all: I struggle a lot scraping web data. I still haven't got a handle on the XML package. I'd like to get particular exchange rates from this table: https://raw.github.com/currencybot/open-exchange-rates/master/latest.json This is the code that I'm working with: library(RCurl) library(XML)
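That file is JSON rather than HTML, so a JSON parser is a better fit than the XML package; a short sketch with RJSONIO (jsonlite::fromJSON works the same way):

library(RCurl)
library(RJSONIO)

raw   <- getURL("https://raw.github.com/currencybot/open-exchange-rates/master/latest.json")
rates <- fromJSON(raw)
rates$rates[["EUR"]]   # one particular exchange rate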
2012 Sep 19
1
scraping with session cookies
Hi, I am starting to code in R, and one of the things I want to do is scrape some data from the web. The problem I am having is that I cannot get past the disclaimer page (which produces a session cookie). I have been able to collect some ideas and combine them in the code below, but I don't get past the disclaimer page. I am trying to accept the disclaimer with postForm and write
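A hedged sketch of the cookie handling, with placeholder URL and field name: accept the disclaimer on a handle that has the cookie engine enabled, then reuse that handle so the session cookie accompanies the data request.

library(RCurl)

ch <- getCurlHandle(cookiefile = "", followlocation = TRUE)

# Post the disclaimer form first; the field name "accept" is hypothetical
postForm("https://example.com/disclaimer", accept = "yes",
         curl = ch, style = "POST")

# The session cookie set above travels automatically on the same handle
page <- getURL("https://example.com/data", curl = ch)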
2012 May 19
1
Try Giving Invalid Argument Type Error
Dear R Helpers, I am getting an error message from the try function that I don't understand so I am hoping that someone can help. I am scraping from web pages, but sometimes they disappear. When that happens I need to control for it with some sort of function. This web page is parsed without a problem. exh<-"NASDAQ" tic<-"EGHT"
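The usual pattern: try() does not re-signal the error, it returns an object of class "try-error", which you test with inherits(). A minimal sketch with a placeholder URL built from the exchange/ticker pair:

res <- try(readLines("http://example.com/NASDAQ/EGHT"),   # hypothetical URL
           silent = TRUE)
if (inherits(res, "try-error")) {
  message("page unavailable; skipping")
} else {
  # parse `res` as usual
}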
2012 Apr 12
3
Remove superscripts from HTML objects
Is there some way to remove superscripts from objects returned by html/xmlParse (XML package)? h <- "<html><p>Cat<sup>a</sup></p><p>Dog</p></html>" doc <- htmlParse(h) xpathSApply(doc, "//p", xmlValue) [1] "Cata" "Dog" I could probably remove the <sup> tags from the "h" object above,
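One answer that works directly on the parsed document: delete the <sup> nodes with removeNodes() before extracting the values.

library(XML)

h   <- "<html><p>Cat<sup>a</sup></p><p>Dog</p></html>"
doc <- htmlParse(h)
removeNodes(getNodeSet(doc, "//sup"))   # drop the superscript nodes in place
xpathSApply(doc, "//p", xmlValue)
# [1] "Cat" "Dog"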
2009 Dec 03
3
Scraping a web page
I would like to be able to submit a list of URLs of various webpages and extract the "content" i.e. not the mark-up of those pages. I can find plenty of examples in the XML library of extracting links from pages but I cannot seem to find a way to extract the text. Any help would be greatly appreciated - I will not know the structure of the URLs I would submit in advance. Any
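A structure-agnostic sketch: xmlValue() on the <body> node returns all visible text with the markup stripped, so no knowledge of each page's layout is needed. The URLs are placeholders.

library(XML)

urls <- c("http://example.com/a.html", "http://example.com/b.html")

get_text <- function(u) {
  doc <- htmlParse(u)
  xpathSApply(doc, "//body", xmlValue)   # text content only, no markup
}
texts <- lapply(urls, get_text)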
2018 Jan 23
1
Scraping from different level URLs website
I am doing research on World Bank (WB) projects in developing countries. To do so, I am scraping their website to collect the data I am interested in. The structure of the webpage I want to scrape is the following: 1. A list of all countries in which the WB has developed projects <http://projects.worldbank.org/country?lang=en&page=> 1.1. By clicking on a
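A two-level sketch with rvest, under the structure described: read the country list, pull the per-country links, then visit each one. The CSS selector is hypothetical and must be read off the real page.

library(rvest)

start <- read_html("http://projects.worldbank.org/country?lang=en&page=")

country_links <- start %>%
  html_nodes("a.country-link") %>%    # hypothetical selector
  html_attr("href")

projects <- lapply(country_links, function(u) {
  Sys.sleep(2)                        # throttle the requests
  read_html(u) %>% html_nodes("table") %>% html_table(fill = TRUE)
})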
2010 Jul 03
1
XML and RCurl: problem with encoding (htmlTreeParse)
Hi All, First method:- >library(XML) >theurl <- "http://home.sina.com" >download.file(theurl, "tmp.html") >txt <- readLines("tmp.html") >txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) >g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) >head(grep(" ", g, value=T)) [1] " |
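The usual cure for this symptom is to declare the page's encoding explicitly, both when reading the file and when parsing it; a sketch assuming the page is UTF-8 (sina.com pages may instead declare GB2312, so check the <meta> charset):

library(XML)

theurl <- "http://home.sina.com"
download.file(theurl, "tmp.html")
txt <- readLines("tmp.html", encoding = "UTF-8")
doc <- htmlTreeParse(txt, error = function(...) {},
                     useInternalNodes = TRUE, encoding = "UTF-8")
g <- xpathSApply(doc, "//p", xmlValue)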
2007 Oct 10
1
Scraping AOL Webmail to login and fetch contacts?
I'm helping with a gem that is going to be published under the contentfree project on RubyForge (http://rubyforge.org/projects/contentfree/). The gem is called "blackbook" and basically it will go and fetch your contacts from the major webmail providers. So far Gmail, Yahoo!, and MSN have been completed. We are trying to finish up with fetching contacts from AOL Webmail. However
2012 Oct 17
0
postForm() in RCurl and library RHTMLForms
Hi R Users, I want to get the data from the URL given, from 10/09/2012 to 15/10/2012. I don't know how to pass the parameters. library(RHTMLForms) > ff = getHTMLFormDescription("
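The snippet stops inside getHTMLFormDescription(); a hedged sketch of how the RHTMLForms workflow usually continues. The URL and date-parameter names are placeholders; the real names come out of the form description itself.

library(RHTMLForms)

ff  <- getHTMLFormDescription("http://example.com/query")   # placeholder URL
fun <- createFunction(ff[[1]])          # turn the form into an R function

# parameter names are hypothetical; print ff[[1]] to see the real ones
resp <- fun(fromDate = "10/09/2012", toDate = "15/10/2012")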
2007 Apr 03
2
Scraping and saving.
Hi, I'm working to scrape and save some ebooks. Mechanize has been wonderful so far. The link I'm having trouble with is this one: http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H When I click that in the browser it saves it to a file named H_1632.zip. How do I get that name from the page? I suspect that to save this to a file I would just do
2018 Jan 31
0
Scraping info from a web site?
Hi, All: What would you suggest one use to read the data on members of the US Congress and their positions on net neutrality from "https://www.battleforthenet.com/scoreboard" into R? I found recommendations for the "rvest" package to "Easily Harvest (Scrape) Web Pages". I tried the following: URL <-
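The snippet ends at URL <-; a plausible continuation with rvest, with one caveat: the scoreboard page is built by JavaScript, so the static HTML that read_html() sees may not contain the table, and the site's underlying JSON feed may be needed instead. The selector is a guess.

library(rvest)

URL  <- "https://www.battleforthenet.com/scoreboard"
page <- read_html(URL)
tab  <- page %>% html_nodes("table") %>% html_table(fill = TRUE)   # hypothetical selector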