thr3ads.net - R help - [R] My First Attempt at Screen Scraping with R [May 2011]

Abraham Mathew

2011-May-06 23:11 UTC

[R] My First Attempt at Screen Scraping with R

Hello Folks,

I'm trying to scrape my first web site and ran into an issue
because I really
don't know anything about regular expressions in R.

library(XML)
library(RCurl)

site <- "thisorthat.com/leader/month"
site.doc <- htmlParse(site, ?, xmlValue)


At the ?, I realize that I need to insert a regex command which will
decipher the contents of the web page...right?

First, I'm not sure if the contents of the site would be considered a table
and I'm also not sure how to disregard pictures
when scraping the site.
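
For context, a minimal sketch of how this is usually done with the XML package: htmlParse() only parses the page, so no regex goes in its argument list. Nodes are then selected with an XPath expression through xpathSApply(), and xmlValue pulls out their text. The HTML string below is a made-up stand-in for the real page (the table, the names, and the image are all assumptions), so the sketch runs without network access.

```r
library(XML)

# Hypothetical page content standing in for the real site (assumption).
html <- '<html><body>
<img src="logo.png"/>
<table>
<tr><th>Rank</th><th>Name</th></tr>
<tr><td>1</td><td>Alpha</td></tr>
<tr><td>2</td><td>Beta</td></tr>
</table>
</body></html>'

# Parse the HTML text into a document tree; no regex is involved.
doc <- htmlParse(html, asText = TRUE)

# Select only the table cells with XPath and extract their text.
# <img> nodes are never matched, so pictures are skipped automatically.
cells <- xpathSApply(doc, "//table//td", xmlValue)

# If the data really is an HTML <table>, readHTMLTable() returns
# a list with one data frame per table found in the document.
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
leaders <- tables[[1]]
```

Whether readHTMLTable() applies depends on the page actually using a table element; if it does not, the XPath route with a different expression is the fallback.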


> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)


Please Help!
Abraham

