Hello:

What tools would you recommend for extracting the table of members of the US House of Representatives from "http://house.gov/representatives/" and "http://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives_by_age"?

I started writing something using getURL{RCurl}. However, I'm getting bogged down manually selecting character sequences to search for and split on.

Thanks,
Spencer Graves
You could try your own sos package to search for what others have done here. The XML package is popular for this, but the whole approach is fraught with little pitfalls: HTML is definitely not a good format for data delivery, and an HTML page is clearly not an API for data access.

Dirk

--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
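One way to guard against those pitfalls is to check and tidy whatever a scrape returns before relying on it. The following is only a sketch: clean_scraped() and its arguments are hypothetical names, not part of any package. It assumes cells may carry Wikipedia-style footnote markers such as "[1]" plus stray whitespace, and it stops with an error if expected columns are missing, for example after a page redesign.

```r
# Hypothetical helper (not from any package): validate and tidy a scraped table.
clean_scraped <- function(df, expected_cols) {
  # Fail loudly if the page layout no longer yields the columns we expect
  absent <- setdiff(expected_cols, names(df))
  if (length(absent) > 0)
    stop("page layout changed; missing columns: ",
         paste(absent, collapse = ", "))
  for (col in names(df)) {
    x <- as.character(df[[col]])          # factors -> character
    x <- gsub("\\[[0-9]+\\]", "", x)      # drop footnote markers like "[1]"
    df[[col]] <- trimws(x)                # strip leading/trailing whitespace
  }
  df
}

# Made-up example input, standing in for a freshly scraped table
raw <- data.frame(Name = c("Member A[1]", " Member B "),
                  District = c("1", "2[3]"),
                  stringsAsFactors = FALSE)
tidy <- clean_scraped(raw, c("Name", "District"))
print(tidy)
```

Failing early on a missing column is deliberate: a silently reshuffled table is harder to notice than an error.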
Hello,

The following seems to work.

library(XML)
url <- "http://house.gov/representatives/"
# readLines() fetches the page as a character vector of lines;
# readHTMLTable() parses that HTML, and which=1 returns the first table
# as a data.frame, taking column names from its first row (header=TRUE)
dat <- readHTMLTable(readLines(url), which=1, header=TRUE)
str(dat)
head(dat)

Hope this helps,

Rui Barradas

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
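Because the live pages may change or be unreachable, here is a minimal offline sketch of the same readHTMLTable() call, run on a tiny inline HTML table (made-up data, not taken from house.gov). It assumes the XML package is installed and relies on readHTMLTable() accepting a character vector of HTML, which is what readLines() returns in the code above.

```r
library(XML)

# Made-up HTML standing in for a downloaded page; one element per line,
# just like the output of readLines()
html <- c(
  "<html><body>",
  "<table>",
  "<tr><th>Name</th><th>District</th></tr>",
  "<tr><td>Member A</td><td>1</td></tr>",
  "<tr><td>Member B</td><td>2</td></tr>",
  "</table>",
  "</body></html>"
)

# which = 1 selects the first <table> on the page; header = TRUE turns
# its first row into the column names of the resulting data.frame
dat <- readHTMLTable(html, which = 1, header = TRUE)
str(dat)
```

Depending on the XML and R versions, text columns may come back as factors, so wrap them in as.character() before string processing.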