Hi,

I want to mine web pages and decided to use tm and scrapeR. The example
given in scrapeR's manual runs as follows:

    library(scrapeR)
    pageSource <- scrape(url = "http://cran.r-project.org/web/packages/",
                         headers = TRUE, parse = FALSE)
    if (attributes(pageSource)$headers["status"] == 200) {
        page <- scrape(object = "pageSource")
        xpathSApply(page, "//table//td/a", xmlValue)
    } else {
        cat("There was an error with the page.\n")
    }

but for me it returns a list and an error. str(pageSource) gives:

    List of 1
     $ http://cran.r-project.org/web/packages/: atomic [1:1] <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
       <html xmlns="http://www.w3.org/1999/xhtml">
       ## I have left out most of the html that was returned.
     ..- attr(*, "headers")= Named chr "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xmlns="| __truncated__
     .. ..- attr(*, "names")= Named chr "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xmlns="| __truncated__

The status attribute seems to be missing from the returned list (the
"headers" attribute holds the page source itself rather than the HTTP
headers), and scrape(object = "pageSource") returns a list, giving
xpathSApply indigestion!

I am running R 2.15.3 (2013-03-01) on Ubuntu 12.04 with RCurl 1.95-4.1,
libcurl4-gnutls-dev 7.22.0-3ubuntu4.1 and libcurl3 7.22.0-3ubuntu4.1.
RCurl's basicHeaderGatherer() function returns a status of 200 for
http://cran.r-project.org/web/packages/index.html, so I assume I have a
problem with my libcurl setup.

Any pointers to fixing this?

Andrew

Andrew Roberts
Oswestry, UK
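
PS: For what it's worth, this is roughly how I checked the status with
RCurl directly, using basicHeaderGatherer() as documented in RCurl (a
minimal sketch, bypassing scrapeR entirely):

    library(RCurl)
    h <- basicHeaderGatherer()
    # fetch the page, routing the response headers into the gatherer
    txt <- getURL("http://cran.r-project.org/web/packages/index.html",
                  headerfunction = h$update)
    h$value()["status"]    # returns "200" here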
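
PPS: As a stop-gap I can get the links out by handing the raw source
straight to the XML package; a minimal sketch, assuming pageSource[[1]]
holds the full HTML as the str() output above suggests:

    library(XML)
    # parse the downloaded source as text rather than re-fetching the URL
    page <- htmlParse(pageSource[[1]], asText = TRUE)
    xpathSApply(page, "//table//td/a", xmlValue)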