Hi,

I wrote a script which retrieves links from websites and loads them with scan:

...
website <- tolower(scan(current.pages[i], what="character", sep="\n", quiet=TRUE))
...

However, occasionally the script finds broken links, such as <http://www.google.com/test>. When the script tries to access such a website, the repeat loop breaks and I get the error message

Error in file(file, "r") : unable to open connection
In addition: Warning message:
cannot open: HTTP status was '404 Not Found'

Now my question: is there a way to test whether the target of a link exists that does not result in an error and, thus, does not discontinue my loop? I looked at the help files for file, scan, and connections, and searched the archives for "404", but couldn't find anything. I work with R 2.3.1 patched on Windows XP (both Home and Professional) and would appreciate any pointers.

Thanks a lot,
STG
See ?try, as in this example:

current.pages <- c("http://www.google.com",
                   "http://www.google.com/test",
                   "http://www.yahoo.com")
for (i in seq(along = current.pages)) {
    website <- try(tolower(scan(current.pages[i], what = "character",
                                sep = "\n", quiet = TRUE)))
    if (inherits(website, "try-error")) cat(current.pages[i], "bad\n")
    else cat(current.pages[i], "ok\n")
}
See ?try and ?tryCatch.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
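For completeness, a minimal sketch of the tryCatch approach suggested above, assuming the same current.pages vector as in the earlier example; the use of NULL as a sentinel value is just one illustrative way to signal a failed download:

```r
current.pages <- c("http://www.google.com", "http://www.google.com/test")

for (p in current.pages) {
    ## tryCatch lets us supply an error handler instead of aborting the loop;
    ## the 404 surfaces as an error from file()/scan(), which we intercept here.
    website <- tryCatch(
        tolower(scan(p, what = "character", sep = "\n", quiet = TRUE)),
        error = function(e) {
            cat(p, "could not be opened:", conditionMessage(e), "\n")
            NULL  # sentinel: mark this page as failed (an assumption, not required by tryCatch)
        }
    )
    if (is.null(website)) next  # skip to the next link on failure
    ## ... process the successfully loaded page here ...
}
```

Compared with try(), tryCatch() lets you distinguish conditions (errors vs. warnings) and run per-condition handlers, which is useful if you also want to react to the "HTTP status was '404 Not Found'" warning separately.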