Jenny Vander Pluym - NOAA Federal
2016-Jun-21  14:42 UTC
[R] check broken links in a column
Hello all,

I am having trouble finding code that will check the links listed in a table, rather than all of the links on a specific web page.

I have csv files that include links to images which are stored on the web. I have over 1,000 of them to check.

I see things for curl, but those appear to be specific to pulling information from a website rather than checking a column of URLs in a table. I would like the results in a readable table format that tells me which links do not work. I do not want all of the images opened on my machine, just the links checked.

Thank you so much for your time.

Jenny VP

--
Jenny Vander Pluym
NOAA's National Centers for Coastal Ocean Science
Research and Communications Specialist
101 Pivers Island Rd.
Beaufort, NC 28516-9722
cell: 252.728.8777
What is NCCOS up to? <http://coastalscience.noaa.gov/news/>

"The contents of this message are mine personally and do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration."
I don't know how to do this in R, but how about wget? See http://www.createdbypete.com/articles/simple-way-to-find-broken-links-with-wget/

You could also store the list of links in a file and pass it to wget with the -i flag.

HTH
Ulrik
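To keep the wget idea inside R and get one answer per link, here is a minimal sketch that calls wget once per URL. It assumes the wget command-line tool is installed and on the PATH; the function name wget_ok is hypothetical. The --spider option makes wget check that a URL exists without saving the file. According to the wget manual, the exit status is 0 on success and non-zero otherwise (for example 4 for a network failure, 8 for a server error response such as 404). Using -i with a file would check the whole list in one run, but you would then have to parse wget's log to see which links failed.

```r
# Sketch only: assumes the wget command-line tool is installed and on the PATH.
# "--spider" asks wget to check each URL without downloading the file.
wget_ok <- function(urls) {
  vapply(as.character(urls), function(u) {
    # stdout/stderr = FALSE discard wget's output, so system2() returns
    # wget's exit status instead; 0 means the link could be reached.
    status <- suppressWarnings(system2(
      "wget",
      args = c("--spider", "--quiet", "--tries=1", "--timeout=10", shQuote(u)),
      stdout = FALSE, stderr = FALSE
    ))
    identical(as.integer(status), 0L)
  }, logical(1), USE.NAMES = FALSE)
}
```

For example, data.frame(url = urls, ok = wget_ok(urls)) gives a table with one TRUE/FALSE per link, and subsetting it on ok == FALSE lists the broken ones. Calling wget once per URL is slower than a single -i run, but it avoids parsing the log.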
You could loop through the entries and try to read from each one. If the link doesn't exist, you'll get an error. For example:

    urls <- c("http://www.r-project.org", "http://foo.bar")
    result <- rep(NA, length(urls))
    for (i in seq_along(urls)) {
      # Try to read the first line; try() returns an object of class
      # "try-error" instead of stopping when the URL cannot be opened.
      if (inherits(try(readLines(urls[i], 1), silent = TRUE), "try-error"))
        result[i] <- FALSE  # link is broken
      else
        result[i] <- TRUE   # link could be read
    }

You can put together something more sophisticated with tryCatch(), which would make it easier to catch the warning messages when the reads fail.

Duncan Murdoch
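Along the lines of the tryCatch() suggestion, here is a minimal sketch that records the HTTP status and any error message for each link and returns them as a data frame. It swaps readLines() for base R's curlGetHeaders(), which asks only for the response headers, so the images themselves should not be downloaded. It assumes an R build with libcurl support (current releases have it). The function name check_links is hypothetical, and treating only a final 2xx/3xx status as "working" is an assumption. Some servers answer header-only requests differently from normal ones, so a few failures may be worth checking by hand.

```r
# Sketch only: assumes base::curlGetHeaders() is available (R built with libcurl).
# Only headers are requested, so the linked images are not downloaded.
check_links <- function(urls) {
  urls    <- as.character(urls)  # read.csv() may have produced a factor
  n       <- length(urls)
  status  <- rep(NA_integer_, n)
  ok      <- rep(FALSE, n)
  message <- rep(NA_character_, n)
  for (i in seq_len(n)) {
    res <- tryCatch({
      h <- curlGetHeaders(urls[i])  # follows redirects by default
      s <- attr(h, "status")
      # Guard against a missing status attribute
      list(status = if (is.null(s)) NA_integer_ else as.integer(s),
           message = NA_character_)
    },
    # Unresolvable hosts, malformed URLs, timeouts etc. signal a condition;
    # record its message instead of stopping the loop.
    error   = function(e) list(status = NA_integer_, message = conditionMessage(e)),
    warning = function(w) list(status = NA_integer_, message = conditionMessage(w)))
    status[i]  <- res$status
    message[i] <- res$message
    # Treat a final 2xx or 3xx status code as a working link
    ok[i] <- !is.na(res$status) && res$status >= 200 && res$status < 400
  }
  data.frame(url = urls, status = status, ok = ok, message = message,
             stringsAsFactors = FALSE)
}

# Example use (the file name "links.csv" and column name "url" are hypothetical):
# links  <- read.csv("links.csv", stringsAsFactors = FALSE)
# result <- check_links(links$url)
# result[!result$ok, ]                                    # only the broken links
# write.csv(result, "link_check.csv", row.names = FALSE)  # readable table
```

Keeping the status code and message alongside the TRUE/FALSE column makes it easier to tell a missing file (404) from a host that could not be reached at all.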