Hi Curtis,
I don't have the packages you are using, but tracing this indicates
that the page source contains the relative path of the graphic, in
this case:
/nwisweb/data/img/USGS.12144500.19581112.20140309..0.peak.pres.gif
and you already have the server URL:
nwis.waterdata.usgs.gov
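Pasted together (with the scheme prepended, which download.file will
need), those give the direct image URL. A quick check in R, using only
the values shown above:

  img.url <- paste0("http://nwis.waterdata.usgs.gov",
    "/nwisweb/data/img/USGS.12144500.19581112.20140309..0.peak.pres.gif")
  print(img.url)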
Getting the path out of the page source isn't difficult: just split
the text at double quotes and take the token following "img src=". If
I understand the arguments of "download.file" correctly, the full
image URL (server plus relative path) is what belongs in your
graphic.url argument, and graphic.fn is just the local destination
file. I would paste the server URL and the path together and display
the result to make sure that it matches the image you want. When I
did this, the correct image appeared in my browser. I'm using Google
Chrome, so I didn't have to prepend the http://.
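Here's a rough sketch of that whole sequence in R (untested beyond the
one station above; the assumption that the path is the token right
after "img src=" depends on how the page quotes its HTML attributes):

  page.url <- paste0("http://nwis.waterdata.usgs.gov/nwis/peak?",
    "site_no=12144500&agency_cd=USGS&format=img")
  src <- paste(readLines(page.url), collapse = "\n")
  # split the page source at double quotes; the token after the one
  # containing 'img src=' is the relative path of the graphic
  tokens <- strsplit(src, '"', fixed = TRUE)[[1]]
  img.path <- tokens[grep("img src=", tokens, fixed = TRUE) + 1]
  img.url <- paste0("http://nwis.waterdata.usgs.gov", img.path)
  print(img.url)  # confirm it matches the image you expect
  download.file(img.url, "graphic_12144500.gif", mode = "wb")

Dropping that into your loop in place of the current graphic.url
should fix the empty-file problem, since the format=img URL apparently
returns an HTML page rather than the image itself.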
Jim
On Fri, Jun 5, 2015 at 2:31 AM, Curtis DeGasperi
<curtis.degasperi at gmail.com> wrote:
> I'm working on a script that downloads data from the USGS NWIS server.
> dataRetrieval makes it easy to quickly get the data in a neat tabular
> format, but I was also interested in getting the tabular text files -
> also fairly easy for me using download.file.
>
> However, I'm not skilled enough to work out how to download the nice
> graphic files that can be produced dynamically from the USGS NWIS
> server (for example:
>
> http://nwis.waterdata.usgs.gov/nwis/peak?site_no=12144500&agency_cd=USGS&format=img)
>
> My question is how do I get the image from this web page and save it
> to a local directory? scrapeR returns the information from the page
> and I suspect this is a possible solution path, but I don't know what
> the next step is.
>
> My code provided below works from a list I've created of USGS flow
> gauging stations.
>
> Curtis
>
> ## Code to process USGS daily flow data for high and low flow analysis
> ## Need to start with list of gauge ids to process
> ## Can't figure out how to automate download of images
>
> require(dataRetrieval)
> require(data.table)
> require(scrapeR)
>
> df <- read.csv("usgs_stations.csv", header=TRUE)
>
> lstas <- length(df$siteno) # length of locator list
>
> print(paste('Processing...', df$name[1], ' ', df$siteno[1], sep = ""))
>
> datall <- readNWISpeak(df$siteno[1])
>
> for (a in 2:lstas) {
> # Print station being processed
> print(paste('Processing...', df$name[a], ' ', df$siteno[a], sep = ""))
>
> dat<- readNWISpeak(df$siteno[a])
>
> datall <- rbind(datall,dat)
>
> }
>
> write.csv(datall, file = "usgs_peaks.csv")
>
> # Retrieve ascii text files and graphics
>
> for (a in 1:lstas) {
>
> print(paste('Processing...', df$name[a], ' ', df$siteno[a], sep = ""))
>
> graphic.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',
>   df$siteno[a], '&agency_cd=USGS&format=img', sep = "")
> peakfq.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',
>   df$siteno[a], '&agency_cd=USGS&format=hn2', sep = "")
> tab.url <- paste('http://nwis.waterdata.usgs.gov/nwis/peak?site_no=',
>   df$siteno[a], '&agency_cd=USGS&format=rdb', sep = "")
>
> graphic.fn <- paste('graphic_', df$siteno[a], '.gif', sep = "")
> peakfq.fn <- paste('peakfq_', df$siteno[a], '.txt', sep = "")
> tab.fn <- paste('tab_', df$siteno[a], '.txt', sep = "")
>
> download.file(graphic.url, graphic.fn, mode = 'wb') # This apparently doesn't work - file is empty
> download.file(peakfq.url,peakfq.fn)
> download.file(tab.url,tab.fn)
> }
>
> # scrapeR
>
> pageSource <- scrape(url = "http://nwis.waterdata.usgs.gov/nwis/peak?site_no=12144500&agency_cd=USGS&format=img",
>   headers = TRUE, parse = FALSE)
> page <- scrape(object = "pageSource")
>