thr3ads.net - R help - [R] How to choose a button and scrape the website data [Mar 2012]


Guang Dai

2012-Mar-05 18:38 UTC

[R] How to choose a button and scrape the website data

Hi all,
I'm working on scraping some website data to build a database.
In most cases, I can use the XML package to get the dataset.
However, some websites don't give an explicit address for the
downloadable tables.

To be more specific, for example, I'm interested in the website
http://ets.aeso.ca/
The data we are scraping is the "Pool Weekly Summary" under the
category of "Historical".
However, after clicking "Historical" and choosing the "Pool Weekly
Summary" item on the website,
the address is always http://ets.aeso.ca/ and doesn't change.

In this case, I guess I need to tell R to first click the "Historical"
button and then choose the item before
scraping the data. But the question is how?

Any suggestions are welcome. 
Guang

Tyler Ritchie

2012-Mar-05 20:40 UTC


That website uses JavaScript to submit the form (and doesn't work in
Chrome). You could build a JavaScript interpreter in R, have it parse the
page, and then use the page's own JavaScript to submit the form, but R just
isn't the right tool for that type of interaction.

Performing the task as you describe it is possible, just not reasonable
with R. There are better tools for automating web pages, such as
Automato [1] or Sikuli [2].

But it would be better to query the site directly. Checking the source of the
page shows that each report type is served from a different URL, and passing
that URL arguments of the form:

beginDate=03012012&endDate=03032012&SelectFormat=CSV

returns the values from March 1st to 3rd of this year as a CSV. To find the
URLs of interest, view the page source and search for "Select a Report".
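
As a rough illustration of querying the site directly, here is a minimal
sketch in Python (any language with an HTTP client would do). It assumes the
dates are formatted as MMDDYYYY, as the example arguments suggest, and that
the report accepts them as a plain GET query string; the form may in fact
expect a POST. REPORT_URL is a hypothetical placeholder, since the real report
address has to be read from the page source, and build_report_url and
fetch_rows are names made up for this example.

```python
import csv
import io
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: replace with the report URL found in the page source.
REPORT_URL = "http://ets.aeso.ca/REPORT_PATH_FROM_PAGE_SOURCE"


def build_report_url(base_url, begin, end, fmt="CSV"):
    """Append beginDate/endDate/SelectFormat arguments to a report URL.

    Assumes the site expects dates as MMDDYYYY (e.g. 03012012 for March 1, 2012).
    """
    query = urlencode({
        "beginDate": begin.strftime("%m%d%Y"),
        "endDate": end.strftime("%m%d%Y"),
        "SelectFormat": fmt,
    })
    return base_url + "?" + query


def fetch_rows(url):
    """Download the report and parse it as CSV into a list of rows.

    Assumes a GET request is accepted; not called here, so nothing is downloaded.
    """
    with urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return list(csv.reader(io.StringIO(text)))


# Build the query for March 1st to 3rd, 2012 (no network access at this point).
url = build_report_url(REPORT_URL, date(2012, 3, 1), date(2012, 3, 3))
print(url)
```

Once the real report URL is filled in, calling fetch_rows(url) should return
the rows of the CSV, provided the assumptions above hold for that report.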

Easier still might be to contact AESO and ask them for the data.

[1] http://automa.to/
[2] http://sikuli.org/

-Tyler


Guang Dai

2012-Mar-05 21:30 UTC


Thank you, Tyler.
I just had a quick read of automa.to and sikuli.org, and both seem very
promising. Since I anticipate many other cases where a similar issue can
arise, I don't mind spending some time learning a tool that is very efficient
for the purpose. Any suggestions?
