henrique monte
2017-Feb-22 22:52 UTC
[R] Web scraping - Having trouble figuring out how to approach this problem
Sometimes I need to get some data from the web, organize it into a data frame, and I waste a lot of time doing it manually. I've been trying to figure out how to streamline this process. I've tried a few R scraping approaches but couldn't get them to work, and I thought there might be an easier way. Can anyone help me out with this?

Fictional example:

Here's a webpage with countries listed by continent:
https://simple.wikipedia.org/wiki/List_of_countries_by_continents

Each country name is also a link that leads to another webpage specific to that country, e.g. https://simple.wikipedia.org/wiki/Angola.

As a final result I would like a data frame with one observation (row) per country listed and 4 variables (columns): ID = country name; Continent = the continent it belongs to; Language = official language (from the country's own webpage); and Population = the most recent population count (also from the country's own webpage).

...

The main issue I'm trying to figure out is handling several webpages. Would it be possible to scrape the first page to get the countries as a list together with the links to their webpages, and then write a function that runs a scraping command on each of those links to pull out the specific data I'm looking for?
Jeff Newmiller
2017-Feb-23 18:03 UTC
[R] Web scraping - Having trouble figuring out how to approach this problem
The answer is yes, and it does not seem like a big step from where you are now, so seeing what you already know how to do (a reproducible example, or RE) would help focus the assistance. There are quite a few ways to do this kind of thing, and what you already know would be clarified by an RE.

--
Sent from my phone. Please excuse my brevity.

On February 22, 2017 2:52:55 PM PST, henrique monte <henrique.monte66 at gmail.com> wrote:
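The two-step approach described in the question can be sketched with the rvest package. The sketch below is untested against the live pages, and the selectors in it (the "li > a" node set and the "infobox" table class) are assumptions about the page structure; inspect the actual HTML with your browser's developer tools and adjust them.

```r
## A sketch of the two-step approach using the 'rvest' package.
## NOTE: the selectors ("li > a", the "infobox" class, the row labels)
## are guesses about the page structure -- verify against the real HTML.
library(rvest)

list_url <- "https://simple.wikipedia.org/wiki/List_of_countries_by_continents"
listing  <- read_html(list_url)

## Step 1: collect the country names and the links to their pages
links <- html_nodes(listing, "li > a")   # refine to just the country lists
countries <- data.frame(
  ID   = html_text(links),
  href = paste0("https://simple.wikipedia.org", html_attr(links, "href")),
  stringsAsFactors = FALSE
)

## Helper: pull one labelled row out of a Wikipedia-style infobox table
get_field <- function(page, label) {
  node <- html_node(page, xpath = paste0(
    "//table[contains(@class, 'infobox')]",
    "//tr[th[contains(., '", label, "')]]/td"))
  html_text(node)
}

## Step 2: a function that scrapes one country page
scrape_country <- function(url) {
  page <- read_html(url)
  Sys.sleep(1)  # be polite: pause between requests
  data.frame(
    Language   = get_field(page, "Official language"),
    Population = get_field(page, "Population"),
    stringsAsFactors = FALSE
  )
}

## Step 3: apply the function to every link and bind the rows together
details <- do.call(rbind, lapply(countries$href, scrape_country))
result  <- cbind(countries["ID"], details)
```

The Continent column would come from the first page as well: each country link sits under a continent heading there, so you can record which heading a link falls under while extracting the links in step 1.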