On Tue, Mar 10, 2015 at 12:56 PM, Hui <hui.du at savvyrookies.com> wrote:
> Thanks. However I got http error 999.
>
There is an additional complication here: LinkedIn doesn't want you to
scrape the website and denies requests from non-browser clients. To get
around this you need to set the "User-Agent" header to something that
looks like a browser. Try this:
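devtools::install_github("jeroenooms/curl")
library(curl)
# Create a handle and give it a browser-like User-Agent header.
# The header value must stay on one line; a line break inside the string
# would be sent as part of the header.
h <- new_handle()
handle_setheaders(h, "User-Agent" = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0")
# Pass the handle to curl() so the request uses that header.
txt <- readLines(curl("https://www.linkedin.com/in/huidu", handle = h))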
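
If you would rather inspect the HTTP status yourself than have the
connection fail, something along these lines might work. This is only a
sketch: read_as_browser is a hypothetical helper name (not part of the
curl package), and it assumes a version of curl that provides
curl_fetch_memory().

```r
library(curl)

# Hypothetical helper (not part of the curl package): fetch a page while
# presenting a browser-like User-Agent, and return the body as lines.
read_as_browser <- function(url,
                            user_agent = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0") {
  h <- new_handle()
  handle_setheaders(h, "User-Agent" = user_agent)

  # curl_fetch_memory() returns the response (status, headers, raw body)
  # rather than raising an error on HTTP error statuses.
  res <- curl_fetch_memory(url, handle = h)

  # Stop with the status code (for example 999) instead of parsing an
  # error page.
  if (res$status_code != 200) {
    stop("request failed with HTTP status ", res$status_code)
  }

  # Split the body into lines, similar to what readLines() returns.
  strsplit(rawToChar(res$content), "\r?\n")[[1]]
}
```

You would then call it as
txt <- read_as_browser("https://www.linkedin.com/in/huidu").
Whether the site accepts the request still depends on how it treats the
User-Agent you send.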
>
> Hui
>
> Sent from my iPhone
>
> On Mar 10, 2015, at 12:07 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
>
>
>
> On Mon, Mar 9, 2015 at 3:39 PM, Hui Du <hui.du at savvyrookies.com> wrote:
>
>> > readLines(url)
>> Error in file(con, "r") : cannot open the connection
>> In addition: Warning message:
>> In file(con, "r") : unsupported URL scheme
>>
>
> Try:
>
> library(curl)
> readLines(curl(url))
>
>
>