Hi All,

I am trying to parse some information from a website, say, a LinkedIn page. The LinkedIn URL was

url = "http://www.linkedin.com/in/huidu"

I had no problem using readLines and the XML package to collect the information I need. However, that URL has now become

url = "https://www.linkedin.com/in/huidu"

and readLines fails on it:

> readLines(url)
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : unsupported URL scheme

Do you know any way to read in web information if the URL is https? Thanks a lot.

Hui

Hi Hui,

I have used the source_url function in the devtools package with good results. Give it a shot!

Best,
Jorge

On Tue, Mar 10, 2015 at 9:39 AM, Hui Du <hui.du at savvyrookies.com> wrote:
> Do you know any way to read-in web information if the url is https? Thanks
> a lot.
>
> Hui
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

On 09/03/2015 22:39, Hui Du wrote:
> It failed readLines function.
>
>> readLines(url)
> Error in file(con, "r") : cannot open the connection
> In addition: Warning message:
> In file(con, "r") : unsupported URL scheme
>
> Do you know any way to read-in web information if the url is https? Thanks
> a lot.

Try R-devel, soon to become R 3.2.0. That has support for this on platforms where libcurl is installed (which should be possible almost everywhere).

You did not give the 'at a minimum' information required by the posting guide. This has long been possible on Windows with --internet2.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
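
For readers trying the R 3.2.0 route, the following is a minimal sketch of what the libcurl support might look like in use. It assumes R 3.2.0 or later built with libcurl; the helper name read_https_lines is hypothetical and not part of R, and the scheme check is an added convenience rather than anything the reply above prescribes.

```r
# Hypothetical helper: read a web page over http or https through the
# "libcurl" method that url() offers from R 3.2.0 onwards.
read_https_lines <- function(u) {
  # Reject anything that is not a web URL before touching the network.
  if (!grepl("^https?://", u)) {
    stop("expected an http:// or https:// URL, got: ", u)
  }
  # capabilities("libcurl") reports whether this build of R includes libcurl.
  if (!isTRUE(unname(capabilities("libcurl")))) {
    stop("this build of R has no libcurl support")
  }
  con <- url(u, method = "libcurl")
  on.exit(close(con))          # release the connection even if reading fails
  readLines(con, warn = FALSE)
}

# Example call (needs network access):
# page <- read_https_lines("https://www.linkedin.com/in/huidu")
```

Checking capabilities("libcurl") first gives a clearer message than the "unsupported URL scheme" warning when the build lacks libcurl.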

On Mon, Mar 9, 2015 at 3:39 PM, Hui Du <hui.du at savvyrookies.com> wrote:
> > readLines(url)
> Error in file(con, "r") : cannot open the connection
> In addition: Warning message:
> In file(con, "r") : unsupported URL scheme

Try:

library(curl)
readLines(curl(url))
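
To connect the curl suggestion back to the original workflow (readLines plus the XML package), here is a rough sketch, assuming both the curl and XML packages are installed. The function names fetch_lines and page_title are made up for illustration, and extracting the page title is only a stand-in for whatever information is actually needed.

```r
# Hypothetical helper: fetch the raw lines of a page through a curl connection.
fetch_lines <- function(u) {
  con <- curl::curl(u)         # the connection is opened lazily by readLines
  on.exit(close(con))
  readLines(con, warn = FALSE)
}

# Hypothetical helper: parse already-fetched HTML lines and return the <title>.
page_title <- function(lines) {
  # asText = TRUE tells htmlParse that the input is HTML text, not a file name.
  doc <- XML::htmlParse(paste(lines, collapse = "\n"), asText = TRUE)
  XML::xpathSApply(doc, "//title", XML::xmlValue)
}

# Example call (needs network access):
# page_title(fetch_lines("https://www.linkedin.com/in/huidu"))
```

Keeping the download and the parsing in separate functions means the parsing step can be tried on a saved copy of the page without touching the network.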