Hi all,

I am using RCurl to try and download data from a website, but I'm having trouble finding out what URL to use. Here is the site:

http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX

See how in the upper right, above the displayed sheet, there's a link to download the data as a .csv file? When I hit "copy url" and paste into getURL in R, it doesn't work. That's no surprise because there isn't a URL in what gets pasted. I was just wondering if there's any way around this.

Thanks in advance,

Andrew
On 8/4/2010 2:07 PM, AndrewPage wrote:
> Hi all,
>
> I am using RCurl to try and download data from a website, but I'm having
> trouble finding out what URL to use. Here is the site:
>
> http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX
>
> See how in the upper right, above the displayed sheet, there's a link to
> download the data as a .csv file? When I hit "copy url" and paste into
> getURL in R, it doesn't work. That's no surprise because there isn't a URL
> in what gets pasted. I was just wondering if there's any way around this.
>
> Thanks in advance,
>
> Andrew

I looked at the page. The link you mentioned runs some JavaScript which alters some values in a form and posts that form; the result of the post is the CSV file. There is no simple URL that points to the file.

I don't know if RCurl can post forms, but if it can, you may be able to mimic the form. The structure of the form starts on line 191 of the page source (or search for "aspnetForm"), and appropriate values for __EVENTTARGET are given in the doPostBack call on line 258. Some understanding of HTML and HTTP may be necessary to know what is going on. I don't know if this would work or not.

Also, the site has not made it easy to directly download the CSV file. That may be intentional. The Terms & Services of the site may have something to say about doing this as well.

--
Brian Diggs
Senior Research Associate, Department of Surgery,
Oregon Health & Science University
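For what it's worth, RCurl does provide a postForm() function, so the form described above can in principle be mimicked from R. What follows is only a minimal sketch and has not been tested against the site. hidden_fields() and fetch_holdings_csv() are hypothetical helper names, and the event_target value is a placeholder that has to be copied from the doPostBack call in the page source. The sketch also assumes that the server answers the POST with the CSV text itself, and that the site's terms allow automated downloads.

```r
# Sketch only: mimic the ASP.NET postback behind the "download as .csv" link.
# Relies on the RCurl and XML packages; helper names are made up for illustration.

# Collect every named hidden <input> (e.g. __VIEWSTATE) from the page source
# as a named character vector, ready to be sent back with the form.
hidden_fields <- function(html) {
  doc  <- XML::htmlParse(html, asText = TRUE)
  path <- "//input[@type='hidden'][@name]"
  nms  <- XML::xpathSApply(doc, path, XML::xmlGetAttr, "name")
  vals <- XML::xpathSApply(doc, path, XML::xmlGetAttr, "value", default = "")
  setNames(as.character(vals), as.character(nms))
}

# Fetch the page, copy its hidden fields, set __EVENTTARGET the way the
# JavaScript link would, and post the form back to the same URL.
# ASSUMPTION: the response body of the POST is the CSV text.
fetch_holdings_csv <- function(page_url, event_target) {
  # One handle for both requests so that any session cookie is reused.
  curl <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)
  page <- RCurl::getURL(page_url, curl = curl)

  fields <- hidden_fields(page)
  fields[["__EVENTTARGET"]]   <- event_target
  fields[["__EVENTARGUMENT"]] <- ""

  csv_text <- RCurl::postForm(page_url, .params = fields,
                              style = "POST", curl = curl)
  read.csv(text = csv_text, stringsAsFactors = FALSE)
}

# Example call (placeholder target; take the real one from the page source):
# holdings <- fetch_holdings_csv(
#   "http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
#   event_target = "<value from the doPostBack call>")
```

Copying all hidden fields rather than a hand-picked few is a deliberate choice here: ASP.NET pages usually expect state fields such as __VIEWSTATE to come back unchanged with the post.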
Try this:

    library(XML)
    readHTMLTable('http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX',
                  which = 13, header = TRUE)

On Wed, Aug 4, 2010 at 6:07 PM, AndrewPage <savejarvis@yahoo.com> wrote:
> Hi all,
>
> I am using RCurl to try and download data from a website, but I'm having
> trouble finding out what URL to use. Here is the site:
>
> http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX
>
> See how in the upper right, above the displayed sheet, there's a link to
> download the data as a .csv file? When I hit "copy url" and paste into
> getURL in R, it doesn't work. That's no surprise because there isn't a URL
> in what gets pasted. I was just wondering if there's any way around this.
>
> Thanks in advance,
>
> Andrew

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O