R-3.0.1, RStudio, Win7 x64

Dear list,

I would like to download all the webpages of the Journal Citations Report (science edition) for a given year. I can do so manually, but that is very time intensive, so I would like to use R for that.

I have tried many things, including:

download.file(url = "http://admin-apps.webofknowledge.com/JCR/JCR?RQ=SELECT_ALL&cursor=21",
              destfile = "test.htm", method = "internal")

which should get the page starting with journal number 21. However, test.htm only contains the message:

>>> You do not have a session. You will need to establish a new session <http://admin-router.webofknowledge.com/?DestApp=JCR> if you wish to continue. <<<

I do, however, already have a session that I started manually (through the web browser), but R does not recognize it. Does anyone have an idea how to download the webpages for the JCR science edition for a given year? I have no idea what to try next.

Thanks,
Peter Verbeet
Wet Bell Diver <wetbelldiver <at> gmail.com> writes:

> I would like to download all the webpages of the Journal Citations
> Report (science edition) for a given year. [...] However, test.htm
> only contains the message:
> "You do not have a session. You will need to establish a new session
> if you wish to continue."

You need to review the RCurl package and look for "cookies", which will allow you (once you have established a session in a browser) to copy the cookies (tokens that grant you access) into your R session.

However, you will probably be violating the terms of service of JCR. You should talk to your librarian about this. When I wanted to do a similar project, I worked out a system where I generated the URLs automatically and got a student assistant to (efficiently) go to the URLs and paste the results into output files.

Ben Bolker
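For concreteness, here is a minimal sketch of the cookie approach with RCurl. The cookie names and values are placeholders (copy the real ones from your browser's developer tools while logged in), and the assumption that the JCR list advances 20 journals per page is inferred from the cursor=21 URL in the original post, so adjust the sequence to match what you actually see:

```r
## Cookie string copied from an authenticated browser session.
## The names and values below are placeholders, not real JCR cookies.
cookies <- "JSESSIONID=XXXXXXXX; SID=XXXXXXXX"

## Generate one URL per results page. The JCR list appears to advance
## the 'cursor' parameter by 20 journals per page (an assumption based
## on the cursor=21 example above); 11 pages are generated here.
base_url <- "http://admin-apps.webofknowledge.com/JCR/JCR?RQ=SELECT_ALL&cursor=%d"
urls <- sprintf(base_url, seq(1, 201, by = 20))

## Fetch each page, sending the browser's cookies so the server
## recognizes the existing session. Commented out here because it
## requires a live, authenticated session (and RCurl installed:
## install.packages("RCurl"); library(RCurl)).
# for (i in seq_along(urls)) {
#   page <- getURL(urls[i], httpheader = c(Cookie = cookies))
#   writeLines(page, sprintf("jcr_page_%03d.htm", i))
#   Sys.sleep(1)  # be polite to the server
# }
```

Even if the cookie step fails, the generated urls vector can be handed to a person (or another tool) to visit manually, along the lines of the workflow described above.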