Francisco Zambrano
2012-Oct-11 18:37 UTC
[R] Problems with getURL (RCurl) to obtain list files of an ftp directory
Dear all,

I have a problem with the command getURL from the RCurl package. I have been using it to obtain the directory listing of the FTP site for the MOD16 (ET, DSI) products and then to download them (this is part of a script by Tomislav Hengl, spatial-analyst). Instead of the list of files on the FTP server, I am getting the complete HTML code. Does anyone know why this might happen?

These are the steps I have been following:

> MOD16A2.doy <- 'ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/'
> items <- strsplit(getURL(MOD16A2.doy, .opts = curlOptions(ftplistonly = TRUE)), "\n")[[1]]
> items  # results
[1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<!-- HTML listing generated by Squid 2.7.STABLE9 -->\n<!-- Wed, 10 Oct 2012 13:43:53 GMT -->\n<HTML><HEAD><TITLE>\nFTP Directory: ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\n</TITLE>\n<STYLE type=\"text/css\"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}--></STYLE>\n</HEAD><BODY>\n<H2>\nFTP Directory: <A HREF=\"/\">ftp://ftp.ntsg.umt.edu</A>/<A HREF=\"/pub/\">pub</A>/<A HREF=\"/pub/MODIS/\">MODIS</A>/<A HREF=\"/pub/MODIS/Mirror/\">Mirror</A>/<A HREF=\"/pub/MODIS/Mirror/MOD16/\">MOD16</A>/<A HREF=\"/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\">MOD16A2.105_MERRAGMAO</A>/</H2>\n<PRE>\n<A HREF=\"../\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dirup.gif\" ALT=\"[DIRUP]\"></A> <A HREF=\"../\">Parent Directory</A> \n<A HREF=\"GEOTIFF_0.05degree/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"GEOTIFF_0.05degree/\">GEOTIFF_0.05degree</A> . . . . . . . Jun 3 18:00 \n<A HREF=\"GEOTIFF_0.5degree/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"GEOTIFF_0.5degree/\">GEOTIFF_0.5degree</A>. . . . . . . . Jun 3 18:01 \n<A HREF=\"Y2000/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2000/\">Y2000</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2001/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2001/\">Y2001</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2002/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2002/\">Y2002</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2003/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2003/\">Y2003</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2004/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2004/\">Y2004</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2005/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2005/\">Y2005</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2006/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2006/\">Y2006</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2007/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2007/\">Y2007</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2008/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2008/\">Y2008</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2009/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2009/\">Y2009</A>. . . . . . . . . . . . . . Dec 23 2010 \n<A HREF=\"Y2010/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2010/\">Y2010</A>. . . . . . . . . . . . . . Feb 20 2011 \n<A HREF=\"Y2011/\"><IMG border=\"0\" SRC=\"http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\" ALT=\"[DIR] \"></A> <A HREF=\"Y2011/\">Y2011</A>. . . . . . . . . . . . . . Mar 12 2012 \n</PRE>\n<HR noshade size=\"1px\">\n<ADDRESS>\nGenerated Wed, 10 Oct 2012 13:43:53 GMT by localhost (squid/2.7.STABLE9)\n</ADDRESS></BODY></HTML>\n"

The curious thing is that getURL was working well until recently, and I don't know what changed. The same command works fine on Windows.

sessionInfo() gives the following:

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] MODIS_0.5-8     maptools_0.8-16 lattice_0.20-0  foreign_0.8-48  date_1.2-32
 [6] RCurl_1.95-0.1  bitops_1.0-4.1  rgdal_0.7-19    raster_2.0-12   sp_0.9-99

loaded via a namespace (and not attached):
[1] grid_2.14.1  tools_2.14.1

Kind regards to all,

Francisco Zambrano Bigiarini
INIA Quilamapu, Chillán, Chile
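The string returned above is an HTML directory page generated by Squid 2.7, not the bare newline-separated name list that ftplistonly = TRUE normally yields. As a stopgap until the cause is fixed, the entry names can be recovered from the HREF attributes of that page. This is a minimal sketch, assuming the page keeps the Squid layout shown above; is_html_listing() and hrefs_from_listing() are hypothetical helpers, not part of RCurl.

```r
# Hypothetical helper: TRUE if the text looks like an HTML page rather than
# the bare, newline-separated name list expected from ftplistonly = TRUE.
is_html_listing <- function(txt) {
  grepl("<html", txt, ignore.case = TRUE)
}

# Hypothetical helper: pull entry names out of the HREF attributes of an
# HTML directory listing such as the Squid page above (single string input).
hrefs_from_listing <- function(txt) {
  # Find every HREF="..." attribute, case-insensitively
  m <- gregexpr('HREF="[^"]*"', txt, ignore.case = TRUE)
  hrefs <- regmatches(txt, m)[[1]]
  # Strip the attribute syntax, keeping only the link target
  hrefs <- sub('^HREF="', "", hrefs, ignore.case = TRUE)
  hrefs <- sub('"$', "", hrefs)
  # Drop the breadcrumb links (absolute paths such as "/pub/") and the parent link
  hrefs <- hrefs[!grepl("^/", hrefs) & hrefs != "../"]
  # Remove the trailing slash that marks directories; each entry is linked
  # twice (icon and name), so keep unique names only
  unique(sub("/$", "", hrefs))
}
```

With txt holding the string returned by getURL(), one could then write items <- if (is_html_listing(txt)) hrefs_from_listing(txt) else strsplit(txt, "\n")[[1]]. This only works around the symptom; it does not explain why an HTML page comes back in the first place.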
Duncan Temple Lang
2012-Oct-12 16:41 UTC
Hi Francisco,

The code gives me the correct results, and it works for you on a Windows machine. So while the difference could be due to different versions of software (e.g. libcurl, RCurl, etc.), the presence of the word "squid" in the HTML suggests to me that your machine/network is using the proxy/caching software Squid. Squid intercepts requests, caches the results locally and shares them across local users. So if Squid has retrieved that page for an HTML target (e.g. a browser, or a request with Content-Type set to text/html), it may be serving that cached copy in response to your FTP request.

One thing I like to do when debugging RCurl calls is to add verbose = TRUE to the .opts argument and then look at the information about the communication.

D.