thr3ads.net - R help - [R] How to download and unzip data in a loop [Feb 2015]

Jon Skoien

2015-Feb-05 11:11 UTC

[R] How to download and unzip data in a loop

In addition to following Jim's suggestion, you should probably also use 
full.names = TRUE, otherwise you will try to open a connection to files 
in your current directory, not in tmpdir.
Another thing is that the unzipped files appear irregular with respect 
to columns, so read.table might not work too well.
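
To illustrate the full.names point, here is a minimal sketch using dummy files. The directory name and file names are made up for the example. Note that the pattern argument is a regular expression, so "\\.gz$" is used here in place of the glob-style '*.gz'.

```r
# Create a scratch directory holding two small gzip files (hypothetical names).
demo_dir <- file.path(tempdir(), "fullnames_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (f in c("a.gz", "b.gz")) {
  con <- gzfile(file.path(demo_dir, f), "w")
  writeLines("hello", con)
  close(con)
}

# full.names = FALSE returns bare file names, resolved against the working
# directory when opened; full.names = TRUE prepends the directory.
short <- list.files(demo_dir, pattern = "\\.gz$", full.names = FALSE)
long  <- list.files(demo_dir, pattern = "\\.gz$", full.names = TRUE)

file.exists(short)  # FALSE unless the working directory contains such files
file.exists(long)   # TRUE: these paths point into demo_dir
```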

Jon

On 2/5/2015 11:30 AM, jim holtman wrote:
> try taking the quotes off of 'files'
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Wed, Feb 4, 2015 at 5:24 PM, Alexandra Catena <amc5981 at gmail.com> wrote:
>
>> Hi All,
>>
>> I need to loop through and download the past 10 years of met data to a
>> temporary directory.  I then need to unzip it and place it into another
>> directory.
>>
>>
>> year = (2005:2015)
>>
>> for (i in year)
>>    tmpdir = tempdir()
>>    file[i] = file.path(tmpdir, sprintf('724927-23285-%4i.gz', i))
>>    url = sprintf('ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%4i/724927-23285-%4i.gz', i, i)
>>    #file = basename(url)
>>    download.file(url, file[i])
>>    files = dir(tmpdir, '*.gz', full.names=FALSE)
>>    read.table(gzfile('files'))
>>
>>
>>
>> 'file' returns 2015 indices with "/tmp/RtmpKvB4Wz/724927-23285-2015.gz"
>> next to 2015. and files returns 724927-23285-2015.gz.  However, when I try
>> to unzip the gz file using the last line, it says it cannot open the
>> connection and the probable reason is that there is no such file or
>> directory.
>>
>>
>>
>> Thanks,
>> Alexandra
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
-- 
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit

Via Fermi 2749, TP 100-01,  I-21027 Ispra (VA), ITALY

jon.skoien at jrc.ec.europa.eu
Tel:  +39 0332 789205

Disclaimer: Views expressed in this email are those of the individual 
and do not necessarily represent official views of the European Commission.

Alexandra Catena

2015-Feb-05 18:03 UTC

[R] How to download and unzip data in a loop

Thank you guys for the response.

I'm trying to download the last ten years of meteorology data from a
weather station in Livermore from the URL:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2015/724927-23285-2015.gz
The Livermore station code is 724927-23285.  If I wanted to download data
from 2005, the URL would be:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/724927-23285-2005.gz

Once I download the data into a temporary file, I want to unzip it and
store it into another directory where I can access it.
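
The workflow described above could be sketched roughly as follows. This is only one possible arrangement, not something posted in the thread: fetch_year() and its downloader argument are hypothetical names introduced for illustration, the URL pattern and station code are the ones quoted in this message (the server may no longer offer them in this form), and writing the decompressed data as a .txt file is an assumption about the desired output.

```r
# Station code and URL layout as given in the thread.
station  <- "724927-23285"
base_url <- "ftp://ftp.ncdc.noaa.gov/pub/data/noaa"

# Hypothetical helper: download one year's .gz file into tempdir(),
# decompress it, and write the plain-text result into dest_dir.
fetch_year <- function(year, dest_dir, downloader = utils::download.file) {
  gz_name <- sprintf("%s-%d.gz", station, year)
  url     <- sprintf("%s/%d/%s", base_url, year, gz_name)
  gz_path <- file.path(tempdir(), gz_name)
  downloader(url, gz_path, mode = "wb")  # binary mode keeps the gzip data intact

  # Decompress by reading through a gzfile connection.
  con <- gzfile(gz_path, "r")
  lines <- readLines(con)
  close(con)

  out_path <- file.path(dest_dir, sub("\\.gz$", ".txt", gz_name))
  writeLines(lines, out_path)
  out_path
}

# Usage (needs network access), with braces around the loop body:
# dest_dir <- "met_data"   # assumed destination directory
# dir.create(dest_dir, showWarnings = FALSE)
# for (yr in 2005:2015) {
#   fetch_year(yr, dest_dir)
# }
```

Passing the download function as an argument is only there so that the helper can be tried without a network connection by substituting a stub.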

Also, why are there 2015 indices instead of just 10 when I'm only looping
through 2005:2015?

Thanks,
Alexandra

Jeff Newmiller

2015-Feb-05 20:16 UTC

[R] How to download and unzip data in a loop

Dunno. Try posting your current code that fixes the previously mentioned
problems, but this time use plain text so the HTML doesn't corrupt it.

Usually you can solve this kind of issue by executing one line at a time and
looking at each result to make sure it is what you think it is. You can also
wrap it up in a function and set debug mode for that function and then you can
single step through it.
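
As a rough illustration of that approach, the string-building part of one loop iteration can be wrapped in a function and inspected on its own. build_paths() is a hypothetical helper made up for this sketch; the URL pattern is the one quoted in the thread.

```r
# Hypothetical helper: compute the URL and local path for a single year.
build_paths <- function(year, tmpdir = tempdir()) {
  gz_name <- sprintf("724927-23285-%d.gz", year)
  list(
    url  = sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%d/%s", year, gz_name),
    dest = file.path(tmpdir, gz_name)
  )
}

# Look at the values for one year before running the whole loop.
str(build_paths(2015))

# In an interactive session, debugonce() single-steps the next call.
if (interactive()) {
  debugonce(build_paths)
  build_paths(2005)
}
```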

One thing that looks wrong is the lack of braces surrounding the body of the for
loop.
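
A small sketch of why the braces matter: without them only the first statement after for() belongs to the loop, and the following lines run once after the loop has finished. The counter names are made up for the example.

```r
# Without braces: only the first statement is repeated.
first_count  <- 0
second_count <- 0
for (i in 1:3)
  first_count <- first_count + 1
  second_count <- second_count + 1  # not part of the loop; runs once

first_count   # 3
second_count  # 1

# With braces: both statements are repeated on every iteration.
a_count <- 0
b_count <- 0
for (i in 1:3) {
  a_count <- a_count + 1
  b_count <- b_count + 1
}
```
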
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

David Winsemius

2015-Feb-05 21:18 UTC

[R] How to download and unzip data in a loop

On Feb 5, 2015, at 10:03 AM, Alexandra Catena wrote:
> Thank you guys for the response.
> 
> I'm trying to download the last ten years of meteorology data from a
> weather station in Livermore from the URL:
> ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2015/724927-23285-2015.gz
> The Livermore station code is 724927-23285.  If I wanted to download data
> from 2005, the URL would be:
> ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/724927-23285-2005.gz
> 
> Once I download the data into a temporary file, I want to unzip it and
> store it into another directory where I can access it.
> 
> Also, why are there 2015 indices instead of just 10 when I'm only looping
> through 2005:2015?

When you assign to file[2005], R fills in the positions from 1 to 2004 with
NA's, and then adds to that vector with each further run through the loop.
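
A minimal sketch of that padding behaviour, and of one way around it by indexing on position instead of on the year itself. The variable names are made up for the example. Note also that 2005:2015 covers 11 years, not 10.

```r
# Assigning to element 2005 of an empty vector pads positions 1..2004 with NA.
paths <- character(0)
paths[2005] <- "724927-23285-2005.gz"
length(paths)      # 2005
sum(is.na(paths))  # 2004

# Indexing by position keeps the vector as long as the number of years.
years  <- 2005:2015
by_pos <- character(length(years))
for (k in seq_along(years)) {
  by_pos[k] <- sprintf("724927-23285-%d.gz", years[k])
}
length(by_pos)     # 11
```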

The quotes around 'files' are preventing evaluation of your (very poorly
named) 'files'-object.
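
To make the quoting point concrete, here is a small sketch with a dummy gzip file (the file name is made up). With quotes, gzfile() is asked for a file literally called "files" in the working directory. Without quotes, it receives the path stored in the variable. gzfile() takes a single path, so a vector of several files would need to be read one at a time.

```r
# Write a tiny two-column gzip file to a known location.
files <- file.path(tempdir(), "quoted_demo.gz")
con <- gzfile(files, "w")
writeLines(c("1 2", "3 4"), con)
close(con)

# Unquoted: the variable's value (a full path) is used.
df <- read.table(gzfile(files))
dim(df)  # 2 2

# Quoted: this would refer to a file named "files" in the working directory,
# which normally does not exist.
file.exists("files")
```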

The error I get after correcting those semantic errors is:
> read.table(gzfile(files))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 1 did not have 17 elements

... thus validating Jon's warning.
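
One way to look into this kind of error is to read the raw lines and count the whitespace-separated fields per line before deciding how to parse them. The sketch below uses a dummy irregular file, not the NOAA data, so the field counts are purely illustrative. Whether fill = TRUE is an acceptable workaround for the real files is an assumption that would need checking against the data format.

```r
# Dummy gzip file whose lines have different numbers of fields.
path <- file.path(tempdir(), "irregular_demo.gz")
con <- gzfile(path, "w")
writeLines(c("a b c", "d e", "f g h i"), con)
close(con)

# Read the raw lines and count fields on each one.
con <- gzfile(path, "r")
lines <- readLines(con)
close(con)
n_fields <- sapply(strsplit(lines, " +"), length)
table(n_fields)

# fill = TRUE pads short rows instead of stopping with an error.
df <- read.table(gzfile(path), fill = TRUE, stringsAsFactors = FALSE)
dim(df)  # 3 4
```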

David Winsemius
Alameda, CA, USA
