herr dittmann
2012-Dec-21 14:51 UTC
[R] how can I import op.gz files with read.csv or otherwise
Dear R-users,

I am struggling to directly read an "op.gz" file into R. NOAA kindly provides daily weather data on their FTP server for download.

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252  LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C  LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.15.1

Here is the data set in question:

x <- read.csv(file="ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz", skip = 1, sep = "")

and "structure" returns some incomprehensible gibberish:

> str(x)
'data.frame': 70 obs. of 6 variables:
 $ X4àtYd...ÂÚ8.åWD...ëü.ïß.X.QTñVP.. : Factor w/ 70 levels "\005~4èdŒÚíE\031y\020ÿíº\035JëïÒI˜Æí\021†BÆÌR{žYkv\035`“ì°a\017Z\021ÅsP’eÏhÞÇ""| __truncated__,..: 44 13 56 64 28 23 67 3 2 33 ...
 $ X.öoMýßT..º_...Öígß7.â..Tþ.üÔ.ª...5ÕJ...žíÕj.QÊ..ã.eŃÎGmòýçºg..a...Õæ.J..Å.ãsç.êŠ.û.ÊklsUDÙ.Š..âU...u1.zË.WÜ..x...3._..E.ò.ÊZD.ïoÚÇ.dvæ....òk.C.y...8h: Factor w/ 41 levels "","\025ç\016\vβi;§4ƒñ\002\001P–á\0025¶•âÆ{ÐÄ™¹=¤4&ðw\\\\Q´›Ü´\"hnÅ™‰I¨ÅŠb*ªš\035b\b>6ÆÙ$W!ÖËR=¨\022“Pqˆ[»j\004$TÄ•3²*Ó±%àN”›\"| __truncated__,..: 1 1 1 39 1 1 1 8 1 5 ...
 $ X.iŒ.yÐû..ý2.h..ûÉ7.ãJ.3k..jLm...Q..uYÓJä.K.zkU.8.öÖ..Y.7.3...üÊîA.Ê.3ûÄ..Z..5...âš. : Factor w/ 29 levels "","\001qË^+nê1",..: 1 1 1 4 1 1 1 14 1 6 ...
 $ X.îFd.m.Š..v. : Factor w/ 17 levels ""," ´SL\ašÜÝ°\035d)‚¼$íZƼò¶îßJÉ‚äþ†B6å\006%5l[‘¼š\025a\024ñï+gT+3",..: 1 1 1 16 1 1 1 1 1 1 ...
 $ ÿA..E.ŸJkEZ.ÐÊ.á.ë.......œ.z..Â.z..œ..òË.ÙÖãg.ö : Factor w/ 8 levels "","\001S\177¿\017iSÞÖiÓÈ#\017\"UgË:i´í\016pÝ\031UÄéD""| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
 $ X..oðçnPÔoÎWõj.éÔ..B..Âÿë.Qêù...ø

While I can manually open and read the op.gz file in a text editor, the file as imported by read.csv() or read.table() is simply unreadable.

How can I best get the job done? Any pointers, suggestions, ideas most welcome!!

Thanks in advance!

Bernd
Hi,

Maybe this link helps you: http://stackoverflow.com/questions/5764499/decompress-gz-file-using-r

A.K.
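The garbled column names in the str() output are consistent with read.csv() parsing the still-compressed bytes as if they were text. A minimal sketch of the decompression step, assuming a local gzip file: the file contents below are made up for illustration and are not NOAA data. gzfile() is the base R connection that decompresses while reading.

```r
# Write a tiny gzip-compressed, whitespace-delimited file to stand in for a
# downloaded .op.gz file (made-up contents, for illustration only).
path <- tempfile(fileext = ".op.gz")
con <- gzfile(path, open = "wt")
writeLines(c("ID DATE TEMP", "1 20120101 10.5", "2 20120102 11.0"), con)
close(con)

# A gzip stream starts with the bytes 1f 8b; a reader that does not
# decompress sees bytes like these rather than text.
magic <- readBin(path, what = "raw", n = 2)
print(magic)

# gzfile() decompresses on the fly, so read.table() sees plain text.
x <- read.table(gzfile(path), header = TRUE, sep = "")
str(x)
```

Whether the same call works unchanged on the NOAA file depends on its layout, which this sketch does not try to reproduce.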
John Kane
2012-Dec-21 17:44 UTC
[R] how can I import op.gz files with read.csv or otherwise
Try downloading it and decompressing it:

url <- "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2012/285880-99999-2012.op.gz"
dest <- "/home/john/rdata/weather.op.gz"
download.file(url, dest)

However, it does not look like a nicely formatted file, and you may have to do some cleanup in a text editor, or perhaps load it into a spreadsheet, before you read it into R.

I tried the method from the link arun provided and it did not work. It looks like the headers and data are not consistent.

John Kane
Kingston ON Canada
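One possible way to combine the download step with the header problem mentioned above is to fetch the file in binary mode, then read the header line and the data rows separately so that a mismatch in field counts does not stop read.table(). This is only a sketch: fetch_gz() and read_op_gz() are hypothetical helper names, the sample rows are made up, and it assumes the data rows are whitespace-separated after a single header line (as the original call's skip = 1, sep = "" suggests).

```r
# Hypothetical helper: download a gzip file without text-mode translation.
# mode = "wb" matters mainly on Windows, the original poster's platform.
fetch_gz <- function(url, dest = tempfile(fileext = ".gz")) {
  download.file(url, dest, mode = "wb")
  dest
}

# Hypothetical helper: return the header fields and the data rows separately,
# since the header may name fewer columns than the data rows contain.
read_op_gz <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  header <- strsplit(trimws(lines[1]), "\\s+")[[1]]
  dat <- read.table(text = lines[-1], header = FALSE, sep = "",
                    stringsAsFactors = FALSE)
  list(header = header, data = dat)
}

# Demo on a made-up file whose header has 3 names but whose rows have 4 fields.
demo_path <- tempfile(fileext = ".op.gz")
out <- gzfile(demo_path, open = "wt")
writeLines(c("STN YEARMODA TEMP",
             "285880 20120101 12.3 24",
             "285880 20120102 10.1 24"), out)
close(out)

res <- read_op_gz(demo_path)
print(res$header)
print(res$data)
```

For the real file, the call would be something like res <- read_op_gz(fetch_gz(url)), with url as defined in the message above; the columns then still need names assigned by hand from the dataset's format description.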