Paolo Innocenti
2011-Jan-21 00:38 UTC
[R] Reading gz compressed csv file - 'incomplete line found'
Hi all,

I am trying to download, decompress and read a CSV file. My code:

myurl <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
myfile <- "GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
download.file(myurl, destfile=myfile, mode="w")
mycon <- gzcon(gzfile(myfile, open="r"))
mydata <- read.csv(textConnection(readLines(mycon)))
close(mycon)

This works under my Linux distribution, but under Windows I get the following warning:

> myurl <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
> myfile <- "GSE24729_MitoNuclear_suppl_male_stats.csv.gz"
> download.file(myurl, destfile=myfile, mode="w")
trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE24729/GSE24729_MitoNuclear_suppl_male_stats.csv.gz'
ftp data connection made, file length 535641 bytes
opened URL
downloaded 523 Kb

> mycon <- gzcon(gzfile(myfile, open="r"))
> mydata <- read.csv(textConnection(readLines(mycon)))
Warning message:
In readLines(mycon) :
  incomplete final line found on 'gzcon(GSE24729_MitoNuclear_suppl_male_stats.csv.gz)'
> close(mycon)

I can read only 30 lines, and then it stops working. Does anyone have any suggestions? I suspect the problem lies either in gzcon/gzfile not decompressing properly or in some other end-of-line/end-of-file issue, but the help files are a bit above my level of understanding.
Thanks,
paolo

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] lattice_0.19-13      drosophila2.db_2.4.5 org.Dm.eg.db_2.4.6
 [4] GOstats_2.16.0       RSQLite_0.9-4        DBI_0.2-5
 [7] graph_1.28.0         Category_2.16.0      AnnotationDbi_1.12.0
[10] xtable_1.5-6         GEOquery_2.16.3      ellipse_0.3-5
[13] RColorBrewer_1.0-2   hopach_2.10.0        cluster_1.13.2
[16] limma_3.6.9          genefilter_1.32.0    vsn_3.18.0
[19] affy_1.28.0          Biobase_2.10.0

loaded via a namespace (and not attached):
 [1] affyio_1.18.0         annotate_1.28.0       GO.db_2.4.5
 [4] GSEABase_1.12.2       preprocessCore_1.12.0 RBGL_1.26.0
 [7] RCurl_1.5-0.1         splines_2.12.1        survival_2.36-2
[10] tools_2.12.1          XML_3.2-0.2
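A side note on the reading step: the chain in the question is more roundabout than it needs to be. gzfile() already decompresses as it reads, so wrapping it in gzcon() should be redundant, and read.csv() accepts a connection directly, so the readLines()/textConnection() detour can go as well. The sketch below illustrates this under stated assumptions: the small data frame is invented and written to a temporary file instead of using the GEO download. It does not by itself fix the Windows problem, which comes from how the file was downloaded.

```r
# Hypothetical example data standing in for the GEO table (made-up values).
example_df <- data.frame(gene = c("CG1", "CG2", "CG3"),
                         logFC = c(0.5, -1.2, 2.0),
                         stringsAsFactors = FALSE)

# Write a gzip-compressed CSV to a temporary file.
gz_path <- tempfile(fileext = ".csv.gz")
out_con <- gzfile(gz_path, open = "wt")
write.csv(example_df, out_con, row.names = FALSE)
close(out_con)

# A gzip file starts with the two magic bytes 1f 8b.
magic <- readBin(gz_path, what = "raw", n = 2)

# read.csv() opens the unopened gzfile() connection, decompresses while
# reading, and closes it again: no gzcon(), readLines() or textConnection().
roundtrip <- read.csv(gzfile(gz_path), stringsAsFactors = FALSE)
print(magic)
print(roundtrip)
```

Applied to the file in the question, once it has been downloaded intact, this would reduce to mydata <- read.csv(gzfile(myfile)).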
Paolo Innocenti
2011-Jan-21 05:07 UTC
[R] Reading gz compressed csv file - 'incomplete line found'
That worked!

download.file(myurl, destfile=myfile, mode="wb")

Thanks a lot,
paolo

On 01/21/2011 02:53 PM, William Dunlap wrote:
> Try mode="wb" ('b' for binary mode) in the
> call to download.file(). It should make a
> difference on Windows (& Mac?) and be innocuous on
> Unix.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Paolo Innocenti
>> Sent: Thursday, January 20, 2011 4:39 PM
>> To: r-help at r-project.org
>> Subject: [R] Reading gz compressed csv file - 'incomplete line found'
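Why binary mode matters can be made concrete. A .gz file is binary, and in text mode ("w") download.file() on Windows writes every LF byte (0x0A) as the pair CR LF (0x0D 0x0A); on Unix-alikes the two modes behave the same, which is why the original code worked under Linux. Compressed data contains 0x0A bytes at arbitrary positions, so this translation damages the deflate stream and decompression goes wrong partway through, which fits the truncation after 30 lines reported in the question. The sketch below imitates that newline translation by hand on a locally generated file rather than calling download.file(); the helper simulate_text_mode() and the random data are hypothetical.

```r
# Sketch: mimic what a text-mode ("w") write does to a binary .gz file on
# Windows, where each LF byte (0x0A) is written as CR LF (0x0D 0x0A).
# simulate_text_mode() and the data below are hypothetical illustrations.
simulate_text_mode <- function(bytes) {
  lf <- as.raw(0x0a)
  cr <- as.raw(0x0d)
  # Replace every LF byte with CR LF; all other bytes pass through unchanged.
  do.call(c, lapply(bytes, function(b) if (b == lf) c(cr, lf) else b))
}

# Build a gzip-compressed CSV locally (stands in for the downloaded file).
set.seed(1)
example_df <- data.frame(id = 1:2000, value = round(rnorm(2000), 4))
good_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(good_path, open = "wt")
write.csv(example_df, con, row.names = FALSE)
close(con)

# Copy the raw bytes, applying the text-mode translation (like mode = "w").
good_bytes <- readBin(good_path, what = "raw", n = file.info(good_path)$size)
bad_bytes <- simulate_text_mode(good_bytes)
bad_path <- tempfile(fileext = ".csv.gz")
writeBin(bad_bytes, bad_path)

# The byte-exact copy reads fine; the translated copy is a damaged gzip
# stream, so reading it either fails or returns different/truncated data.
original <- read.csv(gzfile(good_path))
damaged <- tryCatch(suppressWarnings(read.csv(gzfile(bad_path))),
                    error = function(e) NULL)

cat("size on disk:", length(good_bytes), "->", length(bad_bytes), "bytes\n")
cat("identical after translation:", identical(damaged, original), "\n")
```

The same idea gives a quick check on a real download: a file fetched in binary mode should have exactly the size the server reported (535641 bytes in the log in the question), so a noticeably larger file.info(myfile)$size suggests newline translation has taken place.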
Maybe Matching Threads
- Print plot to pdf, jpg or any other format when using scatter3d error
- Error in M[, 1] : incorrect number of dimensions when trying to plot hexbin
- kruskal's MONANOVA algorithm
- PJSIP does not qualify contacts after starting Asterisk
- Suggestion for big files [was: Re: A comment about R:]