I already posted a similar issue on bioconductor.
https://support.bioconductor.org/p/112607/#112634
Couldn't find a solution.

On Fri, Sep 7, 2018 at 3:45 PM Martin Morgan <mtmorgan.bioc at gmail.com> wrote:

> Ask on the Bioconductor support site https://support.bioconductor.org
>
> Provide (on the support site) the output of the R commands
>
>   library(GEOquery)
>   sessionInfo()
>
> Also include (copy and paste) the output of the command that fails. I have
>
> > gseEset2 <- getGEO('GSE76896')[[1]]
> Found 1 file(s)
> GSE76896_series_matrix.txt.gz
> trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'
> Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
> =================================================
> downloaded 38.7 MB
>
> Parsed with column specification:
> cols(
>   .default = col_double(),
>   ID_REF = col_character()
> )
> See spec(...) for full column specifications.
> |=================================================================| 100%   84 MB
> File stored at:
> /tmp/Rtmpe4NWji/GPL570.soft
> |=================================================================| 100%   75 MB
> > sessionInfo()
> R version 3.5.1 Patched (2018-08-22 r75177)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.5 LTS
>
> Matrix products: default
> BLAS: /home/mtmorgan/bin/R-3-5-branch/lib/libRblas.so
> LAPACK: /home/mtmorgan/bin/R-3-5-branch/lib/libRlapack.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] bindrcpp_0.2.2      GEOquery_2.49.1     Biobase_2.41.2
> [4] BiocGenerics_0.27.1 BiocManager_1.30.2
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_0.12.18     tidyr_0.8.1      crayon_1.3.4     dplyr_0.7.6
>  [5] assertthat_0.2.0 R6_2.2.2         magrittr_1.5     pillar_1.3.0
>  [9] stringi_1.2.4    rlang_0.2.2      curl_3.2         limma_3.37.4
> [13] xml2_1.2.0       tools_3.5.1      readr_1.1.1      glue_1.3.0
> [17] purrr_0.2.5      hms_0.4.2        compiler_3.5.1   pkgconfig_2.0.2
> [21] tidyselect_0.2.4 bindr_0.1.1      tibble_1.4.2
>
> On 09/07/2018 06:08 AM, Deepa wrote:
> > Hello,
> >
> > I am using a bioconductor package in R.
> > The command that I use reads the contents of a file downloaded from a
> > database and creates an expression object.
> >
> > The syntax works perfectly fine when the input size is of 10 MB. Whereas,
> > when the file size is around 40MB the object isn't created.
> >
> > Is there an efficient way of loading a large input file to create the
> > expression object?
> >
> > This is my code,
> >
> > library(gcrma)
> > library(limma)
> > library(biomaRt)
> > library(GEOquery)
> > library(Biobase)
> > require(GEOquery)
> > require(Biobase)
> > gseEset1 <- getGEO('GSE53454')[[1]] #filesize 10MB
> > gseEset2 <- getGEO('GSE76896')[[1]] #file size 40MB
> >
> > ##gseEset2 doesn't load and isn't created
> >
> > Many thanks
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

I am also providing the output that I obtain, for your reference:

gseEset2 <- getGEO('GSE76896', destdir = "data/")[[1]]
Found 1 file(s)
GSE76896_series_matrix.txt.gz
Using locally cached version: /data//GSE76896_series_matrix.txt.gz
Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.
Using locally cached version of GPL570 found here:
/data//GPL570.soft

After this I don't see any further output, and I had to forcibly stop the execution.

On Fri, Sep 7, 2018 at 4:05 PM Deepa <deepamahm.iisc at gmail.com> wrote:

> I already posted a similar issue on bioconductor.
> https://support.bioconductor.org/p/112607/#112634
> Couldn't find a solution.

Martin,

I forgot to mention: the same command works fine when I run it without saving the files to a destination folder:

gseEset2 <- getGEO('GSE76896')

Output:

Found 1 file(s)
GSE76896_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'
Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
=================================================
downloaded 38.7 MB

Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.
|=================================================================| 100%   84 MB
File stored at:
/tmp/RtmprygqGb/GPL570.soft
|=================================================================| 100%   80 MB
|=================================================================| 100%   75 MB

The problem occurs when I fetch the files from the destination folder:

gseEset2 <- getGEO('GSE76896', destdir = "/data/")[[1]]
Found 1 file(s)
GSE76896_series_matrix.txt.gz
Using locally cached version: /data//GSE76896_series_matrix.txt.gz
Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.
|=================================================================| 100%   84 MB
Using locally cached version of GPL570 found here:
/data//GPL570.soft
^C

On Fri, Sep 7, 2018 at 4:08 PM Deepa <deepamahm.iisc at gmail.com> wrote:

> I am also providing the output that I obtain for your kind reference,
>
> gseEset2 <- getGEO('GSE76896', destdir = "data/")[[1]]
> Found 1 file(s)
> GSE76896_series_matrix.txt.gz
> Using locally cached version: /data//GSE76896_series_matrix.txt.gz
> Parsed with column specification:
> cols(
>   .default = col_double(),
>   ID_REF = col_character()
> )
> See spec(...) for full column specifications.
> Using locally cached version of GPL570 found here:
> /data//GPL570.soft
>
> After this I don't see any output. I had to forcefully stop the execution.
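
Since the cached run stalls right after the locally cached GPL570.soft is picked up, one way to narrow the problem down might be to (a) look at the files sitting in the cache directory and (b) load the series without the platform annotation. The following is only a diagnostic sketch, not a confirmed fix: `cache_file_sizes` and `load_gse_without_gpl` are hypothetical helper names introduced here, and the sketch assumes that the `getGPL = FALSE` argument of `getGEO()` skips the GPL step in the installed GEOquery version.

```r
# Hypothetical helper: report the size in bytes of each expected cache file.
# A missing file shows up as NA; an unexpectedly small file may point to an
# incomplete download left behind in the cache directory.
cache_file_sizes <- function(destdir, files) {
  sizes <- file.size(file.path(destdir, files))
  names(sizes) <- files
  sizes
}

# Hypothetical helper: load the series matrix from the cache directory while
# skipping the platform (GPL) annotation. Assumption: getGPL = FALSE is
# honoured by the installed GEOquery version.
load_gse_without_gpl <- function(accession, destdir) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("GEOquery is not installed")
  }
  GEOquery::getGEO(accession, destdir = destdir, getGPL = FALSE)[[1]]
}

# Example calls, using the names from the thread (not run here):
# cache_file_sizes("/data", c("GSE76896_series_matrix.txt.gz", "GPL570.soft"))
# gseEset2 <- load_gse_without_gpl("GSE76896", "/data")
```

If the second call returns promptly, that would suggest the stall is in reading the cached GPL570.soft rather than in the 40 MB series matrix itself; deleting that cached file and letting getGEO() download it again would then be a reasonable next thing to try.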