Hello,

I am using a Bioconductor package in R. The command I use reads the contents of a file downloaded from a database and creates an expression object.

The code works fine when the input file is about 10 MB, but when the file is around 40 MB the object isn't created.

Is there an efficient way to load a large input file and create the expression object?

This is my code:

library(gcrma)
library(limma)
library(biomaRt)
library(GEOquery)
library(Biobase)
gseEset1 <- getGEO('GSE53454')[[1]]  # file size 10 MB
gseEset2 <- getGEO('GSE76896')[[1]]  # file size 40 MB
## gseEset2 doesn't load and isn't created

Many thanks
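The post says the object "isn't created" but does not show what R printed. A minimal sketch of one way to capture that output follows. `capture_load` is a hypothetical helper written for this illustration (it is not part of GEOquery), and the `flaky` function is only a stand-in so the sketch runs without a network connection.

```r
# Hypothetical helper (not part of GEOquery): run any loader function and
# keep whatever it signalled, so the exact messages can be pasted into a post.
capture_load <- function(loader, ...) {
  warnings <- character()
  result <- withCallingHandlers(
    tryCatch(
      list(value = loader(...), error = NULL),
      # On error, keep the message instead of aborting the script.
      error = function(e) list(value = NULL, error = conditionMessage(e))
    ),
    # Record each warning, then let the loader continue.
    warning = function(w) {
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  result$warnings <- warnings
  result
}

# Stand-in loader that fails the way a broken download might.
flaky <- function(id) {
  warning("slow connection")
  stop("cannot open URL for ", id)
}

res <- capture_load(flaky, "GSE76896")
res$error     # "cannot open URL for GSE76896"
res$warnings  # "slow connection"
```

With GEOquery attached, the same wrapper could be called as `capture_load(getGEO, "GSE76896")`. If `res$error` is not NULL, that string is the message to include when asking for help.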
The following is the system configuration:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 142
Model name:            Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
Stepping:              9
CPU MHz:               2844.008
CPU max MHz:           3500.0000
CPU min MHz:           400.0000
BogoMIPS:              5808.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3

On Fri, Sep 7, 2018 at 3:38 PM Deepa <deepamahm.iisc at gmail.com> wrote:
> gseEset1 <- getGEO('GSE53454')[[1]] #filesize 10MB
> gseEset2 <- getGEO('GSE76896')[[1]] #file size 40MB
>
> ##gseEset2 doesn't load and isn't created
getGEO() seems to be a custom routine. Import the file in a reader and confirm that it is a CSV file from Excel. If this is non-standard input, the custom subroutine is creating new constraints. Usually R has no problem until the workspace reaches about 1 GB.

On Fri 7 Sep, 2018, 15:38 Deepa, <deepamahm.iisc at gmail.com> wrote:
> The syntax works perfectly fine when the input size is of 10 MB. Whereas,
> when the file size is around 40MB the object isn't created.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Amit Mittal
Pursuing Ph.D. in Finance and Accounting
Indian Institute of Management, Lucknow
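Both points in this reply can be checked from inside R. The original post attaches GEOquery before calling getGEO(), and the session output later in the thread lists GEOquery_2.49.1, so the function appears to come from that package rather than from custom code. The sketch below uses only base R. `defined_in` and `workspace_mb` are hypothetical helper names chosen for this illustration.

```r
# Hypothetical helper: name of the environment a function was defined in.
defined_in <- function(f) environmentName(environment(f))

defined_in(utils::read.csv)   # "utils"
# With GEOquery attached, defined_in(getGEO) should name the package
# that provides getGEO().

# Hypothetical helper: size in MB of every object in an environment,
# largest first, to see how close the workspace is to any memory limit.
workspace_mb <- function(env = globalenv()) {
  sizes <- vapply(
    ls(env),
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(sizes / 1024^2, decreasing = TRUE)
}

# Small demonstration in a scratch environment.
scratch <- new.env()
assign("big", numeric(1e6), envir = scratch)    # about 7.6 MB of doubles
assign("small", 1, envir = scratch)
workspace_mb(scratch)
```

Calling `workspace_mb()` with no argument reports on the global workspace. A 40 MB compressed download expands in memory, but the size of the resulting objects can be measured this way instead of estimated.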
Ask on the Bioconductor support site https://support.bioconductor.org

Provide (on the support site) the output of the R commands

  library(GEOquery)
  sessionInfo()

Also include (copy and paste) the output of the command that fails. I have

  > gseEset2 <- getGEO('GSE76896')[[1]]
  Found 1 file(s)
  GSE76896_series_matrix.txt.gz
  trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'
  Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
  =================================================
  downloaded 38.7 MB

  Parsed with column specification:
  cols(
    .default = col_double(),
    ID_REF = col_character()
  )
  See spec(...) for full column specifications.
  |=================================================================| 100%   84 MB
  File stored at:
  /tmp/Rtmpe4NWji/GPL570.soft
  |=================================================================| 100%   75 MB
  > sessionInfo()
  R version 3.5.1 Patched (2018-08-22 r75177)
  Platform: x86_64-pc-linux-gnu (64-bit)
  Running under: Ubuntu 16.04.5 LTS

  Matrix products: default
  BLAS: /home/mtmorgan/bin/R-3-5-branch/lib/libRblas.so
  LAPACK: /home/mtmorgan/bin/R-3-5-branch/lib/libRlapack.so

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] parallel  stats     graphics  grDevices utils     datasets  methods
  [8] base

  other attached packages:
  [1] bindrcpp_0.2.2      GEOquery_2.49.1     Biobase_2.41.2
  [4] BiocGenerics_0.27.1 BiocManager_1.30.2

  loaded via a namespace (and not attached):
   [1] Rcpp_0.12.18     tidyr_0.8.1      crayon_1.3.4     dplyr_0.7.6
   [5] assertthat_0.2.0 R6_2.2.2         magrittr_1.5     pillar_1.3.0
   [9] stringi_1.2.4    rlang_0.2.2      curl_3.2         limma_3.37.4
  [13] xml2_1.2.0       tools_3.5.1      readr_1.1.1      glue_1.3.0
  [17] purrr_0.2.5      hms_0.4.2        compiler_3.5.1   pkgconfig_2.0.2
  [21] tidyselect_0.2.4 bindr_0.1.1      tibble_1.4.2

On 09/07/2018 06:08 AM, Deepa wrote:
> gseEset1 <- getGEO('GSE53454')[[1]] #filesize 10MB
> gseEset2 <- getGEO('GSE76896')[[1]] #file size 40MB
>
> ##gseEset2 doesn't load and isn't created
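The transcript above shows getGEO() downloading a 38.7 MB series matrix, then the GPL570 platform file, before building the object. One thing that may be worth trying is to separate the download from the parsing, so that a slow or interrupted transfer can be told apart from a parsing problem. This is a sketch, not a confirmed fix. `matrix_url` and `fetch_then_parse` are hypothetical helpers. The URL pattern is copied from the transcript and assumes a single-platform series ("Found 1 file(s)"). Whether a download timeout is the actual cause of the original failure is an assumption. The `filename` and `getGPL` arguments of getGEO() and the base R `timeout` option are existing features.

```r
# Hypothetical helper: rebuild the series-matrix URL pattern seen in the
# transcript (assumes a single-platform series).
matrix_url <- function(acc) {
  stub <- sub("[0-9]{1,3}$", "nnn", acc)   # GSE76896 -> GSE76nnn
  sprintf(
    "https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s_series_matrix.txt.gz",
    stub, acc, acc
  )
}

# Hypothetical helper: download once with a longer timeout, then parse the
# local copy. Requires the GEOquery package and network access when called.
fetch_then_parse <- function(acc, dir = tempdir(), timeout = 600) {
  old <- options(timeout = timeout)        # base R default is 60 seconds
  on.exit(options(old), add = TRUE)
  dest <- file.path(dir, basename(matrix_url(acc)))
  if (!file.exists(dest)) {
    download.file(matrix_url(acc), dest, mode = "wb")
  }
  # getGPL = FALSE skips the separate platform annotation download.
  GEOquery::getGEO(filename = dest, getGPL = FALSE)
}

matrix_url("GSE76896")
# eset <- fetch_then_parse("GSE76896")     # needs GEOquery and a connection
```

If the download step fails, delete any partial file in `dir` before retrying. If the download succeeds but parsing fails, the error then points at the parser rather than the network.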
I already posted a similar issue on the Bioconductor support site:
https://support.bioconductor.org/p/112607/#112634
I couldn't find a solution there.

On Fri, Sep 7, 2018 at 3:45 PM Martin Morgan <mtmorgan.bioc at gmail.com> wrote:
> Ask on the Bioconductor support site https://support.bioconductor.org
>
> Provide (on the support site) the output of the R commands
>
> library(GEOquery)
> sessionInfo()
>
> Also include (copy and paste) the output of the command that fails.