I already posted a similar issue on bioconductor.
https://support.bioconductor.org/p/112607/#112634
Couldn't find a solution.

On Fri, Sep 7, 2018 at 3:45 PM Martin Morgan <mtmorgan.bioc at gmail.com> wrote:

> Ask on the Bioconductor support site https://support.bioconductor.org
>
> Provide (on the support site) the output of the R commands
>
> library(GEOquery)
> sessionInfo()
>
> Also include (copy and paste) the output of the command that fails. I have
>
> > gseEset2 <- getGEO('GSE76896')[[1]]
> Found 1 file(s)
> GSE76896_series_matrix.txt.gz
> trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'
> Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
> =================================================
> downloaded 38.7 MB
>
> Parsed with column specification:
> cols(
>   .default = col_double(),
>   ID_REF = col_character()
> )
> See spec(...) for full column specifications.
> |=================================================================| 100% 84 MB
> File stored at:
> /tmp/Rtmpe4NWji/GPL570.soft
> |=================================================================| 100% 75 MB
> > sessionInfo()
> R version 3.5.1 Patched (2018-08-22 r75177)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.5 LTS
>
> Matrix products: default
> BLAS: /home/mtmorgan/bin/R-3-5-branch/lib/libRblas.so
> LAPACK: /home/mtmorgan/bin/R-3-5-branch/lib/libRlapack.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] bindrcpp_0.2.2 GEOquery_2.49.1 Biobase_2.41.2
> [4] BiocGenerics_0.27.1 BiocManager_1.30.2
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.18 tidyr_0.8.1 crayon_1.3.4 dplyr_0.7.6
> [5] assertthat_0.2.0 R6_2.2.2 magrittr_1.5 pillar_1.3.0
> [9] stringi_1.2.4 rlang_0.2.2 curl_3.2 limma_3.37.4
> [13] xml2_1.2.0 tools_3.5.1 readr_1.1.1 glue_1.3.0
> [17] purrr_0.2.5 hms_0.4.2 compiler_3.5.1 pkgconfig_2.0.2
> [21] tidyselect_0.2.4 bindr_0.1.1 tibble_1.4.2
>
> On 09/07/2018 06:08 AM, Deepa wrote:
> > Hello,
> >
> > I am using a bioconductor package in R.
> > The command that I use reads the contents of a file downloaded from a
> > database and creates an expression object.
> >
> > The syntax works perfectly fine when the input size is of 10 MB. Whereas,
> > when the file size is around 40MB the object isn't created.
> >
> > Is there an efficient way of loading a large input file to create the
> > expression object?
> >
> > This is my code,
> >
> > library(gcrma)
> > library(limma)
> > library(biomaRt)
> > library(GEOquery)
> > library(Biobase)
> > require(GEOquery)
> > require(Biobase)
> > gseEset1 <- getGEO('GSE53454')[[1]] #filesize 10MB
> > gseEset2 <- getGEO('GSE76896')[[1]] #file size 40MB
> >
> > ##gseEset2 doesn't load and isn't created
> >
> > Many thanks
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

I am also providing the output that I obtain for your kind reference,
gseEset2 <- getGEO('GSE76896', destdir = "data/")[[1]]
Found 1 file(s)
GSE76896_series_matrix.txt.gz
Using locally cached version: /data//GSE76896_series_matrix.txt.gz
Parsed with column specification:
cols(
.default = col_double(),
ID_REF = col_character()
)
See spec(...) for full column specifications.
Using locally cached version of GPL570 found here:
/data//GPL570.soft
After this I don't see any output. I had to forcefully stop the execution.
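
Since the last message printed before the hang refers to a locally cached GPL570.soft, one thing that may be worth checking is whether the cached files are complete. The following is only a sketch in base R: the helper name cached_geo_files is hypothetical (it is not part of GEOquery), and it simply lists the sizes of cached .soft and .txt files so they can be compared with the download sizes reported in the log.

```r
# Hypothetical helper (not part of GEOquery): list cached GEO files in a
# directory together with their sizes in MB.
cached_geo_files <- function(destdir) {
  # Match .soft / .txt files, optionally gzip-compressed.
  paths <- list.files(destdir, pattern = "\\.(soft|txt)(\\.gz)?$",
                      full.names = TRUE)
  data.frame(
    file = basename(paths),
    size_mb = round(file.size(paths) / 1024^2, 1),
    stringsAsFactors = FALSE
  )
}

# Example call, using the directory from the message above:
# cached_geo_files("data/")
```

A file that is much smaller than the size reported during the original download could point to an interrupted download, although that is an assumption and not something the output above confirms.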
Martin,
I forgot to mention that the same command works fine when I run
gseEset2 <- getGEO('GSE76896')
without saving the file to a destination folder.
Output:
Found 1 file(s)
GSE76896_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE76nnn/GSE76896/matrix/GSE76896_series_matrix.txt.gz'
Content type 'application/x-gzip' length 40561936 bytes (38.7 MB)
=================================================
downloaded 38.7 MB
Parsed with column specification:
cols(
.default = col_double(),
ID_REF = col_character()
)
See spec(...) for full column specifications.
|=================================================================| 100%
84 MB
File stored at:
/tmp/RtmprygqGb/GPL570.soft
|=================================================================| 100%
80 MB
|=================================================================| 100%
75 MB
The problem occurs when I load the file from the destination folder using
gseEset2 <- getGEO('GSE76896', destdir = "/data/")[[1]]
Found 1 file(s)
GSE76896_series_matrix.txt.gz
Using locally cached version: /data//GSE76896_series_matrix.txt.gz
Parsed with column specification:
cols(
.default = col_double(),
ID_REF = col_character()
)
See spec(...) for full column specifications.
|=================================================================| 100%
84 MB
Using locally cached version of GPL570 found here:
/data//GPL570.soft
^C
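
The log above stops right after the cached GPL570.soft is picked up, so one way to narrow this down might be to load the series without the platform annotation. This is a minimal sketch, assuming the hang is in reading the GPL file (which the output suggests but does not prove). getGPL is an existing argument of getGEO(); the wrapper name load_series and its get_gpl argument are hypothetical.

```r
# Hypothetical wrapper around GEOquery::getGEO(); the accession and the
# directory in the example call are taken from the messages above.
load_series <- function(accession, destdir, get_gpl = FALSE) {
  # getGPL = FALSE asks getGEO() not to fetch or parse the platform (GPL)
  # file, which is the last step reported before the session hangs.
  esets <- GEOquery::getGEO(accession, destdir = destdir, getGPL = get_gpl)
  # getGEO() returns a list of ExpressionSets for a GSE; take the first.
  esets[[1]]
}

# Example call (needs GEOquery installed and the cached or downloadable file):
# gseEset2 <- load_series("GSE76896", destdir = "/data/")
```

If this returns promptly, the expression values are available, but the feature annotation that normally comes from GPL570 will be missing and would have to be added separately.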