Dear all,

I tried to read a 3.8Gb RDS file on a computer with 16Gb of available memory. To my astonishment, R's memory footprint quickly rose to over 13Gb and the attempt ended with the error "cannot allocate vector of size 5.8Gb".

I expected three times the file size to be enough memory to read it in, but apparently I was wrong. I checked memory.limit() and it reported more than 13Gb. So I wondered whether this is to be expected, or whether there could be an underlying reason why this file won't open.

Thank you in advance,
Joris

--
Joris Meys
Statistical consultant
Department of Data Analysis and Mathematical Modelling
Ghent University
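[For anyone reproducing this, the usual first checks are the session's memory ceiling and current usage. A minimal sketch; note that memory.limit() exists only on Windows:

    # Windows only: report the current memory ceiling in MB
    memory.limit()
    # Trigger a garbage collection and report memory currently used by R
    gc()
]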
Your RDS file is likely compressed, and could have compression of 10x or more depending on the composition of the data that is in it and the compression method used. 'gzip' compression is used by default.

--
Brian G. Peterson
http://braverock.com/brian/

On Tue, 2018-09-18 at 17:28 +0200, Joris Meys wrote:
> I tried to read in a 3.8Gb RDS file on a computer with 16Gb available
> memory. To my astonishment, the memory footprint of R rises quickly
> to over 13Gb and the attempt ends with an error that says "cannot
> allocate vector of size 5.8Gb". [...]
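[Since gzip is the default, the decompressed (serialized) size of an .rds file can be measured without building the object, by streaming it through a gzip connection and counting bytes. A minimal sketch, assuming the file uses a compression format gzfile() can read when opened for input (gzip, bzip2, xz, or none); "big.rds" is a placeholder path:

    # Count decompressed bytes of an RDS file without loading the object
    con <- gzfile("big.rds", open = "rb")   # transparently decompresses
    total <- 0
    repeat {
      chunk <- readBin(con, what = "raw", n = 1e7)  # read ~10 MB at a time
      if (length(chunk) == 0) break
      total <- total + length(chunk)
    }
    close(con)
    total / 1024^3   # serialized size in Gb: a lower bound on the memory needed
]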
The ratio of object size to rds file size depends on the object. Some variation is due to how header information is stored in memory and in the file, but I suspect most is due to how compression works (e.g., a vector of repeated values can be compressed into a smaller file than a bunch of random bytes).

f <- function (data, ...) {
    force(data)
    tf <- tempfile()
    on.exit(unlink(tf))
    save(data, file = tf, ...)  # pass '...' on so compress= reaches save()
    c(`obj/file size` = as.numeric(object.size(data)/file.size(tf)))
}

> f(rep(0,1e6))
obj/file size
     1021.456
> f(rep(0,1e6), compress=FALSE)
obj/file size
    0.9999986
> f(rep(89.7,1e6))
obj/file size
     682.6555
> f(log(1:1e6))
obj/file size
     1.309126
> f(vector("list",1e6))
obj/file size
     2021.744
> f(as.list(log(1:1e6)))
obj/file size
     8.907579
> f(sample(as.raw(0:255),size=8e6,replace=TRUE))
obj/file size
    0.9998433
> f(rep(as.raw(0:255),length=8e6))
obj/file size
     254.5595
> f(as.character(1:1e6))
obj/file size
      23.5567

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Sep 18, 2018 at 8:28 AM, Joris Meys <jorismeys at gmail.com> wrote:
> I tried to read in a 3.8Gb RDS file on a computer with 16Gb available
> memory. To my astonishment, the memory footprint of R rises quickly to over
> 13Gb and the attempt ends with an error that says "cannot allocate vector
> of size 5.8Gb". [...]
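[The function above uses save(); the same measurement with saveRDS(), which is the format under discussion, looks like this. A sketch of the same idea, not Bill's original code; the helper g() is hypothetical:

    g <- function(data, compress = TRUE) {
        tf <- tempfile(fileext = ".rds")
        on.exit(unlink(tf))
        saveRDS(data, tf, compress = compress)
        c(`obj/file size` = as.numeric(object.size(data) / file.size(tf)))
    }
    g(rep(0, 1e6))                    # ~1000x: a constant vector compresses well
    g(rep(0, 1e6), compress = FALSE)  # ~1x: no compression
]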
Thanks William and Brian for your swift responses, very insightful. I'll have to hunt for more memory.

Cheers
Joris

On Tue, Sep 18, 2018 at 6:16 PM William Dunlap <wdunlap at tibco.com> wrote:
> The ratio of object size to rds file size depends on the object. Some
> variation is due to how header information is stored in memory and in the
> file but I suspect most is due to how compression works. [...]

--
Joris Meys
Statistical consultant
Department of Data Analysis and Mathematical Modelling
Ghent University