Jonathan Greenberg
2012-May-02 22:23 UTC
[R] Quickest way to make a large "empty" file on disk?
R-helpers:

What would be the absolute fastest way to make a large "empty" file (e.g. filled with all zeroes) on disk, given a byte size and a given number of empty values? I know I can use writeBin, but the "object" in this case may be far too large to store in main memory. I'm asking because I'm going to use this file in conjunction with mmap to do parallel writes to this file. Say I want to create a blank file of 10,000 floating point numbers.

Thanks!

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html
Jeff Ryan
2012-May-02 22:42 UTC
[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?
Look at the man page for dd (assuming you are on *nix). A quick google will get you a command to try. I'm not at my desk or I would as well.

Jeff

Jeffrey Ryan | Founder | jeffrey.ryan at lemnica.com
www.lemnica.com

On May 2, 2012, at 5:23 PM, Jonathan Greenberg <jgrn at illinois.edu> wrote:

> What would be the absolute fastest way to make a large "empty" file
> (e.g. filled with all zeroes) on disk, given a byte size and a given
> number of empty values? [...]
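For reference, the sort of dd invocation being pointed at might look like the following. This is only a sketch (the file name "blank.bin" is a placeholder), shown via system() so it can be run from R on a *nix machine:

## Sketch: write 10,000 zero-filled doubles (8 bytes each), i.e. 80,000 zero bytes.
## "blank.bin" is just a placeholder file name.
system("dd if=/dev/zero of=blank.bin bs=8 count=10000")
file.info("blank.bin")$size   # should report 80000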
Denham Robert
2012-May-02 22:44 UTC
[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?
Jonathan,

10,000 numbers is pretty small, so I don't think time will be a big problem. You could write this using writeBin with no problems. For larger files, why not just use a loop? The writing is pretty fast, so I don't think you'll have too many problems. On my machine:

> ptm <- proc.time()
> zz <- file("testbin.bin", "wb")
> for(i in 100000) writeBin(rep(0,100000000), zz, size=16)
> close(zz)
> proc.time() - ptm
   user  system elapsed
  2.416   1.728  16.705

Otherwise I would suggest writing a little piece of C code to do what you want.

Robert

-----Original Message-----
From: r-sig-hpc-bounces at r-project.org On Behalf Of Jonathan Greenberg
Sent: Thursday, 3 May 2012 8:24 AM
To: r-help; r-sig-hpc at r-project.org
Subject: [R-sig-hpc] Quickest way to make a large "empty" file on disk?

> What would be the absolute fastest way to make a large "empty" file
> (e.g. filled with all zeroes) on disk, given a byte size and a given
> number of empty values? [...]
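If the vector really is too large for memory, one way to follow the loop suggestion without ever building the whole thing is to write fixed-size chunks of zeroes. A minimal sketch (the function name, file name, and chunk size are all arbitrary):

## Sketch: write n doubles worth of zeroes in chunks so the full vector
## never has to fit in memory. File name and chunk size are arbitrary.
write_zero_file <- function(pathname, n, chunk = 1e6) {
  con <- file(pathname, open = "wb")
  on.exit(close(con))
  zeroes <- double(chunk)
  full <- n %/% chunk
  rest <- n %% chunk
  for (i in seq_len(full)) writeBin(zeroes, con)   # whole chunks
  if (rest > 0) writeBin(double(rest), con)        # remainder
  invisible(pathname)
}

## e.g. 10,000 doubles -> 80,000 bytes on disk
write_zero_file("blank.bin", n = 10000)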
Henrik Bengtsson
2012-May-02 22:45 UTC
[R] Quickest way to make a large "empty" file on disk?
An R solution is:

allocateFile <- function(pathname, nbrOfBytes) {
  con <- file(pathname, open="wb");
  on.exit(close(con));
  seek(con, where=nbrOfBytes-1L, origin="start", rw="write");
  writeBin(as.raw(0), con=con);
  invisible(pathname);
} # allocateFile()

> allocateFile("foo.bin", nbrOfBytes=985403)
> file.info("foo.bin")$size
[1] 985403

Not sure if it works on all OSes/file systems.

/Henrik

On Wed, May 2, 2012 at 3:23 PM, Jonathan Greenberg <jgrn at illinois.edu> wrote:

> What would be the absolute fastest way to make a large "empty" file
> (e.g. filled with all zeroes) on disk, given a byte size and a given
> number of empty values? [...]
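For the case in the original question, nbrOfBytes is just the element count times the element size. A small usage sketch, assuming 8-byte doubles and an arbitrary file name:

## 10,000 doubles at 8 bytes each; "blank.bin" is an arbitrary file name
n <- 10000
allocateFile("blank.bin", nbrOfBytes = n * 8)
file.info("blank.bin")$size   # should report 80000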
Jeff Ryan
2012-May-02 22:57 UTC
[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?
Something like:

http://markus.revti.com/2007/06/creating-empty-file-with-specified-size/

is one way I know of.

Jeff

Jeffrey Ryan | Founder | jeffrey.ryan at lemnica.com
www.lemnica.com

On May 2, 2012, at 5:23 PM, Jonathan Greenberg <jgrn at illinois.edu> wrote:

> What would be the absolute fastest way to make a large "empty" file
> (e.g. filled with all zeroes) on disk, given a byte size and a given
> number of empty values? [...]
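The usual dd idiom for allocating a file of a given size without actually writing its contents is a zero-count copy with a seek. This is a sketch of that idea, not necessarily identical to the linked post; the file name and size are placeholders:

## Sketch: create an 80,000-byte file by seeking past the end without writing data.
## On filesystems that support it, the result is a sparse file.
system("dd if=/dev/zero of=blank.bin bs=1 count=0 seek=80000")
file.info("blank.bin")$size   # should report 80000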
Simon Urbanek
2012-May-03 01:08 UTC
[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?
On May 2, 2012, at 6:23 PM, Jonathan Greenberg wrote:

> What would be the absolute fastest way to make a large "empty" file
> (e.g. filled with all zeroes) on disk, given a byte size and a given
> number of empty values? [...]

The most trivial way is to simply seek to the end and write a byte:

> n=100000
> f=file("foo","wb")
> seek(f,n-1)
[1] 0
> writeBin(raw(1),f)
> close(f)
> file.info("foo")$size
[1] 1e+05

Cheers,
Simon
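Since the stated goal was parallel writes through the mmap package, the follow-on step might look roughly like the sketch below. Treat the exact calls as an assumption to check against the mmap documentation; it assumes mmap's real64() mode and the pre-allocated file "foo" from above:

library(mmap)

## Assumes "foo" was pre-allocated as above, and that mmap's real64()
## mode is the right one for 8-byte doubles (check ?mmap).
m <- mmap("foo", mode = real64())
m[1:10] <- runif(10)   # assignments go straight to the backing file
munmap(m)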
"Jens Oehlschlägel"
2012-May-03 11:28 UTC
[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?
Jonathan,

On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse' memory-mapped files, i.e. reserving the space without the cost of actually writing initial values. Package 'ff' does this automatically and also allows the file to be accessed in parallel. Check the example below and see how creation of the big file is immediate.

Jens Oehlschlägel

> library(ff)
> library(snowfall)
> ncpus <- 2
> n <- 1e8
> system.time(
+   x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
+ )
   user  system elapsed
   0.01    0.00    0.02
> # check finalizer; with an explicit filename we should have a 'close' finalizer
> finalizer(x)
[1] "close"
> # if not, set it to 'close' in order to not let slaves delete x on slave shutdown
> finalizer(x) <- "close"
> sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
R Version:  R version 2.15.0 (2012-03-30)
snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
> sfLibrary(ff)
Library ff loaded.
Library ff loaded in cluster.
Warning message:
In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts = TRUE, :
  'keep.source' is deprecated and will be ignored
> sfExport("x")  # note: do not export the same ff multiple times
> # explicitly opening avoids a gc problem
> sfClusterEval(open(x, caching="mmeachflush"))  # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but prevents OS write storms when the file is larger than RAM
[[1]]
[1] TRUE

[[2]]
[1] TRUE

> system.time(
+   sfLapply( chunk(x, length=ncpus), function(i){
+     x[i] <- runif(sum(i))
+     invisible()
+   })
+ )
   user  system elapsed
   0.00    0.00   30.78
> system.time(
+   s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) )
+ )
   user  system elapsed
   0.00    0.00    4.38
> # for completeness
> sfClusterEval(close(x))
[[1]]
[1] TRUE

[[2]]
[1] TRUE

> csummary(s)
              5%  95%
Min.     0.04998 0.95
1st Qu.  0.04999 0.95
Median   0.05001 0.95
Mean     0.05001 0.95
3rd Qu.  0.05002 0.95
Max.     0.05003 0.95
> # stop slaves
> sfStop()
Stopping cluster
> # with the close finalizer we are responsible for deleting the file explicitly (unless we want to keep it)
> delete(x)
[1] TRUE
> # remove r-side metadata
> rm(x)
> # truly free memory
> gc()

Sent: Thursday, 3 May 2012 at 00:23
From: "Jonathan Greenberg" <jgrn at illinois.edu>
To: r-help <r-help at r-project.org>, r-sig-hpc at r-project.org
Subject: [R-sig-hpc] Quickest way to make a large "empty" file on disk?

> What would be the absolute fastest way to make a large "empty" file
> (e.g. filled with all zeroes) on disk, given a byte size and a given
> number of empty values? [...]