Displaying 20 results from an estimated 2000 matches similar to: "Parallel compression support for saving to rds/rdata files?"

2016 Jul 27
2
Model object, when generated in a function, saves entire environment when saved
Another solution is to only save the parts of the model object that interest you. As long as they don't include the formula (which is what drags along the environment it was created in), you will save space. E.g., tfun2 <- function(subset) { junk <- 1:1e6 list(subset=subset, lm(Sepal.Length ~ Sepal.Width, data=iris, subset=subset)$coef) } saveSize(tfun2(1:4)) #[1] 152 Bill
2015 Jan 15
4
Request to speed up save()
Hi, I am dealing with very large datasets and it takes a long time to save a workspace image. The options to save compressed data are: "gzip", "bzip2" or "xz", the default being gzip. I wonder if it's possible to include the pbzip2 (http://compression.ca/pbzip2/) algorithm as an option when saving. "PBZIP2 is a parallel implementation of the bzip2
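One workaround that needs no change to R itself can be sketched as follows: saveRDS() accepts a connection, so the serialized stream can be piped through an external compressor. This is only a sketch under assumptions: the helper name save_rds_piped is hypothetical, a Unix-like shell is assumed, and pigz (a parallel gzip) is assumed to be on the PATH, with plain gzip as a fallback. A bzip2-style tool such as pbzip2 might be substituted if the installed version reads from standard input.

```r
# Hypothetical helper: write an RDS file through an external compressor.
# Assumes a Unix-like shell; pigz is used if found, otherwise gzip.
save_rds_piped <- function(object, path, compressor = "pigz") {
  if (!nzchar(Sys.which(compressor))) compressor <- "gzip"
  # "-c" writes the compressed stream to stdout, which the shell redirects to path.
  con <- pipe(paste(compressor, "-c >", shQuote(path)), open = "wb")
  on.exit(close(con))          # closing the pipe waits for the compressor to finish
  saveRDS(object, con)         # 'compress' is not used when a connection is given
  invisible(path)
}

out <- tempfile(fileext = ".rds.gz")
save_rds_piped(iris, out)
restored <- readRDS(out)       # readRDS() detects gzip compression on its own
```

Because the output is an ordinary gzip stream, the file should stay readable with readRDS() on machines that do not have the parallel compressor installed.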
2016 Jul 27
3
Model object, when generated in a function, saves entire environment when saved
In the below, I generate a model from an environment that isn't .GlobalEnv with a large object that is unrelated to the model generation. It seems to save the irrelevant object unnecessarily. In my actual use case, I am running and saving many models in a loop that each use a single large data.frame (that gets collapsed into a small data.frame for estimation), so removing it isn't an
2020 Jan 29
2
Model object, when generated in a function, saves entire environment when saved
Reviving an old thread. I haven't noticed this being a problem for a while when saving RDS files, which is great. However, I noticed the problem again when saving `qs` files (https://github.com/traversc/qs), an RDS replacement with a fast serialization / compression system. I'd like to get an idea of what change was made within R to address this issue for `saveRDS`. My thought is that
2016 Jul 27
0
Model object, when generated in a function, saves entire environment when saved
One way around this problem is to make a new environment whose parent environment is .GlobalEnv and which contains only what the call to lm() requires and to compute lm() in that environment. E.g., tfun1 <- function (subset) { junk <- 1:1e+06 env <- new.env(parent = globalenv()) env$subset <- subset with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset =
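The idea in this message can be illustrated with a small self-contained comparison. This is a sketch, not the poster's code: the names ser_size, fit_in_frame and fit_in_small_env are hypothetical, and rnorm() is used for the unrelated object instead of 1:1e6 because recent R versions may store integer ranges compactly, which would hide the effect.

```r
# Hypothetical helper: number of bytes needed to serialize an object in memory.
ser_size <- function(x) length(serialize(x, NULL))

# The formula is created in the function's frame, so the fitted model
# keeps a reference to that frame, including the unrelated 'junk'.
fit_in_frame <- function(rows) {
  junk <- rnorm(1e6)
  lm(Sepal.Length ~ Sepal.Width, data = iris, subset = rows)
}

# The formula is created in a small environment that holds only 'rows'
# and whose parent is the global environment.
fit_in_small_env <- function(rows) {
  junk <- rnorm(1e6)
  env <- new.env(parent = globalenv())
  env$rows <- rows
  with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = rows))
}

size_frame <- ser_size(fit_in_frame(1:50))
size_small <- ser_size(fit_in_small_env(1:50))
print(c(frame = size_frame, small_env = size_small))
```

Both fits use the same rows of iris, so the coefficients should agree; only the environment captured by the formula differs.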
2010 Jul 19
22
zfs send to remote any ideas for a faster way than ssh?
I've tried ssh blowfish and scp arcfour. Both are CPU limited long before the 10g link is. I've also tried mbuffer, but I get broken pipe errors part way through the transfer. I'm open to ideas for faster ways to either zfs send directly or through a compressed file of the zfs send output. For the moment I; zfs send > pigz scp arcfour the file gz file to the
2016 Jul 27
0
Model object, when generated in a function, saves entire environment when saved
Thanks so much for all this. The first solution is what I'm going with as I want the terms object to come along so that predict still works. On Wed, Jul 27, 2016 at 12:28 PM, William Dunlap via R-devel < r-devel at r-project.org> wrote: > Another solution is to only save the parts of the model object that > interest you. As long as they don't include the formula (which is
2018 Apr 22
0
Problem reading RDS files
Wouldn't the obvious problem be that your data file is corrupted or was never created using saveRDS in the first place? Can you show us a complete example of creating and attempting to read what was just created? On April 22, 2018 10:20:05 AM CDT, mohammad moradi <mri.moradi at gmail.com> wrote: >Hi there, > >I faced a weird problem doing a seemingly simple task in R.
2015 Jan 15
0
Request to speed up save()
In addition to the major points that others made: if you care about speed, don't use compression. With today's fast disks it's an order of magnitude slower to use compression: > d=lapply(1:10, function(x) as.integer(rnorm(1e7))) > system.time(saveRDS(d, file="test.rds.gz")) user system elapsed 17.210 0.148 17.397 > system.time(saveRDS(d,
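The trade-off described here can be checked on a small scale with the compress argument of saveRDS(). The sketch below uses much smaller data than the message does, and the variable names are illustrative; actual timings depend on the disk and CPU, so no particular speed ratio is implied.

```r
# Smaller stand-in for the data in the message above.
d <- lapply(1:4, function(i) as.integer(rnorm(1e5)))

f_gzip <- tempfile(fileext = ".rds")
f_none <- tempfile(fileext = ".rds")

t_gzip <- system.time(saveRDS(d, f_gzip))                    # default: gzip compression
t_none <- system.time(saveRDS(d, f_none, compress = FALSE))  # no compression

# Elapsed seconds and resulting file sizes in bytes, side by side.
print(c(gzip = t_gzip[["elapsed"]], none = t_none[["elapsed"]]))
print(c(gzip = file.size(f_gzip), none = file.size(f_none)))
```

readRDS() reads either file without being told which variant was used, so skipping compression costs only disk space.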
2018 Apr 22
2
Problem reading RDS files
Hi there, I faced a weird problem doing a seemingly simple task in R. Specifically, when trying to read an RDS file from the working directory, the following error appears. Code: records <- readRDS("tweets.rds") Error: Error in readRDS("tweets.rds") : error reading from connection In addition: Warning message: In readRDS("tweets.rds") : invalid or
2012 Oct 05
24
Building an On-Site and Off-Site ZFS server, replication question
Good morning. I am in the process of planning a system which will have 2 ZFS servers, one on site, one off site. The on site server will be used by workstations and servers in house, and most of that will stay in house. There will, however, be data I want backed up somewhere else, which is where the offsite server comes in... This server will be sitting in a Data Center and will have some storage
2020 May 27
2
Determinant of umask for sieve_pipe_bin_dir scripts?
Hi, What determines the umask of sieve_pipe_bin_dir scripts ? The results from my script are always being set to 0600. My script is simple and shown below, even if I adjust the right line to add " && chmod 644", the actual resulting file still remains at 0600 ?!? #!/bin/bash # Usage: imapsieve_copy <email> <spam|ham> MSG_USER="$1" MSG_TYPE="$2"
2018 Apr 23
1
Problem reading RDS files
I've tried to reproduce the tutorial presented at http://www.rdatamining.com/docs/twitter-analysis-with-r and specifically aimed to use the rds files (tweet records) at http://www.rdatamining.com/data/. On Sun, Apr 22, 2018 at 9:16 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote: > Wouldn't the obvious problem be that your data file is corrupted or was > never created
2019 Jun 19
2
libvirtd does not update VM .xml configuration on filesystem after virsh blockcommit
Hi, Recently we've upgraded some KVM hosts from Fedora 29 to Fedora 30 and now experience broken VM configurations on the filesystem after virsh blockcommit. The commands "virsh dumpxml ..." and "virsh dumpxml --inactive ..." show a different configuration than the one on the filesystem. In case of a libvirtd restart or system reboot, there are broken VM xml configurations on
2015 Mar 02
2
[LLVMdev] clang change function name
Hi, I compile a .cpp file with the command: clang++ -emit-llvm -c -g -O0 -w pbzip2.cpp -o pbzip2.bc -lbz2 llvm-dis pbzip2.bc One function in the .cpp file is consumer_decompress. However, when I look inside pbzip2.ll, the function name is changed to "define i8* @_Z19consumer_decompressPv(i8* %q) #0 {" Why does clang add a "_Z19" prefix and a "Pv" suffix? Thanks,
2015 Mar 02
2
[LLVMdev] clang change function name
Got it, thanks. But in my pass, I use function name to locate. Can I disable mangling in clang? Best, Haopeng On 3/1/15 10:44 PM, John Criswell wrote: > On 3/1/15 11:38 PM, Haopeng Liu wrote: >> Hi, >> >> I compile a .cpp with cmd: >> clang++ -emit-llvm -c -g -O0 -w pbzip2.cpp -o pbzip2.bc -lbz2 >> llvm-dis pbzip2.bc >> >> One function in .cpp is
2012 Jun 28
2
Strange du/df behaviour.
Hi all. I have currently a server: cat /etc/redhat-release CentOS release 5.7 (Final) uname -a Linux host.domain.com 2.6.18-274.18.1.el5 #1 SMP Thu Feb 9 12:45:44 EST 2012 x86_64 x86_64 x86_64 GNU/Linux I have there a filesystem mounted: /dev/vg0/paczki /home/paczki-workdir ext4 defaults,noatime 0 0 on which df gives strange output: LANG=C df -h
2009 Dec 04
30
ZFS send | verify | receive
If there were a "zfs send" datastream saved someplace, is there a way to verify the integrity of that datastream without doing a "zfs receive" and occupying all that disk space? I am aware that "zfs send" is not a backup solution, due to vulnerability of even a single bit error, and lack of granularity, and other reasons. However ... There is an attraction to "zfs send" as an augmentation to the
2010 Aug 05
3
[LLVMdev] a problem when using postDominatorTree
On 08/05/2010 06:46 AM, Wenbin Zhang wrote: > Hi all, > I'm using postDominatorTree to do some program analysis. My code works > well for small tests, but when I run it on real applications, the > following error occurs: > /Inorder PostDominator Tree: DFSNumbers invalid: 0 slow queries. > [1] <<exit node>> {0,21} > [2] %bb1 {1,2} > [2] %bb {3,4} > [2]
2018 Feb 07
1
saveRDS() overwrites file when object is not found
I ran into this behaviour when accidentally running a line of code that I shouldn't have. When saving over an rds with an object that's not found, I would have expected saveRDS to not touch the file. saveRDS(iris, "test.rds") file.size("test.rds") #> [1] 1080 saveRDS(no_object_here, "test.rds") #> Error in saveRDS(no_object_here, "test.rds"):
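A defensive pattern for the situation described here can be sketched as below. It rests on an assumption about the cause: that the target file is opened before the object argument is evaluated. The helper name save_rds_atomic is hypothetical. It forces the argument first, writes to a temporary file in the same directory, and only then renames it over the target.

```r
# Hypothetical helper: leave 'path' untouched unless the save fully succeeds.
save_rds_atomic <- function(object, path, ...) {
  force(object)  # evaluate the argument before any file is opened
  tmp <- tempfile(tmpdir = dirname(path), fileext = ".rds.tmp")
  on.exit(unlink(tmp), add = TRUE)   # removes the temp file if it is still there
  saveRDS(object, tmp, ...)
  if (!file.rename(tmp, path)) stop("could not move ", tmp, " to ", path)
  invisible(path)
}

target <- tempfile(fileext = ".rds")
save_rds_atomic(iris, target)
size_before <- file.size(target)

# The missing object raises an error before the existing file is touched.
result <- try(save_rds_atomic(no_object_here, target), silent = TRUE)
size_after <- file.size(target)
```

Writing the temporary file next to the target keeps the rename on the same filesystem, which is what makes replacing the old file a single step on most systems.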