similar to: Serializing many small objects efficiently

Displaying 20 results from an estimated 100 matches similar to: "Serializing many small objects efficiently"

2007 Aug 23
3
RData File Specification?
Hi, I am developing a tool for converting a large data frame stored in an uncompressed binary (XDR) RData file to a delimited text file. The data frame is too large to load() and extract rows from on a typical PC. I'm looking to parse through the file and extract individual entries without loading the whole thing into memory. In terms of some C source functions, instead of doing
2008 Jul 31
2
C versions of serialize/unserialize in packages
Are the functions 'R_Unserialize' and 'R_InitFileInPStream' allowed to be used in R packages? I guess I'm just not clear on the implications of this comment in 'Rinternals.h': /* The connection interface is not yet available to packages. To allow limited use of connection pointers this defines the opaque pointer type. */ I have a function in the
2011 Dec 06
1
unserialize and eager execution
Hi, While debugging a network server I'm developing I noticed something unusual - a call to unserialize() resulted in an error about loading a namespace. I was a bit taken aback by this - why should unserializing an object cause a namespace lookup? Are there any other side-effects of unserialize() that I should be cautious about? I've been digging through the R_Unserialize() call, I
2015 Mar 17
2
Reduce memory peak when serializing to raw vectors
Hi, I've been doing some tests using serialize() to a raw vector: df <- data.frame(runif(50e6,1,10)) ser <- serialize(df,NULL) In this example the data frame and the serialized raw vector occupy ~400MB each, for a total of ~800MB. However the memory peak during serialize() is ~1.2GB: $ cat /proc/15155/status |grep Vm ... VmHWM: 1207792 kB VmRSS: 817272 kB We work with very
2009 Mar 31
1
external equiv to R_serialize()?
I'm trying to efficiently allow conversion of R objects to PostgreSQL bytea (raw binary) datatype within PL/R for persistent storage in Postgres tables. I have found R_serialize() which looks like what I need, -- e.g. R_serialize(object, NULL, FALSE, NULL) -- except that it is marked attribute_hidden. Is there some other externally available interface that I can use? Thanks, Joe
2015 Mar 17
2
Reduce memory peak when serializing to raw vectors
Presumably one could stream over the data twice, the first to get the size, without storing the data. Slower but more memory efficient, unless I'm missing something. Michael On Tue, Mar 17, 2015 at 2:03 PM, Simon Urbanek <simon.urbanek at r-project.org> wrote: > Jorge, > > what you propose is not possible because the size of the output is > unknown, that's why a
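The two-pass idea in the reply above can be sketched in C as follows. This is only an illustration under stated assumptions, not R's serialization code: sink_t, sink_write, encode_doubles and serialize_two_pass are hypothetical names, and the "format" is a toy one (a 4-byte length followed by the doubles in host byte order). The same encoder runs twice, first against a sink that only counts bytes, then against a buffer allocated to exactly the counted size, so no over-allocated growing buffer is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink: counts bytes when buf is NULL, copies them otherwise. */
typedef struct {
    unsigned char *buf;  /* NULL during the sizing pass */
    size_t pos;          /* bytes emitted so far */
} sink_t;

static void sink_write(sink_t *s, const void *src, size_t n)
{
    if (s->buf != NULL)
        memcpy(s->buf + s->pos, src, n);
    s->pos += n;
}

/* Toy encoder (NOT R's format): 4-byte length, then the doubles,
 * both in host byte order. */
static void encode_doubles(sink_t *s, const double *x, uint32_t len)
{
    sink_write(s, &len, sizeof len);
    sink_write(s, x, (size_t)len * sizeof *x);
}

/* Pass 1 measures the output; pass 2 writes into a buffer of exactly
 * that size. Returns NULL if the allocation fails; the caller frees. */
static unsigned char *serialize_two_pass(const double *x, uint32_t len,
                                         size_t *out_size)
{
    sink_t counter = { NULL, 0 };
    encode_doubles(&counter, x, len);        /* sizing pass, stores nothing */

    unsigned char *buf = malloc(counter.pos);
    if (buf == NULL)
        return NULL;

    sink_t writer = { buf, 0 };
    encode_doubles(&writer, x, len);         /* writing pass */
    assert(writer.pos == counter.pos);       /* both passes must agree */

    *out_size = writer.pos;
    return buf;
}
```

The trade-off is the one named in the reply: the object is traversed twice, which costs time, in exchange for a peak that is roughly the input plus one exactly-sized output buffer.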
2012 Apr 30
2
fast version of split.data.frame or conversion from data.frame to list of its rows
Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep =""))}) user system elapsed 0.004 0.000 0.004 and then I try to split it > system.time(split(fd, 1:nrow(fd))) user system elapsed
2012 Oct 11
2
bug tracker broken
Hi, I get a 404 page not found on the root. There is no webmaster link on r-project.org that I can see. Whom should I contact? Thanks Antonio PS: Yes I was trying to report my first bug. It's a conspiracy with p < 0.01.
2009 Nov 07
1
getConnection, R_outpstream_st
Hello, I'm trying to use the limited connections api defined in Rinternals.h. I have code that looks like this (inspired from do_serializeToConn) : SEXP serialize_to_connection( SEXP xp, SEXP connection ){ Rconnection con ; struct R_outpstream_st out; R_pstream_format_t type = R_pstream_binary_format ; SEXP (*hook)(SEXP, SEXP) = NULL ; con = getConnection(Rf_asInteger(connection));
2006 Feb 23
1
Utilizing the internet module
Hello all, I'd like to utilize the R_Sock* functions from R_ext/R-ftp-http.h in my R package. The intent is to use these in conjunction with R_serialize() to store R objects in a remote data store. I'm aware that version 2.2.1 of "Writing R extensions" explains that these may be undocumented and unstable, but I have a couple of questions: 1) are they platform independent?
2012 Nov 20
1
Buffer overflow in date package
Dear list-members, I have observed quite a strange problem with the date package. You will find below what I get on my machine (Ubuntu). I have been able to reproduce the error on Red Hat too. But it seems not to happen on Windows (and on some other Linux distros?). > require(date) Loading required package: date > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-pc-linux-gnu
2007 Jun 24
1
There was a problem by the use of snow.
The problem is a very large memory request caused by sign extension. --- R-2.5.0.orig/src/main/serialize.c 2007-03-27 01:42:08.000000000 +0900 +++ R-2.5.0/src/main/serialize.c 2007-06-25 00:48:58.000000000 +0900 @@ -1866,7 +1866,7 @@ static void resize_buffer(membuf_t mb, int needed) { - int newsize = 2 * needed; + size_t newsize = 2 * needed; mb->buf = realloc(mb->buf,
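To make the failure mode in the patch above concrete: when needed exceeds INT_MAX / 2, the int product 2 * needed overflows, and a negative int passed to realloc() is converted to size_t as an enormous request. The following C sketch shows one overflow-safe way to write a doubling resize. It is a hedged illustration, not the code in R's serialize.c: growbuf and growbuf_reserve are hypothetical names, and unlike the quoted function the sketch takes needed as size_t so the multiplication itself happens in an unsigned type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer (not R's membuf_t). */
typedef struct {
    size_t size;          /* allocated capacity in bytes */
    size_t count;         /* bytes currently in use */
    unsigned char *buf;
} growbuf;

/* Ensure capacity for at least `needed` bytes, doubling the request to
 * amortize future growth. Returns 0 on success, -1 if the doubled size
 * would overflow size_t or the allocation fails; on failure the buffer
 * is left unchanged. */
static int growbuf_reserve(growbuf *mb, size_t needed)
{
    if (needed <= mb->size)
        return 0;                      /* already large enough */
    if (needed > SIZE_MAX / 2)
        return -1;                     /* 2 * needed would wrap around */

    size_t newsize = 2 * needed;
    unsigned char *p = realloc(mb->buf, newsize);
    if (p == NULL)
        return -1;                     /* old block is still valid */

    mb->buf = p;
    mb->size = newsize;
    return 0;
}
```

Assigning realloc()'s result to a temporary first means a failed allocation does not leak or lose the existing buffer.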
2006 Oct 02
0
2.3.1: interacting bugs in load() and gzfile() (PR#9271)
Hello, If repeated calls are made to save() using the same pre-opened gzfile connection to a file, and then the connection is closed, the objects saved by the second and subsequent calls are not correctly restored by repeated calls to load() with a new gzfile connection to the same file. What follows are a session exposing the bugs, analysis (see ANALYSIS), patches (see PATCHES), and a session
2015 Mar 17
0
Reduce memory peak when serializing to raw vectors
Jorge, what you propose is not possible because the size of the output is unknown, that's why a dynamically growing PStream buffer is used - it cannot be pre-allocated. Cheers, Simon > On Mar 17, 2015, at 1:37 PM, Martinez de Salinas, Jorge <jorge.martinez-de-salinas at hp.com> wrote: > > Hi, > > I've been doing some tests using serialize() to a raw vector: >
2015 Mar 17
0
Reduce memory peak when serializing to raw vectors
Hi, I've been doing some tests using serialize() to a raw vector: df <- data.frame(runif(50e6,1,10)) ser <- serialize(df,NULL) In this example the data frame and the serialized raw vector occupy ~400MB each, for a total of ~800MB. However the memory peak during serialize() is ~1.2GB: $ cat /proc/15155/status |grep Vm ... VmHWM: 1207792 kB VmRSS: 817272 kB We work with very
2015 Mar 17
0
Reduce memory peak when serializing to raw vectors
In principle, yes (that's what Rserve serialization does), but AFAIR we don't have the infrastructure in place for that. But then you may as well serialize to a connection instead. To be honest I don't see why you would serialize anything big to a vector - you can't really do anything useful with that ... (what you couldn't do with the streaming version).
2007 Nov 02
0
applying duplicated, unique and match to lists?
Dear R developers, While improving duplicated.array() and friends and developing equivalents for the new ff package for large datasets I came across two questions: 1) is it safe to use duplicated.default(), unique.default() and match() on arbitrary lists? If so, we can speed up duplicated.array and friends considerably by using list() instead of paste(collapse="\r") 2) while
2011 Jun 03
0
Revolutions Blog: May Roundup
I write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help. In case you missed them, here are some articles related to R from the month of May: A review of "R Cookbook", a new how-to book for R programmers: http://bit.ly/j4e9Lg A detailed
2011 Nov 09
0
Revolutions Blog: October Roundup
I write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help. In case you missed them, here are some articles related to R from the month of October: The creator of the ggplot2 package, Hadley Wickham, shares details on some forthcoming big-data graphics
2006 Aug 11
0
serializing / deserializing active records with children
What is the best approach to serialize / deserialize full blown active record objects with multiple child objects (one-to-many relations). YAML::load(), YAML::dump() almost work :) -> loaded object has children (I see them in breakpoint session: "puts parent"), but when I am accessing children (e.g: "puts parent.emails") array becomes cleared - I guess rails are trying to