MarcelK
2008-Nov-06 20:05 UTC
[Rd] .C(..., DUP=FALSE) memory costs depending on input size?
Hello,

I'm trying to write my own C code for use within R. While optimizing the code I noticed that, even though I only pass pointers to get my data into C, the time needed still depends on the size of the data (vector).

To test this, I created an empty C function and passed it vectors of various lengths. The time needed for each call is measured and plotted. I would expect a flat line (a little above y=0), since the only things sent are pointers. What I do not expect is a line that climbs linearly as the vector size increases. Initializing the vectors isn't being measured, only the '.C' call to an empty C function; see below.

Is there anything I'm missing that can explain this input-size-dependent latency? The only reason I can think of is that these vectors are being copied along the way.

What follows is both the R and C code, which I use only for testing, and a plot of both measurements with DUP=TRUE and DUP=FALSE:

(RED: DUP=FALSE, GREEN: DUP=TRUE)
http://www.nabble.com/file/p20368695/CandR.png

R code:
----------
# sequence from 512 to 2^23 with 2^17 stepsize
a <- seq(512, 2^23, 2^17)
# storage for wall time (one slot per vector size)
h <- numeric(length(a)); j <- numeric(length(a))
for (i in 1:length(a)) {
  x <- as.double(1:a[i])
  y <- as.double(x)
  # system.time()[3] is (actual) wall time
  h[i] <- system.time(.C("commTest", x, y, DUP=FALSE))[3]
  j[i] <- system.time(.C("commTest", x, y, DUP=TRUE))[3]
  x <- 0
  y <- 0
}
# plot:
plot(a, h, type="l", col="red", xlab="Vector Size -->", ylab="Time in Seconds -->"); lines(a, j, col="green")

C code:
-----------
#include <R.h>

/* extern "C" is C++ syntax, so this file has to be compiled as C++ */
extern "C" {
  void commTest(double* a, double* b);
}

/*
 * Empty function
 * Just testing communication costs between R --> C
 */
void commTest(double* a, double* b) {
  /* Do ab-so-lute-ly-nothing.. */
}

System Details:
---------------------
Linux gpu 2.6.18-6-amd64 #1 SMP Thu May 8 06:49:39 UTC 2008 x86_64 GNU/Linux
R version 2.7.1 (2008-06-23)

--
View this message in context: http://www.nabble.com/.C%28...%2C-DUP%3DFALSE%29-memory-costs-depending-on-input-size--tp20368695p20368695.html
Sent from the R devel mailing list archive at Nabble.com.
MarcelK
2008-Nov-06 20:09 UTC
[Rd] .C(..., DUP=FALSE) memory costs depending on input size?
Sorry for the noise: the legend for the plot is wrong. It should be:

RED: DUP = TRUE
GREEN: DUP = FALSE

That is pretty clear from the plot itself, but it is wrong both in the plot header and in the plot code (just swap 'h' and 'j').
William Dunlap
2008-Nov-06 20:12 UTC
[Rd] .C(..., DUP=FALSE) memory costs depending on input size?
> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of MarcelK
> Sent: Thursday, November 06, 2008 12:06 PM
> To: r-devel at r-project.org
> Subject: [Rd] .C(..., DUP=FALSE) memory costs depending on input size?
>
> Hello,
>
> I'm trying to create my own C code for use within R. While
> optimizing the code I've noticed that even while only using
> pointers to get my data to C the time needed still depends on
> data (vector) size.

Does using NAOK=TRUE in the .C() help? That would avoid an NA-scan of the input vectors.

Bill Dunlap
TIBCO Spotfire Inc
wdunlap tibco.com
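To illustrate why such a scan would produce a timing that grows with vector length, here is a minimal sketch of what a check for NA/NaN/Inf values has to do. It is not R's actual source; the function name all_finite is hypothetical and used only for illustration. When every element is finite (as in the test vectors above), the loop cannot exit early and must touch all n elements, so its cost is linear in n even if the called C function is empty.

```c
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from R's sources): returns 1 if every
 * element of x is finite, 0 as soon as an NA/NaN/Inf is found.
 * R's NA for doubles is a NaN with a special payload, so isfinite()
 * reports it as non-finite as well.
 */
int all_finite(const double *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!isfinite(x[i]))
            return 0;   /* early exit only when a bad value exists */
    }
    return 1;           /* clean input: all n elements were visited */
}
```

If a scan of this kind is the cause, the DUP=FALSE timings should flatten once NAOK=TRUE is added to the .C() call, since the argument vectors are then passed on without being checked.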
Jeff Ryan
2008-Nov-06 22:28 UTC
[Rd] .C(..., DUP=FALSE) memory costs depending on input size?
Marcel,

If you are writing the C code from scratch, take a look at either .Call or .External, as both make no copies of the input objects and require no explicit conversion to the underlying storage type (numeric/integer/etc.) within the function call.

An even greater benefit is that you will also have access to the actual R objects within C.

Jeff

--
Jeffrey Ryan
jeffrey.ryan at insightalgo.com

ia: insight algorithmics
www.insightalgo.com
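For reference, a rough sketch of what the empty test function might look like when written for .Call instead of .C. The name commTestCall is hypothetical, and the file needs R's headers, so it has to be built with R CMD SHLIB rather than compiled on its own. The arguments arrive as R objects (SEXP), and REAL() gives a pointer to the vector's own storage.

```c
#include <R.h>
#include <Rinternals.h>

/*
 * Hypothetical .Call counterpart of commTest (the name is illustrative).
 * Nothing is copied on the way in: a and b are the R objects themselves.
 */
SEXP commTestCall(SEXP a, SEXP b)
{
    /* .Call does no type coercion, so check the storage type here */
    if (!isReal(a) || !isReal(b))
        error("both arguments must be double vectors");

    double *pa = REAL(a);   /* pointer into R's own memory, no copy */
    double *pb = REAL(b);
    int n = LENGTH(a);      /* the length is available from the object */

    /* Still doing nothing; silence unused-variable warnings */
    (void) pa; (void) pb; (void) n;

    return R_NilValue;      /* a .Call function must return a SEXP */
}
```

After loading the shared object with dyn.load(), the call from R would be .Call("commTestCall", x, y). Because the vectors are not copied, any write through pa or pb would modify the caller's objects in place, so code that writes results should normally allocate and return a new vector instead.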