Similar to: Significant performance difference between split of a data.frame and split of vectors

Displaying 20 results from an estimated 10000 matches similar to: "Significant performance difference between split of a data.frame and split of vectors"

2010 Feb 17
2
extract the data that match
Hi r-users, I would like to extract the data that match. Attached is my data: I'm interested in matching the value in column 'intg' with the value in column 'rand_no' > cbind(z=z,intg=dd,rand_no = rr)             z  intg rand_no    [1,]  0.00 0.000   0.001    [2,]  0.01 0.000   0.002    [3,]  0.02 0.000   0.002    [4,]  0.03 0.000   0.003    [5,]  0.04 0.000   0.003    [6,]
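A minimal sketch of one way to do this kind of matching, assuming (as the cbind output suggests) that 'intg' holds non-decreasing cumulative values and each 'rand_no' should be located within them; the stand-in data are my invention:

    z  <- seq(0, 1, by = 0.01)            # grid of z values, as in the post
    dd <- pmin(z^2, 1)                    # stand-in for the cumulative 'intg' column
    rr <- sort(runif(length(z)))          # stand-in for 'rand_no'
    pos <- pmax(findInterval(rr, dd), 1)  # locate each rand_no within intg
    matched <- cbind(z = z[pos], intg = dd[pos], rand_no = rr)
    head(matched)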
2010 Aug 26
2
Speeding up transpose
I've looked at how to speed up the transpose function in R (i.e., t(X)). The existing code does the work with loops like the following: for (i = 0; i < len; i++) REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow]; It seems a bit optimistic to expect a compiler to produce good code from this. I've re-written these loops as follows: for (i = 0, j = 0; i<len; i +=
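The rewrite above is at the C level, but the claimed effect is easy to measure from R. A minimal timing sketch (the matrix size and repetition count are arbitrary choices of mine):

    A <- matrix(rnorm(1000 * 1000), 1000, 1000)
    system.time(for (i in 1:100) B <- t(A))   # compare before/after the patch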
2012 Jul 02
1
How to get prediction for a variable in WinBUGS?
Dear all, I am a new user of WinBUGS and need your help. After running the following code, I got parameters beta0 through beta4 (stats, density), but I don't know how to get the prediction for the last value of h, the variable I set to NA and want to model using the code below. Can anyone give me a hint? Any advice would be greatly appreciated. Best
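The usual answer is to leave the unknown element as NA in the data and add the node itself to parameters.to.save, so WinBUGS monitors its posterior. A minimal sketch with R2WinBUGS (the toy model and data are mine, not the poster's; a working WinBUGS installation is assumed):

    library(R2WinBUGS)
    writeLines("model {
      for (i in 1:N) {
        h[i] ~ dnorm(mu[i], tau)
        mu[i] <- beta0 + beta1 * x[i]
      }
      beta0 ~ dnorm(0.0, 1.0E-6)
      beta1 ~ dnorm(0.0, 1.0E-6)
      tau ~ dgamma(0.001, 0.001)
    }", "model.txt")
    h <- c(1.2, 2.3, 3.1, 4.2, NA)                 # last value is the one to predict
    fit <- bugs(data = list(h = h, x = 1:5, N = 5L),
                inits = NULL,                      # let WinBUGS generate inits
                parameters.to.save = c("beta0", "beta1", "h"),  # monitor h itself
                model.file = "model.txt", n.chains = 3, n.iter = 5000)
    fit$mean$h[5]                                  # posterior mean of the prediction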
2009 Jul 09
2
Improvement of [dpq]wilcox functions
Hi, I believe I have significantly improved the [dpq]wilcox functions by implementing Harding's algorithm: Harding, E.F. (1984): An Efficient, Minimal-storage Procedure for Calculating the Mann-Whitney U, Generalized U and Similar Distributions, Appl. Statist., 33, 1-6. Results on my computer (against R-2.9.1): > system.time( dwilcox( 800, 800, 80) ) user system elapsed 0.240
2010 Aug 23
1
Speeding up sum and prod
Looking for more ways to speed up R, I've found that large improvements are possible in the speed of "sum" and "prod" for long real vectors. Here is a little test with R version 2.11.1 on an Intel Linux system > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 4.800 0.010 4.817 > system.time({for (i
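For reference, a self-contained version of this kind of timing loop (the excerpt is cut off; the prod line is my addition by analogy):

    a <- seq(0, 1, length = 1000)
    system.time({ for (i in 1:1000000) b <- sum(a) })
    system.time({ for (i in 1:1000000) b <- prod(a) })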
2007 Dec 19
1
strange timings in convolve(x,y,type="open")
Dear R-ophiles, I've found something very odd when I apply convolve to ever larger vectors. Here is an example below with vectors ranging from 2^11 to 2^17. There is a funny bump up at 2^12. Then it gets very slow at 2^16. > for( i in 11:20 )print( system.time(convolve(1:2^i,1:2^i,type="o"))) user system elapsed 0.002 0.000 0.002 user system elapsed 0.373
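The pattern fits FFT behaviour: convolve(type = "open") transforms vectors of length 2n - 1, and lengths with large prime factors are slow (2^13 - 1 = 8191 and 2^17 - 1 = 131071 are both prime, matching the bump at 2^12 and the slowdown at 2^16). A sketch of the standard workaround, padding to a friendlier length with nextn() (my suggestion, not from the post; note convolve() reverses one argument for type = "open", so this computes the plain convolution rather than being a drop-in replacement):

    x <- as.numeric(1:2^16); y <- as.numeric(1:2^16)
    n  <- length(x) + length(y) - 1          # 131071, a prime: a slow FFT length
    np <- nextn(n, 2)                        # 131072, the next power of two
    fx <- fft(c(x, rep(0, np - length(x))))
    fy <- fft(c(y, rep(0, np - length(y))))
    conv <- Re(fft(fx * fy, inverse = TRUE) / np)[1:n]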
2010 Jul 15
1
Very slow subsetting by name
Hi, I'm subsetting a named vector using character indices. My vector of indices (or keys) is 10x longer than the vector I'm subsetting. All my keys are distinct and only 10% of them are valid (i.e. match a name of the vector being subsetted). It is surprisingly slow: x1 <- 1:1000 names(x1) <- paste("a", x1, sep="") keys <- sample(c(names(x1),
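One way to sidestep the slow path reported above: resolve all the keys with a single vectorized match() call instead of name-based subsetting (continuing the setup from the post; the invalid-key construction is my guess at the truncated part, and whether this helps depends on the R version):

    x1 <- 1:1000
    names(x1) <- paste("a", x1, sep = "")
    keys <- sample(c(names(x1), paste("b", 1:9000, sep = "")))  # 10% valid keys
    system.time(r1 <- x1[keys])                    # name-based subsetting
    system.time(r2 <- x1[match(keys, names(x1))])  # one hashed lookup
    identical(r1, r2)                              # should be TRUE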
2008 Apr 17
1
Couldn't (and shouldn't) is.unsorted() be faster?
Hi, Couldn't is.unsorted() bail out immediately here (after comparing the first 2 elements): > x <- 20000000:1 > system.time(is.unsorted(x), gcFirst=TRUE) user system elapsed 0.084 0.040 0.124 > x <- 200000000:1 > system.time(is.unsorted(x), gcFirst=TRUE) user system elapsed 0.772 0.440 1.214 Thanks! H.
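A small demonstration of the point (the timings quoted above are from the era of the post; newer R may already short-circuit here, e.g. via ALTREP metadata on compact sequences):

    x <- 20000000:1
    system.time(is.unsorted(x), gcFirst = TRUE)  # cost grows with length(x)
    x[1L] > x[2L]                                # TRUE: an O(1) early exit is possible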
2008 Aug 21
1
x[order(x)] vs sort(x)?
Hi I have a question (which may be an obvious one). It is about an idiom which I have seen quite often: o <- order(x); x <- x[o] vs. the alternative x <- sort(x) I am just wondering about the rationale behind the order/reindex idiom vs. sorting, especially as there seems to be a marked performance difference (especially for integer vectors): > x <- trunc(runif(1E6, 1, 100)) >
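A self-contained version of the comparison (completing the truncated benchmark in spirit; exact timings vary by R version and sort method):

    x <- trunc(runif(1E6, 1, 100))
    system.time(y1 <- x[order(x)])  # order + reindex: materialises the permutation
    system.time(y2 <- sort(x))      # direct sort
    identical(y1, y2)               # should be TRUE

The order/reindex idiom earns its keep when several parallel vectors must be reordered by the same key; for a single vector, sort() avoids building the permutation at all.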
2008 Jul 08
1
split.Date
Hello, I wanted to suggest that the method below for split.Date be added to the base package to significantly speed up splits on values of class Date. In the example below I show a speed improvement of 175x for 1000 data points. On a vector of size 1e6, the time difference was 22 minutes for split.default versus 0.3 seconds for the split.Date function below (!). Note that this improvement will
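The idea behind such a method, as a minimal sketch (not the exact patch from the post): split the underlying numeric representation with split.default(), then restore the Date class, avoiding repeated S3 dispatch through '[.Date' for every group. Recent versions of base R ship a split.Date along these lines.

    split_date <- function(x, f, drop = FALSE, ...) {
      lapply(split.default(unclass(x), f, drop = drop, ...),
             function(v) structure(v, class = "Date"))
    }
    d <- as.Date("2008-01-01") + 0:999
    g <- rep(1:10, each = 100)
    identical(split_date(d, g), split(d, g))  # same result, far less overhead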
2012 Nov 23
0
[LLVMdev] [cfe-dev] costing optimisations
On 23/11/2012, at 5:46 PM, Sean Silva wrote: > Adding LLVMdev, since this is intimately related to the optimization passes. > >> I think this is roughly because some function level optimisations are >> worse than O(N) in the number of instructions. > > Please profile this and mail llvmdev regarding passes with > significantly superlinear behavior (e.g. O(n^2)). My
2012 Apr 30
2
fast version of split.data.frame or conversion from data.frame to list of its rows
Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep =""))}) user system elapsed 0.004 0.000 0.004 and then I try to split it > system.time(split(fd, 1:nrow(fd))) user system elapsed
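One commonly suggested alternative, as a sketch: build plain row lists by indexing each column, which avoids the data-frame split method entirely (the result is a list of lists, not a list of one-row data frames, which is often what downstream code needs anyway):

    fd <- data.frame(x = 1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))
    system.time(s1 <- split(fd, 1:nrow(fd)))                  # list of 1-row data frames
    system.time(s2 <- lapply(seq_len(nrow(fd)),
                             function(i) lapply(fd, `[`, i))) # list of plain rows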
2008 Aug 29
0
Slow perl on CentOS - ActivePerl as a solution
Hi all, I found out that one of my perl scripts is heavily affected by the current bug. I was too lazy to compile anything and I didn't want to mess up my system doing experiments, so I tried to install ActivePerl as a temporary solution instead (RPMs are available). ActivePerl 5.8 is approximately 73x faster than the CentOS version, and ActivePerl 5.10 even slightly faster. Both versions and
2009 Nov 19
1
Performance of 'by' and 'ddply' on a large data frame
I've only recently started using R. One of the problems I come up against is that after extracting a large dataset (>5M rows) out of a database, I realize I need another variable. In this case I have a data frame with dates. I want to find the minimum date for each value of x1 and add that minimum date to my data.frame. > randomdf <- function(p) { data.frame(x1=sample(1:10^4, 10^p,
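For the specific "attach a per-group minimum" step, base R's ave() avoids a full split/apply/merge cycle. A sketch with made-up data (ave() works on the numeric representation of dates, so the Date class is restored afterwards):

    df <- data.frame(x1   = sample(1:100, 1000, replace = TRUE),
                     date = as.Date("2009-01-01") + sample(0:365, 1000, replace = TRUE))
    df$min_date <- as.Date(ave(as.numeric(df$date), df$x1, FUN = min),
                           origin = "1970-01-01")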
2010 Nov 17
3
stacking consecutive columns
I have a file, each column of which is a separate year, and each row is the mean precipitation for that month. It looks like this (except it goes back to 1964). month X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008 X2009 1 1.600 1.010 4.320 2.110 0.925 3.275 3.460 0.675 1.315 2.920 2 2.960 3.905 3.230 2.380 2.720 1.880 2.430 1.380
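A sketch of one base-R answer, using reshape() to stack the year columns into long format (the data frame name precip and the random fill-in values are my assumptions):

    precip <- data.frame(month = 1:12,
                         matrix(runif(12 * 10, 0, 5), 12, 10,
                                dimnames = list(NULL, paste("X", 2000:2009, sep = ""))))
    long <- reshape(precip, direction = "long",
                    varying = paste("X", 2000:2009, sep = ""), v.names = "precip",
                    timevar = "year", times = 2000:2009, idvar = "month")
    head(long)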
2016 Oct 01
2
socketSelect(..., timeout): non-integer timeouts in (0, 2) (?) equal infinite timeout on Linux - weird
There's something weird going on for certain non-integer values of argument 'timeout' to base::socketSelect(). For such values, there is no timeout and you effectively end up with an infinite timeout. I can reproduce this on R 3.3.1 on Ubuntu 16.04 and RedHat 6.6, but not on Windows (via Linux Wine). # 1. In R master session > con <- socketConnection('localhost', port
2017 Oct 05
1
socketSelect(..., timeout): non-integer timeouts in (0, 2) (?) equal infinite timeout on Linux - weird
Fixed in 73470. Best, Tomas On 10/05/2017 06:11 AM, Henrik Bengtsson wrote: > I'd like to follow up/bump the attention to this bug causing the > timeout to fail for socketSelect() on Unix. It is still there in R > 3.4.2 and R-devel. I've identified the bug in the R source code - the > bug is due to floating-point precision and a comparison using >=. See > PR17203
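The class of bug described, in miniature (an illustration of floating-point comparison pitfalls, not the actual code from R's connections layer):

    x <- 0.1 + 0.2
    x == 0.3   # FALSE
    x - 0.3    # about 5.6e-17
    x >= 0.3   # TRUE here, but comparisons at this scale can go either way
               # for values that are mathematically equal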
2009 Sep 15
2
why is nrow() so slow?
Dear R wizards: here is the strange question for the day. It seems to me that nrow() is very slow. Let me explain what I mean: ds= data.frame( NA, x=rnorm(10000) ) ## a sample data set > system.time( { for (i in 1:10000) NA } ) ## doing nothing takes virtually no time user system elapsed 0.000 0.000 0.001 ## this is something that should take time; we need to add 10,000
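The cost is usually attributed to S3 dispatch (nrow() calls dim(), which dispatches to dim.data.frame) plus row-name handling. A sketch of the comparison people typically run (length() of one column as the cheap stand-in is my choice):

    ds <- data.frame(y = NA, x = rnorm(10000))
    system.time(for (i in 1:10000) nrow(ds))       # dispatches each iteration
    system.time(for (i in 1:10000) length(ds$x))   # direct length of a column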
2008 Nov 19
1
more efficient small subsets from moderate vectors?
This creates a named vector of length nx, then repeatedly draws a single sample from it. lkup <- function(nx, m=10000L) { tbl <- seq_len(nx) names(tbl) <- as.character(tbl) v <- sample(names(tbl), m, replace=TRUE) system.time(for(k in v) tbl[k], gcFirst=TRUE) } There is an abrupt performance degradation at nx=1000 > lkup(1000) user system elapsed 0.180
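A sketch of the workaround implied by the benchmark: resolve all the keys with one vectorized match() call instead of one name lookup per element (reusing the post's setup at nx = 1000):

    tbl <- seq_len(1000)
    names(tbl) <- as.character(tbl)
    v <- sample(names(tbl), 10000L, replace = TRUE)
    system.time(for (k in v) tbl[k])        # repeated single-name lookups
    system.time(tbl[match(v, names(tbl))])  # one hashed, vectorized lookup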
2008 Mar 10
1
crossprod is slower than t(AA)%*%BB
Dear R developers, The background for this email is that I was helping a PhD student to improve the speed of her R code. I suggested replacing calls like t(AA) %*% BB with crossprod(AA, BB), since I expected this to be faster. The surprising result to me was that this change actually made her code slower. > ## Examples : > > AA <- matrix(rnorm(3000*1000),3000,1000) > BB <-
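A self-contained version of the comparison (results depend heavily on the BLAS in use; with two distinct arguments both expressions end up in dgemm, t(AA) %*% BB after an explicit transpose and crossprod() via the transpose flag, so neither is guaranteed to win):

    AA <- matrix(rnorm(3000 * 1000), 3000, 1000)
    BB <- matrix(rnorm(3000 * 1000), 3000, 1000)
    system.time(C1 <- t(AA) %*% BB)
    system.time(C2 <- crossprod(AA, BB))
    all.equal(C1, C2)   # same result up to floating-point error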