Similar to: Significant performance difference between split of a data.frame and split of vectors

Displaying 20 results from an estimated 10000 matches similar to: "Significant performance difference between split of a data.frame and split of vectors"

2010 Feb 17
2
extract the data that match
Hi r-users, I would like to extract the data that match. Attached is my data: I'm interested in matching the value in column 'intg' with the value in column 'rand_no' > cbind(z=z,intg=dd,rand_no = rr)             z  intg rand_no    [1,]  0.00 0.000   0.001    [2,]  0.01 0.000   0.002    [3,]  0.02 0.000   0.002    [4,]  0.03 0.000   0.003    [5,]  0.04 0.000   0.003    [6,]
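A minimal sketch of one way to do this kind of matching, assuming (as the cbind output suggests) that 'intg' holds non-decreasing cumulative values and each 'rand_no' should be located within them; the stand-in data are my invention:

    z  <- seq(0, 1, by = 0.01)            # grid of z values, as in the post
    dd <- pmin(z^2, 1)                    # stand-in for the cumulative 'intg' column
    rr <- sort(runif(length(z)))          # stand-in for 'rand_no'
    pos <- pmax(findInterval(rr, dd), 1)  # locate each rand_no within intg
    matched <- cbind(z = z[pos], intg = dd[pos], rand_no = rr)
    head(matched)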
2010 Aug 26
2
Speeding up transpose
I've looked at how to speed up the transpose function in R (i.e., t(X)). The existing code does the work with loops like the following: for (i = 0; i < len; i++) REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow]; It seems a bit optimistic to expect a compiler to produce good code from this. I've re-written these loops as follows: for (i = 0, j = 0; i<len; i +=
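The rewrite above is at the C level, but the claimed effect is easy to measure from R. A minimal timing sketch (the matrix size and repetition count are arbitrary choices of mine):

    A <- matrix(rnorm(1000 * 1000), 1000, 1000)
    system.time(for (i in 1:100) B <- t(A))   # compare before/after the patch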
2012 Jul 02
1
How to get prediction for a variable in WinBUGS?
Dear all, I am a new user of WinBUGS and need your help. After running the following code, I got parameters beta0 through beta4 (stats, density), but I don't know how to get the prediction for the last value of h, the variable I set to NA and want to model using the code below. Can anyone give me a hint? Any advice would be greatly appreciated. Best
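The usual answer is to leave the unknown element as NA in the data and add the node itself to parameters.to.save, so WinBUGS monitors its posterior. A minimal sketch with R2WinBUGS (the toy model and data are mine, not the poster's; a working WinBUGS installation is assumed):

    library(R2WinBUGS)
    writeLines("model {
      for (i in 1:N) {
        h[i] ~ dnorm(mu[i], tau)
        mu[i] <- beta0 + beta1 * x[i]
      }
      beta0 ~ dnorm(0.0, 1.0E-6)
      beta1 ~ dnorm(0.0, 1.0E-6)
      tau ~ dgamma(0.001, 0.001)
    }", "model.txt")
    h <- c(1.2, 2.3, 3.1, 4.2, NA)                 # last value is the one to predict
    fit <- bugs(data = list(h = h, x = 1:5, N = 5L),
                inits = NULL,                      # let WinBUGS generate inits
                parameters.to.save = c("beta0", "beta1", "h"),  # monitor h itself
                model.file = "model.txt", n.chains = 3, n.iter = 5000)
    fit$mean$h[5]                                  # posterior mean of the prediction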
2009 Jul 09
2
Improvement of [dpq]wilcox functions
Hi, I believe I have significantly improved the [dpq]wilcox functions by implementing Harding's algorithm: Harding, E.F. (1984): An Efficient, Minimal-storage Procedure for Calculating the Mann-Whitney U, Generalized U and Similar Distributions, Appl. Statist., 33, 1-6. Results on my computer (against R-2.9.1): > system.time( dwilcox( 800, 800, 80) ) user system elapsed 0.240
2010 Aug 23
1
Speeding up sum and prod
Looking for more ways to speed up R, I've found that large improvements are possible in the speed of "sum" and "prod" for long real vectors. Here is a little test with R version 2.11.1 on an Intel Linux system > a <- seq(0,1,length=1000) > system.time({for (i in 1:1000000) b <- sum(a)}) user system elapsed 4.800 0.010 4.817 > system.time({for (i
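For reference, a self-contained version of this kind of timing loop (the excerpt is cut off; the prod line is my addition by analogy):

    a <- seq(0, 1, length = 1000)
    system.time({ for (i in 1:1000000) b <- sum(a) })
    system.time({ for (i in 1:1000000) b <- prod(a) })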
2007 Dec 19
1
strange timings in convolve(x,y,type="open")
Dear R-ophiles, I've found something very odd when I apply convolve to ever larger vectors. Here is an example below with vectors ranging from 2^11 to 2^17. There is a funny bump up at 2^12. Then it gets very slow at 2^16. > for( i in 11:20 )print( system.time(convolve(1:2^i,1:2^i,type="o"))) user system elapsed 0.002 0.000 0.002 user system elapsed 0.373
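The pattern fits FFT behaviour: convolve(type = "open") transforms vectors of length 2n - 1, and lengths with large prime factors are slow (2^13 - 1 = 8191 and 2^17 - 1 = 131071 are both prime, matching the bump at 2^12 and the slowdown at 2^16). A sketch of the standard workaround, padding to a friendlier length with nextn() (my suggestion, not from the post; note convolve() reverses one argument for type = "open", so this computes the plain convolution rather than being a drop-in replacement):

    x <- as.numeric(1:2^16); y <- as.numeric(1:2^16)
    n  <- length(x) + length(y) - 1          # 131071, a prime: a slow FFT length
    np <- nextn(n, 2)                        # 131072, the next power of two
    fx <- fft(c(x, rep(0, np - length(x))))
    fy <- fft(c(y, rep(0, np - length(y))))
    conv <- Re(fft(fx * fy, inverse = TRUE) / np)[1:n]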
2010 Jul 15
1
Very slow subsetting by name
Hi, I'm subsetting a named vector using character indices. My vector of indices (or keys) is 10x longer than the vector I'm subsetting. All my keys are distinct and only 10% of them are valid (i.e. match a name of the vector being subsetted). It is surprisingly slow: x1 <- 1:1000 names(x1) <- paste("a", x1, sep="") keys <- sample(c(names(x1),
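One way to sidestep the slow path reported above: resolve all the keys with a single vectorized match() call instead of name-based subsetting (continuing the setup from the post; the invalid-key construction is my guess at the truncated part, and whether this helps depends on the R version):

    x1 <- 1:1000
    names(x1) <- paste("a", x1, sep = "")
    keys <- sample(c(names(x1), paste("b", 1:9000, sep = "")))  # 10% valid keys
    system.time(r1 <- x1[keys])                    # name-based subsetting
    system.time(r2 <- x1[match(keys, names(x1))])  # one hashed lookup
    identical(r1, r2)                              # should be TRUE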
2008 Apr 17
1
Couldn't (and shouldn't) is.unsorted() be faster?
Hi, Couldn't is.unsorted() bail out immediately here (after comparing the first 2 elements): > x <- 20000000:1 > system.time(is.unsorted(x), gcFirst=TRUE) user system elapsed 0.084 0.040 0.124 > x <- 200000000:1 > system.time(is.unsorted(x), gcFirst=TRUE) user system elapsed 0.772 0.440 1.214 Thanks! H.
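A small demonstration of the point (the timings quoted above are from the era of the post; newer R may already short-circuit here, e.g. via ALTREP metadata on compact sequences):

    x <- 20000000:1
    system.time(is.unsorted(x), gcFirst = TRUE)  # cost grows with length(x)
    x[1L] > x[2L]                                # TRUE: an O(1) early exit is possible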
2008 Aug 21
1
x[order(x)] vs sort(x)?
Hi I have a question (which may be an obvious one). It is about an idiom which I have seen quite often: o <- order(x); x <- x[o] vs. the alternative x <- sort(x) I am just wondering about the rationale behind the order/reindex idiom vs. sorting, especially as there seems to be a marked performance difference (especially for integer vectors): > x <- trunc(runif(1E6, 1, 100)) >
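A self-contained version of the comparison (completing the truncated benchmark in spirit; exact timings vary by R version and sort method):

    x <- trunc(runif(1E6, 1, 100))
    system.time(y1 <- x[order(x)])  # order + reindex: materialises the permutation
    system.time(y2 <- sort(x))      # direct sort
    identical(y1, y2)               # should be TRUE

The order/reindex idiom earns its keep when several parallel vectors must be reordered by the same key; for a single vector, sort() avoids building the permutation at all.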
2008 Jul 08
1
split.Date
Hello, I wanted to suggest that the method below for split.Date be added to the base package to significantly speed up splits on values of class Date. In the example below I show a speed improvement of 175x for 1000 data points. On a vector of size 1e6, the time difference was 22 minutes for split.default versus 0.3 seconds for the split.Date function below (!). Note that this improvement will
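The idea behind such a method, as a minimal sketch (not the exact patch from the post): split the underlying numeric representation with split.default(), then restore the Date class, avoiding repeated S3 dispatch through '[.Date' for every group. Recent versions of base R ship a split.Date along these lines.

    split_date <- function(x, f, drop = FALSE, ...) {
      lapply(split.default(unclass(x), f, drop = drop, ...),
             function(v) structure(v, class = "Date"))
    }
    d <- as.Date("2008-01-01") + 0:999
    g <- rep(1:10, each = 100)
    identical(split_date(d, g), split(d, g))  # same result, far less overhead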
2012 Nov 23
0
[LLVMdev] [cfe-dev] costing optimisations
On 23/11/2012, at 5:46 PM, Sean Silva wrote: > Adding LLVMdev, since this is intimately related to the optimization passes. > >> I think this is roughly because some function level optimisations are >> worse than O(N) in the number of instructions. > > Please profile this and mail llvmdev regarding passes with > significantly superlinear behavior (e.g. O(n^2)). My
2012 Apr 30
2
fast version of split.data.frame or conversion from data.frame to list of its rows
Hi, I was wondering if there is anything more efficient than split to do the kind of conversion in the subject. If I create a data frame as in system.time({fd = data.frame(x=1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep =""))}) user system elapsed 0.004 0.000 0.004 and then I try to split it > system.time(split(fd, 1:nrow(fd))) user system elapsed
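One commonly suggested alternative, as a sketch: build plain row lists by indexing each column, which avoids the data-frame split method entirely (the result is a list of lists, not a list of one-row data frames, which is often what downstream code needs anyway):

    fd <- data.frame(x = 1:2000, y = rnorm(2000), id = paste("x", 1:2000, sep = ""))
    system.time(s1 <- split(fd, 1:nrow(fd)))                  # list of 1-row data frames
    system.time(s2 <- lapply(seq_len(nrow(fd)),
                             function(i) lapply(fd, `[`, i))) # list of plain rows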
2008 Aug 29
0
Slow perl on CentOS - ActivePerl as a solution
Hi all, I found out that one of my perl scripts is heavily affected by the current bug. I was too lazy to compile anything and I didn't want to mess up my system doing experiments, so I tried to install ActivePerl as a temporary solution instead (RPMs are available). ActivePerl 5.8 is approximately 73x faster than the CentOS version, and ActivePerl 5.10 even slightly faster. Both versions and
2009 Nov 19
1
Performance of 'by' and 'ddply' on a large data frame
I've only recently started using R. One of the problems I come up against is that after extracting a large dataset (>5M rows) out of a database, I realize I need another variable. In this case I have a data frame with dates. I want to find the minimum date for each value of x1 and add that minimum date to my data.frame. > randomdf <- function(p) { data.frame(x1=sample(1:10^4, 10^p,
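For the specific "attach a per-group minimum" step, base R's ave() avoids a full split/apply/merge cycle. A sketch with made-up data (ave() works on the numeric representation of dates, so the Date class is restored afterwards):

    df <- data.frame(x1   = sample(1:100, 1000, replace = TRUE),
                     date = as.Date("2009-01-01") + sample(0:365, 1000, replace = TRUE))
    df$min_date <- as.Date(ave(as.numeric(df$date), df$x1, FUN = min),
                           origin = "1970-01-01")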
2010 Nov 17
3
stacking consecutive columns
I have a file, each column of which is a separate year, and each row is the mean precipitation for that month. It looks like this (except it goes back to 1964). month X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007 X2008 X2009 1 1.600 1.010 4.320 2.110 0.925 3.275 3.460 0.675 1.315 2.920 2 2.960 3.905 3.230 2.380 2.720 1.880 2.430 1.380
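A sketch of one base-R answer, using reshape() to stack the year columns into long format (the data frame name precip and the random fill-in values are my assumptions):

    precip <- data.frame(month = 1:12,
                         matrix(runif(12 * 10, 0, 5), 12, 10,
                                dimnames = list(NULL, paste("X", 2000:2009, sep = ""))))
    long <- reshape(precip, direction = "long",
                    varying = paste("X", 2000:2009, sep = ""), v.names = "precip",
                    timevar = "year", times = 2000:2009, idvar = "month")
    head(long)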
2016 Oct 01
2
socketSelect(..., timeout): non-integer timeouts in (0, 2) (?) equal infinite timeout on Linux - weird
There's something weird going on for certain non-integer values of argument 'timeout' to base::socketSelect(). For such values, there is no timeout and you effectively end up with an infinite timeout. I can reproduce this on R 3.3.1 on Ubuntu 16.04 and RedHat 6.6, but not on Windows (via Linux Wine). # 1. In R master session > con <- socketConnection('localhost', port
2017 Oct 05
1
socketSelect(..., timeout): non-integer timeouts in (0, 2) (?) equal infinite timeout on Linux - weird
Fixed in 73470. Best, Tomas On 10/05/2017 06:11 AM, Henrik Bengtsson wrote: > I'd like to follow up/bump the attention to this bug causing the > timeout to fail for socketSelect() on Unix. It is still there in R > 3.4.2 and R-devel. I've identified the bug in the R source code - the > bug is due to floating-point precision and a comparison using >=. See > PR17203
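The class of bug described, in miniature (an illustration of floating-point comparison pitfalls, not the actual code from R's connections layer):

    x <- 0.1 + 0.2
    x == 0.3   # FALSE
    x - 0.3    # about 5.6e-17
    x >= 0.3   # TRUE here, but comparisons at this scale can go either way
               # for values that are mathematically equal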
2009 Sep 15
2
why is nrow() so slow?
Dear R wizards: here is the strange question for the day. It seems to me that nrow() is very slow. Let me explain what I mean: ds= data.frame( NA, x=rnorm(10000) ) ## a sample data set > system.time( { for (i in 1:10000) NA } ) ## doing nothing takes virtually no time user system elapsed 0.000 0.000 0.001 ## this is something that should take time; we need to add 10,000
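The cost is usually attributed to S3 dispatch (nrow() calls dim(), which dispatches to dim.data.frame) plus row-name handling. A sketch of the comparison people typically run (length() of one column as the cheap stand-in is my choice):

    ds <- data.frame(y = NA, x = rnorm(10000))
    system.time(for (i in 1:10000) nrow(ds))       # dispatches each iteration
    system.time(for (i in 1:10000) length(ds$x))   # direct length of a column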
2008 Nov 19
1
more efficient small subsets from moderate vectors?
This creates a named vector of length nx, then repeatedly draws a single sample from it. lkup <- function(nx, m=10000L) { tbl <- seq_len(nx) names(tbl) <- as.character(tbl) v <- sample(names(tbl), m, replace=TRUE) system.time(for(k in v) tbl[k], gcFirst=TRUE) } There is an abrupt performance degradation at nx=1000 > lkup(1000) user system elapsed 0.180
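A sketch of the workaround implied by the benchmark: resolve all the keys with one vectorized match() call instead of one name lookup per element (reusing the post's setup at nx = 1000):

    tbl <- seq_len(1000)
    names(tbl) <- as.character(tbl)
    v <- sample(names(tbl), 10000L, replace = TRUE)
    system.time(for (k in v) tbl[k])        # repeated single-name lookups
    system.time(tbl[match(v, names(tbl))])  # one hashed, vectorized lookup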
2008 Mar 10
1
crossprod is slower than t(AA)%*%BB
Dear R developers, The background for this email is that I was helping a PhD student to improve the speed of her R code. I suggested replacing calls like t(AA) %*% BB with crossprod(AA, BB), since I expected this to be faster. The surprising result to me was that this change actually made her code slower. > ## Examples : > > AA <- matrix(rnorm(3000*1000),3000,1000) > BB <-
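A self-contained version of the comparison (results depend heavily on the BLAS in use; with two distinct arguments both expressions end up in dgemm, t(AA) %*% BB after an explicit transpose and crossprod() via the transpose flag, so neither is guaranteed to win):

    AA <- matrix(rnorm(3000 * 1000), 3000, 1000)
    BB <- matrix(rnorm(3000 * 1000), 3000, 1000)
    system.time(C1 <- t(AA) %*% BB)
    system.time(C2 <- crossprod(AA, BB))
    all.equal(C1, C2)   # same result up to floating-point error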