thr3ads.net - similar to: "speeding up functions for large datasets"

Displaying 20 results from an estimated 10000 matches similar to: "speeding up functions for large datasets"

default printer selection based upon computer a user logs into

2009 Apr 24

We are using roaming profiles. I have a user that logs onto two computers that are in different buildings. Computer 1 is collections w/ default printer HP Laserjet 4000DTN (10.8.9.223) and Computer 2 is salesshop w/ default printer HP Laserjet 4100DTN (10.8.3.31). The user complains that when she logs into salesshop, does her work, logs out, and then logs in at the collections computer,

strange Sys.Date() side effect

2012 Jan 10

Any ideas what is the problem with this code?
> N <- 2; c(Sys.Date(), sprintf('N = %d', N))
[1] "2012-01-10" NA
Warning message:
In as.POSIXlt.Date(x) : NAs introduced by coercion
Best regards, Ryszard
Ryszard Czerminski AstraZeneca Pharmaceuticals LP 35 Gatehouse Drive Waltham, MA 02451 USA 781-839-4304 ryszard.czerminski@astrazeneca.com

Spliting columns, strings or reg exp returning substrings

2009 Sep 25

Currently as the first column in a data frame I have string values in the format xx_yy - I want to create a new column with just the substring xx (for each row in turn). Three possible ways to do this might be (1) split the string by '_' using strsplit and paste the first of the resulting variables into a new column, but I have been unable to do this for each row of my data frame in turn

bizarre seq() behavior?

2011 Nov 23

Is there any rational explanation for the bizarre seq() behavior below?
> seq(2,8.1, lenght.out=3)
[1] 2 3 4 5 6 7 8
> help(seq)
> seq(2,8,length.out=3)
[1] 2 5 8
> seq(2,8.1,length.out=3)
[1] 2.00 5.05 8.10
Except maybe that it is early in the morning :)
Best regards, Ryszard
Ryszard Czerminski AstraZeneca Pharmaceuticals LP 35 Gatehouse Drive Waltham, MA 02451 USA 781-839-4304

strsplit() does not split on "."?

2012 Jan 12

Any ideas what is wrong?
> strsplit("a.b", ".") # generates empty strings with split="."
[[1]]
[1] "" "" ""
> strsplit("a b", " ") # seems to work fine with split=" ", and other characters...
[[1]]
[1] "a" "b"
>
> R.Version()
$platform
[1]

zero inflated poisson and censored-continuous models

2001 Feb 07

I wonder if there is a package that will estimate a Zero Inflated Poisson Model (ZIP), and also if there is a package that will estimate what is called the Tobit model: that is a combination of censored and observed values in the same sample. Georgina Bermann Biostatistics AstraZeneca R&D Mölndal

Merging tables

2009 Jan 20

I am relatively new to R and am trying to do some basic data manipulation. Basically I have a table (csv - table 1) of data for a set of samples (rows), and a second table (table 2) of information about a subset of samples of particular interest. I want to pull out the data from table 1 for the samples in table 2, either by: * Merging the two tables based on a common identifier (SampleID - may

smooth contour lines

2010 Sep 27

Is there an easy way to control the smoothness of contour lines? In the plot I am working on, the contour lines come out jagged because of undersampling, but it is clear "by eye" that they should be basically straight lines. In the maps package I found the smooth.map function, but maybe there is a more generic way of accomplishing the same thing. Ideally there would be an option to control

How to write efficient R code

2004 Feb 17

I have been lurking in this list a while and searching in the archives to find out how one learns to write fast R code. One solution seems to be to write part of the code not in R but in C. However after finding a benchmark article (http://www.sciviews.org/other/benchmark.htm) I have been more interested in making the R code itself more efficient. I would like to find more info about this. I have

Error in predict.randomForest ... subscript out of bounds with NULL name in X

2012 Jan 25

RF trains fine with X, but fails on prediction
> library(randomForest)
> chirps <- c(20,16.0,19.8,18.4,17.1,15.5,14.7,17.1,15.4,16.2,15,17.2,16,17,14.1)
> temp <- c(88.6,71.6,93.3,84.3,80.6,75.2,69.7,82,69.4,83.3,78.6,82.6,80.6,83.5,76.3)
> X <- cbind(1,chirps)
> rf <- randomForest(X, temp)
> yp <- predict(rf, X)
Error in predict.randomForest(rf, X) : subscript

SAMBA and Win2000 SP3

2002 Oct 23

We are presently using SAMBA 2.2 with Windows 2000 SP1 and will be upgrading to Windows 2000 SP3. Are there any known or suspected problems with the combination of Windows 2000 SP3 and SAMBA 2.2? We are using Solaris 7 on the Unix side. /ola Ola Engström Technical Computing & Information Services AstraZeneca R&D Mölndal S-431 83 Mölndal Sweden

aggregating strings

2009 Jul 28

I am currently summarising a data set by collapsing data based on common identifiers in a column. I am using the 'aggregate' function to summarise numeric columns, i.e. "aggregate(dat[,3], list(dat$gene), mean)". I also wish to summarise text columns e.g. by concatenating values in a comma separated list, but the aggregate function can only return scalar values and so something

Re: [R] Unexpected behaviour of identical (PR#6799)

2004 Apr 20

"Swinton, Jonathan" <Jonathan.Swinton@astrazeneca.com> writes:
> # works as expected
> > ac <- c('A','B');
> > identical(ac,ac[1:2])
> [1] TRUE
>
> #but
> > af <- factor(ac)
> > identical(af,af[1:2])
> [1] FALSE
>
> Any opinions?
Did a cross-check with Splus and it doesn't do that, so I think it

How to write an S4 method for sum or a Summary generic

2004 Apr 19

If I have a class Foo, then I can write an S3 method for sum for it:
> setClass("Foo",representation(a="integer"));aFoo=new("Foo",a=c(1:3,NA))
> sum.Foo <- function(x,na.rm){print(x);print(na.rm);sum(x@a,na.rm=na.rm)}
> sum(aFoo)
But how do I write an S4 method for this? All my attempts to do so have foundered. For example

Speeding Up Rsync for Large File Sets

2012 Nov 30

We have a particular file system that we're trying to keep in sync between two FreeBSD/ZFS servers using Rsync. The file system has many millions of files, and about 4TB of data total. Rsync takes HOURS to run, even when there are no files to transfer. Just the comparison itself takes hours. Is there any way to speed up the transfer? The command line I'm using is:

standard errors of fitted values are different S-plus survival package and R

2001 Apr 02

Perhaps this question has been asked before: but using the function predict( fit,type="terms",se.fit=T), where fit is a coxph object in S-plus, the estimated standard errors are different. It may be different estimators of the variance of the residuals? Which one is the default in R, I don't find that too easily in the documentation. Does anybody know? I'll be very grateful

randomForest: too many elements specified?

2011 Jan 20

I am getting "Error in matrix(0, n, n) : too many elements specified" while building a randomForest model, which looks like a memory allocation error. Software versions are: randomForest 4.5-25, R version 2.7.1. The dataset is big (~90K rows, ~200 columns), but this is on a big machine (~120G RAM) and I call randomForest like this: randomForest(x,y) i.e. in supervised mode and not requesting

Speeding up resampling of rows from a large matrix

2007 May 25

I'm trying to: Resample with replacement pairs of distinct rows from a 120 x 65,000 matrix H of 0's and 1's. For each resampled pair sum the resulting 2 x 65,000 matrix by column:
  0 1 0 1 ...
+ 0 0 1 1 ...
  _______
= 0 1 1 2 ...
For each column accumulate the number of 0's, 1's and 2's over the resamples to obtain a 3 x 65,000 matrix G. For those

confidence intervals for linear combinations when using lme

2004 Jul 23

Hi I really hope someone can help me. I have just started to work with S-plus, and have not yet understood how it really works. I am now trying to fit a mixed effects model with lme. My goal is to compare four different groups, at several different time points, and I therefore would like to create confidence intervals for linear combinations of my estimated parameters (as I usually do with

Competing risk regression with CRR slow on large datasets?

2011 Jul 20

Hi, I posted this question on stats.stackexchange.com 3 days ago but the answer didn't really address my question concerning the speed in competing risk regression. I hope you don't mind me asking it in this forum: I'm doing a registry based study with almost 200 000 observations and I want to perform a competing risk analysis. My problem is that the crr() in the cmprsk package is
