similar to: which() vs. just logical selection in df

2020 Oct 14
2
which() vs. just logical selection in df
Hi Dr. Snow, & R-helpers, Thank you for your reply! I hadn't heard of the {microbenchmark} package & was excited to try it! Thank you for the suggestion! I did check the reference source for which() beforehand, which included the statement to remove NAs, and I didn't have any missing values or NAs: sum(is.na(dat$gender2)) sum(is.na(dat$gender)) sum(is.na(dat$y)) [1] 0 [1] 0 [1]
2013 Nov 20
4
How to stop Kaplan-Meier curve at a time point
Hello R users, I have a question about Kaplan-Meier curves with respect to my research. We have done a retrospective study on tooth fillings and their survival in relation to many influencing factors. We had a long follow-up time (up to 8 years for some variables). However, we decided to stop the analysis at the 6-year follow-up time, so that we can have a uniform follow-up time for all the
2008 Mar 10
1
crossprod is slower than t(AA)%*BB
Dear R developers, The background for this email is that I was helping a PhD student improve the speed of her R code. I suggested replacing calls like t(AA) %*% BB with crossprod(AA, BB), since I expected this to be faster. To my surprise, this change actually made her code slower. > ## Examples : > > AA <- matrix(rnorm(3000*1000),3000,1000) > BB <-
2008 Apr 17
1
Couldn't (and shouldn't) is.unsorted() be faster?
Hi, Couldn't is.unsorted() bail out immediately here (after comparing the first 2 elements): > x <- 20000000:1 > system.time(is.unsorted(x), gcFirst=TRUE) user system elapsed 0.084 0.040 0.124 > x <- 200000000:1 > system.time(is.unsorted(x), gcFirst=TRUE) user system elapsed 0.772 0.440 1.214 Thanks! H.
2008 Mar 10
2
write.table with row.names=FALSE unnecessarily slow?
write.table with large data frames takes quite a long time > system.time({ + write.table(df, '/tmp/dftest.txt', row.names=FALSE) + }, gcFirst=TRUE) user system elapsed 97.302 1.532 98.837 One reason is that dimnames is always called, causing 'anonymous' row names to be created as character vectors. Avoiding this in src/library/utils, along the lines of Index:
2005 May 04
1
Cost of method dispatching: was: when can we expect Prof Tierney's compiled R?
> -----Original Message----- > From: Prof Brian Ripley [mailto:ripley@stats.ox.ac.uk] > Sent: Wednesday, April 27, 2005 1:13 AM > To: Vadim Ogranovich > Cc: Luke Tierney; r-devel@stat.math.ethz.ch > Subject: Re: [Rd] RE: [R] when can we expect Prof Tierney's > compiled R? > > On Tue, 26 Apr 2005, Vadim Ogranovich wrote: > ... > > The arithmetic shows
2010 Nov 06
1
Hashing and environments
Hi, I'm trying to write a general-purpose "lexicon" class and associated methods for storing and accessing information about large numbers of specific words (e.g., their frequencies in different genres). Crucial to making such a class practically useful is to get hashing working correctly so that information about specific words can be accessed quickly. But I've never really
2020 Oct 14
0
which() vs. just logical selection in df
Inline. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Wed, Oct 14, 2020 at 3:23 PM 1/k^c <kchamberln at gmail.com> wrote: Is which() invoking c-level code by chance, making it slightly faster > on average? > You do not need
2005 May 08
3
Light-weight data.frame class: was: how to add method to .Primitive function
Hi, Encouraged by a tip from Simon Urbanek I tried to use the S3 machinery to write a faster version of the data.frame class. This quickly hits a snag: the "[.default"(x, i) for some reason cares about the dimensionality of x. At the end there is a full transcript of my R session. It includes the motivation for writing the class and the problems I have encountered. As a result I see
2008 Feb 04
2
maybe a bug in the system.time() function? (PR#10696)
Full_Name: Alessandra Iacobucci Version: 2.5.1 OS: Mac OS X 10.4.11 Submission from: (NULL) (193.48.71.92) Hi, I am running some intensive simulations to test a Population Monte Carlo algorithm. This also involves a study of the CPU times in two different cases. What I am trying to measure is the "real" CPU time, the one which is independent of the %CPU. I'm using the
2008 Nov 19
1
more efficient small subsets from moderate vectors?
This creates a named vector of length nx, then repeatedly draws a single sample from it. lkup <- function(nx, m=10000L) { tbl <- seq_len(nx) names(tbl) <- as.character(tbl) v <- sample(names(tbl), m, replace=TRUE) system.time(for(k in v) tbl[k], gcFirst=TRUE) } There is an abrupt performance degradation at nx=1000 > lkup(1000) user system elapsed 0.180
2008 Jun 28
2
Parallel R
Hello, The problem I'm working on now requires operating on big matrices. I've noticed that there are some packages that allow running some commands in parallel. I've tried snow and NetWorkSpaces, without much success (they are far slower than the normal functions). My problem is very simple: it doesn't require any communication between parallel tasks; only that it divides
2005 Aug 05
6
Computing sums of the columns of an array
Hi, I have a 5x731 array A, and I want to compute the sums of the columns. Currently I do: apply(A, 2, sum) But it turns out, this is slow: 70% of my CPU time is spent here, even though there are many complicated steps in my computation. Is there a faster way? Thanks, Martin
2005 Feb 25
3
Loops and dataframes
Hi, I am experiencing a long delay when using dataframes inside loops and was wondering if this is a bug or not. Example code: > st <- rep(1,100000) > ed <- rep(2,100000) > for(i in 1:length(st)) st[i] <- ed[i] # works fine > df <- data.frame(start=st,end=ed) > for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] #takes forever R: R 2.0.0 (2004-10-04) OS: Linux, Fedora Core 2
2004 Dec 06
6
how to get how many lines there are in a file.
Hi all, if I want to get the total number of lines in a big file without reading the file's content into R as a matrix or data frame, are there any methods or functions? Thanks in advance. Regards
2005 Apr 15
5
Pearson corelation and p-value for matrix
Hi, I was trying to evaluate the Pearson correlation and the p-values for an nxm matrix, where each row represents a vector. One way to do it would be to iterate through each row, and find its correlation value (and the p-value) with respect to the other rows. Is there some function by which I can use the matrix as input? Ideally, the output would be an nxn matrix, containing the p-values
2005 Jan 24
1
Weighted.mean(x,wt) vs. t(x) %*% wt
What is the difference between the above two operations?
2005 Jan 20
2
Creating a custom connection to read from multiple files
Hello, is it possible to create my own connection which I could use with read.table or scan? I would like to create a connection that would read from multiple files in sequence (as if they were concatenated), possibly with an option to skip the first n lines of each file. I would like to avoid using platform-specific scripts for that... (currently I invoke "/bin/cat" from R to create a
2004 Nov 23
2
sorting without order
Hello, In order to increase the performance of a script I'd like to sort very large vectors containing repeated integer values. I'm not interested in having the values sorted, but only grouped. I also need the equivalent of index.return from the standard "sort" function: f(c(10,1,10,100,1,10)) => grouped: c(10,10,10,1,1,100) ix: c(1,3,6,2,5,4) is there a way
2004 Dec 06
0
What is the most useful way to detect nonlinearity in lo
> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of > Ted.Harding at nessie.mcc.ac.uk > Sent: Sunday, December 05, 2004 7:14 PM > To: r-help at stat.math.ethz.ch > Subject: Re: [R] What is the most useful way to detect > nonlinearity in lo > > > On 05-Dec-04 Peter Dalgaard wrote: