similar to: practical to loop over 2million rows?

Displaying 20 results from an estimated 60000 matches similar to: "practical to loop over 2million rows?"

2006 Jan 03
2
For loop gets exponentially slower as dataset gets larger...
I am running R 2.1.1 in a Microsoft Windows XP environment. I have a matrix with three vectors (“columns”) and ~2 million “rows”. The three vectors are date_, id, and price. The data are ordered (sorted) by id and date_. (The matrix contains daily prices for several thousand stocks; if a stock did not trade on a particular date, its price is set to “NA”.)
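A possible direction, sketched in R: the message is cut off before the loop itself, but a common fix for this kind of slowdown is a grouped, vectorised calculation instead of a row-by-row loop. The column names date_, id and price come from the message; the data below are made up, and "previous day's price per stock" is only an assumed example of what the loop might compute.

set.seed(1)
dat <- data.frame(
  id    = rep(c("AAA", "BBB"), each = 5),
  date_ = rep(seq(as.Date("2006-01-02"), by = "day", length.out = 5), 2),
  price = c(10, NA, 10.5, 10.4, 10.6, 20, 20.2, NA, 19.9, 20.1)
)
## previous day's price within each stock, no explicit loop
dat$prev_price <- ave(dat$price, dat$id,
                      FUN = function(p) c(NA, head(p, -1)))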
2012 Oct 22
2
Assigning values to several consecutive rows in a sequence while leaving some empty
Hello all, I'm trying to group several consecutive rows (assigning them the same value) while leaving some of the rows empty (when a certain condition is not fulfilled). My data are locations (xy coordinates), the date/time at which they were measured, and the time span between measures. Somewhat simplified, they look like this:
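A hedged sketch in R: without the full data this is only a guess, but the usual idiom for "same value for consecutive rows until a condition breaks the run" is a cumulative sum over the condition. The time spans and the 30-minute threshold below are invented.

span <- c(5, 6, 5, 40, 6, 5, 50, 4)   # minutes between fixes (example values)
gap  <- span > 30                     # condition that starts a new group
grp  <- cumsum(gap) + 1               # group id, constant within each run
grp[gap] <- NA                        # leave the "break" rows empty, if that is wanted
grp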
2012 Aug 24
6
updating elements of a vector sequentially - is there a faster way?
I would like to know whether there is a faster way to do the below operation (updating vec1). My objective is to update the elements of a vector (vec1), where a particular element i is dependent on the previous one. I need to do this on vectors that are 1 million or longer and need to repeat that process several hundred times. The for loop works but is slow. If there is a faster way, please let
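A hedged sketch in R: whether this helps depends on the exact recursion, which the message does not show. For a linear recursion x[i] = a * x[i-1] + b[i], stats::filter() with method = "recursive" replaces the for loop entirely; for a general recursion, Reduce(..., accumulate = TRUE) at least avoids manual indexing. The coefficient a and the input b are made up.

n <- 1e6
b <- rnorm(n)
a <- 0.5

## linear case: fully vectorised in C
x_fast <- stats::filter(b, filter = a, method = "recursive")

## general case: f() can be any function of (previous value, new input)
f     <- function(prev, new) a * prev + new
x_gen <- Reduce(f, b[-1], init = b[1], accumulate = TRUE)

all.equal(as.numeric(x_fast), x_gen)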
2010 Aug 23
2
Memory Issue
Dear All, I have an issue with memory use in R programming. Here is the brief story: I want to simulate the power of a nonparametric test and compare it with the existing tests. The basic steps are 1. I need to use Newton's method to obtain the nonparametric MLE, which involves the inversion of a large matrix (an n-by-n matrix; it takes less than 3 seconds on average to get the MLE. n = sample
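A hedged sketch in R, since the message is truncated: two habits that often tame memory growth in simulation loops like this are solving the linear system directly rather than forming the explicit inverse, and releasing large temporaries between replicates. The matrix and right-hand side below are stand-ins.

n <- 500
A <- crossprod(matrix(rnorm(n * n), n))   # example symmetric positive-definite matrix
b <- rnorm(n)

step <- solve(A, b)   # Newton step without materialising solve(A)

rm(A)                 # drop the big matrix before the next replicate
invisible(gc())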
2012 May 16
2
Help needed for efficient way to loop through rows and columns
Dear R-helpers: I am trying to write a script that iterates through a dataframe that looks like this: Example dataset called "sample": names <- c("S1", "S2", "S3", "S4") X <- c("BB", "AB", "AB", "AA") Y <- c("BB", "BB", "AB", "AA") Z <- c("BB",
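A hedged sketch in R: the question is cut off before the intended calculation, but row/column loops over genotype tables like this can usually be replaced by vectorised comparisons. As an illustrative (assumed) task, count the "AB" calls per sample; the Z values are invented because the message truncates them.

names <- c("S1", "S2", "S3", "S4")
X <- c("BB", "AB", "AB", "AA")
Y <- c("BB", "BB", "AB", "AA")
Z <- c("BB", "AB", "BB", "AA")   # assumed values; truncated in the message
sample_df <- data.frame(names, X, Y, Z, stringsAsFactors = FALSE)

## number of heterozygous ("AB") calls per sample, no explicit loops
rowSums(sample_df[, c("X", "Y", "Z")] == "AB")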
2018 May 03
2
Proposed speedup of ifelse
I propose a patch to ifelse that leverages anyNA(test) to achieve an improvement in performance. For a test vector of length 10, the change nearly halves the time taken and for a test of length 1 million, there is a tenfold increase in speed. Even for small vectors, the distributions of timings between the old and the proposed ifelse do not intersect. For smaller
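A hedged sketch of the idea being proposed (not the actual patch): when anyNA(test) is FALSE, the NA bookkeeping inside ifelse can be skipped and the result assembled directly from the logical vector.

fast_ifelse <- function(test, yes, no) {
  if (!is.logical(test)) test <- as.logical(test)
  ans <- rep(no, length.out = length(test))
  if (anyNA(test)) {
    ans[is.na(test)]  <- NA
    ans[which(test)]  <- rep(yes, length.out = length(test))[which(test)]
  } else {
    ans[test] <- rep(yes, length.out = length(test))[test]   # fast path: no NA handling
  }
  ans
}

x <- c(-2, 5, NA, 3)
identical(fast_ifelse(x > 0, "pos", "non-pos"), ifelse(x > 0, "pos", "non-pos"))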
2013 Jan 08
1
incrementation within ifelse
Dear R-helper, I am working on a very large data frame and I am trying to add a new column and fill it in under certain conditions. I have tried to use this code with the data frame p: ID = 0 p[,"newColumn"]<- ifelse (p$flagFoehn3_durr == 1, ifelse(p$Guetsch == 0, ID <<- ID ++ , ID ) , 0 ) What I am trying to do
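A hedged sketch in R: ifelse() is vectorised, so a counter updated with <<- inside it will not increment row by row the way a loop would. If the intent is "give each qualifying row the next ID and everything else 0", cumsum() over the condition does that without side effects. The column names follow the message; the data are made up.

p <- data.frame(flagFoehn3_durr = c(1, 1, 0, 1, 1),
                Guetsch         = c(0, 1, 0, 0, 0))
hit <- p$flagFoehn3_durr == 1 & p$Guetsch == 0
p$newColumn <- ifelse(hit, cumsum(hit), 0)   # running ID where the condition holds
p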
2018 May 03
1
Proposed speedup of ifelse
I propose a patch to ifelse that leverages anyNA(test) to achieve an improvement in performance. For a test vector of length 10, the change nearly halves the time taken and for a test of length 1 million, there is a tenfold increase in speed. Even for small vectors, the distributions of timings between the old and the proposed ifelse do not intersect. The patch does not intend to change the
2011 Aug 07
3
Printing data frame with million rows
Dear all, I was working on a number of files and at the end I got a data frame with approx. a million rows. To print this data frame to the output, I used capture.output(print.data.frame(end,row.names=F), file = "summary", append = FALSE) where end is the name of my data frame and summary is the name of my output file. But when I checked the output there were only 10000 rows and at the last it
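A hedged sketch in R: printed output is capped by options("max.print"), which would explain a truncated file. Writing the data frame directly sidesteps printing altogether; the names end and "summary" follow the message, and the toy data frame is a stand-in.

end <- data.frame(x = rnorm(5), y = letters[1:5])   # stand-in for the real data frame
write.table(end, file = "summary", row.names = FALSE, quote = FALSE, sep = "\t")

## or, if the printed look is really needed, raise the cap first:
## options(max.print = .Machine$integer.max)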
2011 Jul 15
2
plot a vertical column of colored rectangles
Hi, I've been really struggling with this. If I have a vector like dat <- c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,0,0,0) I want to plot each element as a colored rectangle (red = 1, blue = 0) in the right order, so they all stack up forming a vertical column on the graph. Sort of like a building, with each floor in the appropriate color. Any ideas? I've tried using ggplot and geom_tile, but my
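A hedged sketch in base graphics (the message mentions ggplot, but this is an alternative): image() with a one-column matrix stacks one cell per element, which gives the "building with coloured floors" look.

dat <- c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,0,0,0)
image(x = 0:1, y = 0:length(dat), z = t(matrix(dat, ncol = 1)),
      col = c("blue", "red"),     # 0 = blue, 1 = red
      axes = FALSE, xlab = "", ylab = "")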
2012 Feb 05
4
nested if else statements
I have a vector of 2,1,0 that I want to change to 0,1,2 respectively (the data is allele dosages). I have tried multiple nested if/else statements and looked at the ?if help and cannot work out what is wrong; other people have posted code which is identical and which they state works. Any help would be greatly appreciated. > A[1:20] [1] 1 1 0 0 1 0 1 0 1 0 0 0 1 1 0 1 1 1 0 0 > B <-
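A hedged sketch in R: for flipping dosages no if/else is needed at all; simple arithmetic does the recoding. The vector below copies the values shown in the message.

A <- c(1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0)
B <- 2 - A   # 2 -> 0, 1 -> 1, 0 -> 2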
2008 Dec 22
2
How can I avoid nested 'for' loops or quicken the process?
Hi All, I'm still pretty new to using R - and I was hoping I might be able to get some advice as to how to use 'apply' or a similar function instead of using nested for loops. Right now I have a script which uses nested for loops similar to this: i <- 1 for(a in Alpha) { for (b in Beta) { for (c in Gamma) { for (d in Delta) { for (e in Epsilon) { Output[i] <-
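A hedged sketch in R: the body of the innermost loop is cut off, but when every combination of several vectors is needed, expand.grid() builds the grid and mapply() evaluates a function once per combination. The vectors and the example function below are invented.

Alpha <- 1:3; Beta <- 1:2; Gamma <- 1:2; Delta <- 1:2; Epsilon <- 1:2   # example values
grid <- expand.grid(a = Alpha, b = Beta, c = Gamma, d = Delta, e = Epsilon)

## one vectorised call over the grid instead of five nested loops
Output <- with(grid, mapply(function(a, b, c, d, e) a + b * c - d * e,
                            a, b, c, d, e))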
2013 Feb 26
2
Converting code to R Question
I'm learning R and am converting some code from SPSS into R. My background is in SAS/SPSS, so the vectorization is new to me and I'm trying to learn how NOT to use loops... or to use them sparingly. I'm wondering what the most efficient way to tackle a problem I'm working on is. Below is an example piece of code. Essentially what it does is set a variable to zero, loop through item
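A hedged sketch in R: the description breaks off, but "set a variable to zero and loop through items adding to it" usually translates to rowSums() over the item columns. The column names and data here are invented for illustration.

d <- data.frame(item1 = c(1, 0, 1), item2 = c(1, 1, 0), item3 = c(0, 1, 1))
d$score <- rowSums(d[, c("item1", "item2", "item3")])   # per-row total, no loop
d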
2008 Oct 25
1
Methods for showing statistics over space
Hi, I have a question which is a little off-topic, but then again, it should stay within the boundaries of what can be done with available R functions. Does anyone have pointers to tutorials or the like where one can get inspiration on how to visualize some "spatial" statistics? I want to analyze different statistics of 60 counties in a country. I have a shape file for those counties, thus I can
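A hedged sketch in R (not from the original thread): one way to get a quick choropleth from a shapefile of counties is the sf package. The file name and the statistic column are placeholders.

library(sf)
counties <- st_read("counties.shp")         # hypothetical path to the shapefile
counties$my_stat <- runif(nrow(counties))   # stand-in for the real statistic
plot(counties["my_stat"], main = "Statistic by county")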
2010 Feb 05
2
ifelse on a series of rows for multiple criteria
Dear all, I am attempting to perform a calculation which counts the number of positive (or negative) values based on the sample mean (on a per-row basis). If the mean is >0 then only positive values should be counted, and if the mean is <0 then only negative values should be counted. In cases where the mean is equal to zero, the value -99999 should be returned. The following is an example
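A hedged sketch in R: rowMeans() gives the per-row mean and rowSums() over a logical comparison does the counting, so no row-by-row ifelse is needed. The -99999 sentinel for a zero mean follows the message; the toy matrix is made up.

m  <- matrix(c( 1, -2,  3,
               -1, -4,  2,
                0,  0,  0), nrow = 3, byrow = TRUE)
mu <- rowMeans(m)
counts <- ifelse(mu > 0, rowSums(m > 0),
          ifelse(mu < 0, rowSums(m < 0), -99999))
counts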
2010 Apr 14
2
search and replace
I have a dataframe with almost a million rows which has one column with strings. That column has several entries with the words "South", "North", "East" and "West", which I would like to replace with S, N, E, and W, respectively. Obviously, I can use gsub multiple times, e.g. df$col2 <- gsub("West", "W", df$col2), which will require
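A hedged sketch in R: a small named lookup plus one loop over the patterns keeps the replacements in one place instead of four separate gsub() lines. The example rows are made up.

df <- data.frame(col2 = c("12 West Street", "North Ave", "5 East Rd", "South Blvd"),
                 stringsAsFactors = FALSE)
repl <- c(South = "S", North = "N", East = "E", West = "W")
for (w in names(repl)) df$col2 <- gsub(w, repl[[w]], df$col2, fixed = TRUE)
df$col2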
2010 Feb 08
2
Counting by rows based on multiple criteria
Dear all, I have a data frame of 6 columns and ~60000 rows which I hope to perform the following calculation on. For each row, I wish to determine whether there are a greater number of positive or negative numbers. Then, if there are more positive numbers in the row, count how many occur - but if there are more negative numbers in the row, count them instead and insert a minus symbol before
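A hedged sketch in R: comparing the positive and negative counts per row and signing the result accordingly avoids any explicit row loop. The message does not say how ties should be handled, so they are left as 0 here; the toy matrix is made up (the real data are ~60000 x 6).

m   <- matrix(c( 1,  2, -3,  4,  5, -6,
                -1, -2,  3, -4, -5,  6), nrow = 2, byrow = TRUE)
pos <- rowSums(m > 0)
neg <- rowSums(m < 0)
result <- ifelse(pos > neg, pos, ifelse(neg > pos, -neg, 0))
result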
2004 Dec 02
6
dropping rows
Hi! Sorry for asking a trivial question, but I can't seem to figure this out. I have a dataframe called master containing 30-odd variables. In this dataframe, I have observations across these 30 variables from 1930 to 2003 (I've made a "year" variable). How can I drop all rows for which the year is less than 1960? I'm assuming something with ifelse() but I can't
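A hedged sketch in R: plain subsetting does this directly; no ifelse() is needed. The names master and year follow the message, and the stand-in data are generated for the example.

master <- data.frame(year = 1930:2003, value = rnorm(74))   # stand-in data
master <- master[master$year >= 1960, ]
## or: master <- subset(master, year >= 1960)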
2011 Feb 26
2
how to remove rows in which 2 or more observations are smaller than a given threshold?
Hello The data set I am examining has 7425 observations (rows with unique identifiers) and 46 samples (columns). I have been trying to generate a dataset that filters out observations that are "negligible". The definition of "negligible" is an absolute value less than or equal to 1.58. The rule that I would like to adopt to create the new data is: drop rows in which 2 or more
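A hedged sketch in R: count the "negligible" entries per row with rowSums() and keep rows where that count is below 2. The 1.58 threshold comes from the message; the toy matrix is made up (the real data are 7425 x 46).

x <- matrix(c( 2.0, -3.1,  0.5,
               1.0,  1.2, -1.0,
               5.0,  2.2,  1.9), nrow = 3, byrow = TRUE)
negligible <- abs(x) <= 1.58
keep <- rowSums(negligible) < 2
x[keep, , drop = FALSE]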
2001 Oct 31
2
removing duplicated rows from a data.frame
Dear all, Sorry for the simplicity of the question, but how does one go about removing duplicated rows in a data.frame? I'm looking for a quick and simple solution, as my data.frames are relatively large (50000 by 50). I've racked my brain and searched the help files and found nothing useful or quick, only duplicated() and unique(), which only work on lists. Thanks Gary.
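A hedged sketch in R: duplicated() and unique() do in fact accept data frames, so either of these removes the repeated rows directly; a 50000 x 50 frame is no problem. The toy data are made up.

df <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))   # one duplicated row
df_unique <- df[!duplicated(df), ]
## equivalently: df_unique <- unique(df)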