similar to: practical to loop over 2million rows?

Displaying 20 results from an estimated 60000 matches similar to: "practical to loop over 2million rows?"

2006 Jan 03
2
For loop gets exponentially slower as dataset gets larger...
I am running R 2.1.1 in a Microsoft Windows XP environment. I have a matrix with three vectors (“columns”) and ~2 million “rows”. The three vectors are date_, id, and price. The data are ordered (sorted) by id and date_. (The matrix contains daily prices for several thousand stocks; if a stock did not trade on a particular date, its price is set to “NA”.)
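A possible direction, sketched in R: the message is cut off before the loop itself, but a common fix for this kind of slowdown is a grouped, vectorised calculation instead of a row-by-row loop. The column names date_, id and price come from the message; the data below are made up, and "previous day's price per stock" is only an assumed example of what the loop might compute.

set.seed(1)
dat <- data.frame(
  id    = rep(c("AAA", "BBB"), each = 5),
  date_ = rep(seq(as.Date("2006-01-02"), by = "day", length.out = 5), 2),
  price = c(10, NA, 10.5, 10.4, 10.6, 20, 20.2, NA, 19.9, 20.1)
)
## previous day's price within each stock, no explicit loop
dat$prev_price <- ave(dat$price, dat$id,
                      FUN = function(p) c(NA, head(p, -1)))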
2012 Oct 22
2
Assigning values to several consecutive rows in a sequence while leaving some empty
Hello all, I'm trying to group several consecutive rows (assigning them the same value) while leaving some of the rows empty (when a certain condition is not fulfilled). My data are locations (xy coordinates), the date/time at which they were measured, and the time span between measures. Somewhat simplified, they look like this:
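A hedged sketch in R: without the full data this is only a guess, but the usual idiom for "same value for consecutive rows until a condition breaks the run" is a cumulative sum over the condition. The time spans and the 30-minute threshold below are invented.

span <- c(5, 6, 5, 40, 6, 5, 50, 4)   # minutes between fixes (example values)
gap  <- span > 30                     # condition that starts a new group
grp  <- cumsum(gap) + 1               # group id, constant within each run
grp[gap] <- NA                        # leave the "break" rows empty, if that is wanted
grp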
2012 Aug 24
6
updating elements of a vector sequentially - is there a faster way?
I would like to know whether there is a faster way to do the below operation (updating vec1). My objective is to update the elements of a vector (vec1), where a particular element i is dependent on the previous one. I need to do this on vectors that are 1 million or longer and need to repeat that process several hundred times. The for loop works but is slow. If there is a faster way, please let
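A hedged sketch in R: whether this helps depends on the exact recursion, which the message does not show. For a linear recursion x[i] = a * x[i-1] + b[i], stats::filter() with method = "recursive" replaces the for loop entirely; for a general recursion, Reduce(..., accumulate = TRUE) at least avoids manual indexing. The coefficient a and the input b are made up.

n <- 1e6
b <- rnorm(n)
a <- 0.5

## linear case: fully vectorised in C
x_fast <- stats::filter(b, filter = a, method = "recursive")

## general case: f() can be any function of (previous value, new input)
f     <- function(prev, new) a * prev + new
x_gen <- Reduce(f, b[-1], init = b[1], accumulate = TRUE)

all.equal(as.numeric(x_fast), x_gen)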
2010 Aug 23
2
Memory Issue
Dear All, I have an issue with memory use in R programming. Here is the brief story: I want to simulate the power of a nonparametric test and compare it with the existing tests. The basic steps are 1. I need to use Newton's method to obtain the nonparametric MLE, which involves the inversion of a large matrix (an n-by-n matrix; it takes less than 3 seconds on average to get the MLE. n = sample
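A hedged sketch in R, since the message is truncated: two habits that often tame memory growth in simulation loops like this are solving the linear system directly rather than forming the explicit inverse, and releasing large temporaries between replicates. The matrix and right-hand side below are stand-ins.

n <- 500
A <- crossprod(matrix(rnorm(n * n), n))   # example symmetric positive-definite matrix
b <- rnorm(n)

step <- solve(A, b)   # Newton step without materialising solve(A)

rm(A)                 # drop the big matrix before the next replicate
invisible(gc())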
2012 May 16
2
Help needed for efficient way to loop through rows and columns
Dear R-helpers: I am trying to write a script that iterates through a dataframe that looks like this: Example dataset called "sample": names <- c("S1", "S2", "S3", "S4") X <- c("BB", "AB", "AB", "AA") Y <- c("BB", "BB", "AB", "AA") Z <- c("BB",
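A hedged sketch in R: the question is cut off before the intended calculation, but row/column loops over genotype tables like this can usually be replaced by vectorised comparisons. As an illustrative (assumed) task, count the "AB" calls per sample; the Z values are invented because the message truncates them.

names <- c("S1", "S2", "S3", "S4")
X <- c("BB", "AB", "AB", "AA")
Y <- c("BB", "BB", "AB", "AA")
Z <- c("BB", "AB", "BB", "AA")   # assumed values; truncated in the message
sample_df <- data.frame(names, X, Y, Z, stringsAsFactors = FALSE)

## number of heterozygous ("AB") calls per sample, no explicit loops
rowSums(sample_df[, c("X", "Y", "Z")] == "AB")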
2018 May 03
2
Proposed speedup of ifelse
I propose a patch to ifelse that leverages anyNA(test) to achieve an improvement in performance. For a test vector of length 10, the change nearly halves the time taken and for a test of length 1 million, there is a tenfold increase in speed. Even for small vectors, the distributions of timings between the old and the proposed ifelse do not intersect. For smaller
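A hedged sketch of the idea being proposed (not the actual patch): when anyNA(test) is FALSE, the NA bookkeeping inside ifelse can be skipped and the result assembled directly from the logical vector.

fast_ifelse <- function(test, yes, no) {
  if (!is.logical(test)) test <- as.logical(test)
  ans <- rep(no, length.out = length(test))
  if (anyNA(test)) {
    ans[is.na(test)]  <- NA
    ans[which(test)]  <- rep(yes, length.out = length(test))[which(test)]
  } else {
    ans[test] <- rep(yes, length.out = length(test))[test]   # fast path: no NA handling
  }
  ans
}

x <- c(-2, 5, NA, 3)
identical(fast_ifelse(x > 0, "pos", "non-pos"), ifelse(x > 0, "pos", "non-pos"))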
2013 Jan 08
1
incrementation within ifelse
Dear R-helper, I am working on a very large data frame and I am trying to add a new column and fill it in under certain conditions. I have tried to use this code with the data frame p: ID = 0 p[,"newColumn"]<- ifelse (p$flagFoehn3_durr == 1, ifelse(p$Guetsch == 0, ID <<- ID ++ , ID ) , 0 ) What I am trying to do
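A hedged sketch in R: ifelse() is vectorised, so a counter updated with <<- inside it will not increment row by row the way a loop would. If the intent is "give each qualifying row the next ID and everything else 0", cumsum() over the condition does that without side effects. The column names follow the message; the data are made up.

p <- data.frame(flagFoehn3_durr = c(1, 1, 0, 1, 1),
                Guetsch         = c(0, 1, 0, 0, 0))
hit <- p$flagFoehn3_durr == 1 & p$Guetsch == 0
p$newColumn <- ifelse(hit, cumsum(hit), 0)   # running ID where the condition holds
p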
2018 May 03
1
Proposed speedup of ifelse
I propose a patch to ifelse that leverages anyNA(test) to achieve an improvement in performance. For a test vector of length 10, the change nearly halves the time taken and for a test of length 1 million, there is a tenfold increase in speed. Even for small vectors, the distributions of timings between the old and the proposed ifelse do not intersect. The patch does not intend to change the
2011 Aug 07
3
Printing data frame with million rows
Dear all, I was working on a number of files and at the end I got a data frame with approx. a million rows. To print this data frame to the output, I used capture.output(print.data.frame(end,row.names=F), file = "summary", append = FALSE) where end is the name of my data frame and summary is the name of my output file. But when I checked the output there were only 10000 rows and at the last it
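A hedged sketch in R: printed output is capped by options("max.print"), which would explain a truncated file. Writing the data frame directly sidesteps printing altogether; the names end and "summary" follow the message, and the toy data frame is a stand-in.

end <- data.frame(x = rnorm(5), y = letters[1:5])   # stand-in for the real data frame
write.table(end, file = "summary", row.names = FALSE, quote = FALSE, sep = "\t")

## or, if the printed look is really needed, raise the cap first:
## options(max.print = .Machine$integer.max)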
2011 Jul 15
2
plot a vertical column of colored rectangles
Hi, I've been really struggling with this. If I have a vector like dat <- c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,0,0,0) I want to plot each element as a colored rectangle (red = 1, blue = 0) in the right order, so they all stack up forming a vertical column on the graph. Sort of like a building, with each floor in the appropriate color. Any ideas? I've tried using ggplot and geom_tile, but my
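A hedged sketch in base graphics (the message mentions ggplot, but this is an alternative): image() with a one-column matrix stacks one cell per element, which gives the "building with coloured floors" look.

dat <- c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,0,0,0)
image(x = 0:1, y = 0:length(dat), z = t(matrix(dat, ncol = 1)),
      col = c("blue", "red"),     # 0 = blue, 1 = red
      axes = FALSE, xlab = "", ylab = "")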
2012 Feb 05
4
nested if else statements
I have a vector of 2,1,0 that I want to change to 0,1,2 respectively (the data is allele dosages). I have tried multiple nested if/else statements and looked at the ?if help and cannot work out what is wrong; other people have posted code which is identical and which they state works. Any help would be greatly appreciated. > A[1:20] [1] 1 1 0 0 1 0 1 0 1 0 0 0 1 1 0 1 1 1 0 0 > B <-
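A hedged sketch in R: for flipping dosages no if/else is needed at all; simple arithmetic does the recoding. The vector below copies the values shown in the message.

A <- c(1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0)
B <- 2 - A   # 2 -> 0, 1 -> 1, 0 -> 2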
2008 Dec 22
2
How can I avoid nested 'for' loops or quicken the process?
Hi All, I'm still pretty new to using R - and I was hoping I might be able to get some advice as to how to use 'apply' or a similar function instead of using nested for loops. Right now I have a script which uses nested for loops similar to this: i <- 1 for(a in Alpha) { for (b in Beta) { for (c in Gamma) { for (d in Delta) { for (e in Epsilon) { Output[i] <-
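A hedged sketch in R: the body of the innermost loop is cut off, but when every combination of several vectors is needed, expand.grid() builds the grid and mapply() evaluates a function once per combination. The vectors and the example function below are invented.

Alpha <- 1:3; Beta <- 1:2; Gamma <- 1:2; Delta <- 1:2; Epsilon <- 1:2   # example values
grid <- expand.grid(a = Alpha, b = Beta, c = Gamma, d = Delta, e = Epsilon)

## one vectorised call over the grid instead of five nested loops
Output <- with(grid, mapply(function(a, b, c, d, e) a + b * c - d * e,
                            a, b, c, d, e))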
2013 Feb 26
2
Converting code to R Question
I'm learning R and am converting some code from SPSS into R. My background is in SAS/SPSS, so the vectorization is new to me and I'm trying to learn how NOT to use loops... or to use them sparingly. I'm wondering what the most efficient way to tackle a problem I'm working on is. Below is an example piece of code. Essentially what it does is set a variable to zero, loop through item
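A hedged sketch in R: the description breaks off, but "set a variable to zero and loop through items adding to it" usually translates to rowSums() over the item columns. The column names and data here are invented for illustration.

d <- data.frame(item1 = c(1, 0, 1), item2 = c(1, 1, 0), item3 = c(0, 1, 1))
d$score <- rowSums(d[, c("item1", "item2", "item3")])   # per-row total, no loop
d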
2008 Oct 25
1
Methods for showing statistics over space
Hi, I have a question which is a little off-topic, but then again, it should stay within the boundaries of what can be done with available R functions. Does anyone have pointers to tutorials or the like where one can get inspiration on how to visualize some "spatial" statistics? I want to analyze different statistics of 60 counties in a country. I have a shape file for those counties, thus I can
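A hedged sketch in R (not from the original thread): one way to get a quick choropleth from a shapefile of counties is the sf package. The file name and the statistic column are placeholders.

library(sf)
counties <- st_read("counties.shp")         # hypothetical path to the shapefile
counties$my_stat <- runif(nrow(counties))   # stand-in for the real statistic
plot(counties["my_stat"], main = "Statistic by county")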
2010 Feb 05
2
ifelse on a series of rows for multiple criteria
Dear all, I am attempting to perform a calculation which counts the number of positive (or negative) values based on the sample mean (on a per-row basis). If the mean is >0 then only positive values should be counted, and if the mean is <0 then only negative values should be counted. In cases where the mean is equal to zero, the value -99999 should be returned. The following is an example
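A hedged sketch in R: rowMeans() gives the per-row mean and rowSums() over a logical comparison does the counting, so no row-by-row ifelse is needed. The -99999 sentinel for a zero mean follows the message; the toy matrix is made up.

m  <- matrix(c( 1, -2,  3,
               -1, -4,  2,
                0,  0,  0), nrow = 3, byrow = TRUE)
mu <- rowMeans(m)
counts <- ifelse(mu > 0, rowSums(m > 0),
          ifelse(mu < 0, rowSums(m < 0), -99999))
counts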
2010 Apr 14
2
search and replace
I have a dataframe with almost a million rows which has one column with strings. That column has several entries with the words "South", "North", "East" and "West", which I would like to replace with S, N, E, and W, respectively. Obviously, I can use gsub multiple times, e.g. df$col2 <- gsub("West", "W", df$col2), which will require
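A hedged sketch in R: a small named lookup plus one loop over the patterns keeps the replacements in one place instead of four separate gsub() lines. The example rows are made up.

df <- data.frame(col2 = c("12 West Street", "North Ave", "5 East Rd", "South Blvd"),
                 stringsAsFactors = FALSE)
repl <- c(South = "S", North = "N", East = "E", West = "W")
for (w in names(repl)) df$col2 <- gsub(w, repl[[w]], df$col2, fixed = TRUE)
df$col2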
2010 Feb 08
2
Counting by rows based on multiple criteria
Dear all, I have a data frame of 6 columns and ~60000 rows which I hope to perform the following calculation on. For each row, I wish to determine whether there are a greater number of positive or negative numbers. Then, if there are more positive numbers in the row, count how many occur - but if there are more negative numbers in the row, count them instead and insert a minus symbol before
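A hedged sketch in R: comparing the positive and negative counts per row and signing the result accordingly avoids any explicit row loop. The message does not say how ties should be handled, so they are left as 0 here; the toy matrix is made up (the real data are ~60000 x 6).

m   <- matrix(c( 1,  2, -3,  4,  5, -6,
                -1, -2,  3, -4, -5,  6), nrow = 2, byrow = TRUE)
pos <- rowSums(m > 0)
neg <- rowSums(m < 0)
result <- ifelse(pos > neg, pos, ifelse(neg > pos, -neg, 0))
result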
2004 Dec 02
6
dropping rows
Hi! Sorry for asking a trivial question, but I can't seem to figure this out. I have a dataframe called master containing 30-odd variables. In this dataframe, I have observations across these 30 variables from 1930 to 2003 (I've made a "year" variable). How can I drop all rows for which the year is less than 1960? I'm assuming something with ifelse() but I can't
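A hedged sketch in R: plain subsetting does this directly; no ifelse() is needed. The names master and year follow the message, and the stand-in data are generated for the example.

master <- data.frame(year = 1930:2003, value = rnorm(74))   # stand-in data
master <- master[master$year >= 1960, ]
## or: master <- subset(master, year >= 1960)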
2011 Feb 26
2
how to remove rows in which 2 or more observations are smaller than a given threshold?
Hello The data set I am examining has 7425 observations (rows with unique identifiers) and 46 samples (columns). I have been trying to generate a dataset that filters out observations that are "negligible". The definition of "negligible" is an absolute value less than or equal to 1.58. The rule that I would like to adopt to create the new data is: drop rows in which 2 or more
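A hedged sketch in R: count the "negligible" entries per row with rowSums() and keep rows where that count is below 2. The 1.58 threshold comes from the message; the toy matrix is made up (the real data are 7425 x 46).

x <- matrix(c( 2.0, -3.1,  0.5,
               1.0,  1.2, -1.0,
               5.0,  2.2,  1.9), nrow = 3, byrow = TRUE)
negligible <- abs(x) <= 1.58
keep <- rowSums(negligible) < 2
x[keep, , drop = FALSE]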
2001 Oct 31
2
removing duplicated rows from a data.frame
Dear all, Sorry for the simplicity of the question, but how does one go about removing duplicated rows in a data.frame? I'm looking for a quick and simple solution, as my data.frames are relatively large (50000 by 50). I've racked my brain and searched the help files and found nothing useful or quick, only duplicated() and unique(), which only work on lists. Thanks Gary.
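A hedged sketch in R: duplicated() and unique() do in fact accept data frames, so either of these removes the repeated rows directly; a 50000 x 50 frame is no problem. The toy data are made up.

df <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))   # one duplicated row
df_unique <- df[!duplicated(df), ]
## equivalently: df_unique <- unique(df)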