similar to: merge( , by='row.names') slowness

Displaying 20 results from an estimated 20000 matches similar to: "merge( , by='row.names') slowness"

2008 Aug 18
2
matrix row product and cumulative product
I spent a lot of time searching and came up empty-handed on the following query. Is there an equivalent to rowSums that does product or cumulative product and avoids use of apply or looping? I found a rowProd in a package but it was a convenience function for apply. As part of a likelihood calculation called from optim, I'm computing products and cumulative products of rows of matrices with
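A minimal base-R sketch of one apply-free approach, assuming all entries are strictly positive so working on the log scale is safe (otherwise the matrixStats functions rowProds and rowCumprods are the usual suggestion):
m <- matrix(runif(12), nrow = 3)
rowprod <- exp(rowSums(log(m)))                 # row products without apply()
U <- upper.tri(diag(ncol(m)), diag = TRUE) * 1  # running-sum operator
rowcumprod <- exp(log(m) %*% U)                 # row cumulative products
all.equal(rowcumprod, t(apply(m, 1, cumprod)))  # check against the apply version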
2007 Nov 26
2
colnames slow (PR#10470)
Full_Name: Tomas Larsson Version: 2.6.0 OS: Windows XP Submission from: (NULL) (198.208.251.24) This is not a bug; it is a performance issue, but I think it should have an easy fix. I have a large matrix (about 2,000,000 by 20); when I type colnames(x) it takes a long time to get the result. However, if I select just the first couple of rows of the matrix I don't have to wait for the
2012 Feb 14
1
Filling out a data frame row by row.... slow!
I'm reading a file and using the file to populate a data frame. The way the file is laid out, I need to fill in the data frame one row at a time. When I start reading my file, I don't know how many rows I will need. It's on the order of a million. Being mindful of the time expense of reallocation, I decided on a strategy of doubling the data frame size every time I needed to expand
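A sketch of the usual alternative to growing a data frame in place: collect each parsed row in a pre-sized list and bind once at the end. The file name is illustrative and parse_line() is a hypothetical placeholder for whatever turns one file line into a one-row data frame.
rows <- vector("list", 1e6)          # generous guess; only the filled part is used
i <- 0L
for (line in readLines("input.txt")) {
  i <- i + 1L
  rows[[i]] <- parse_line(line)      # hypothetical: returns a one-row data frame
}
df <- do.call(rbind, rows[seq_len(i)])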
2007 Sep 19
3
Row-by-row regression on matrix
Folks, I have a 3000 x 4 matrix (y), which I need to regress row-by-row against a 4-vector (x) to create a matrix lm.y of intercepts and slopes. To illustrate: y <- matrix(rnorm(12000), ncol = 4) x <- c(1/12, 3/12, 6/12, 1) system.time(lm.y <- t(apply(y, 1, function(z) lm(z ~ x)$coefficient))) [1] 44.72 18.00 69.52 NA NA Takes more than a minute to do (and I need to do many
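One common speed-up for this pattern is that lm() accepts a matrix response, so all 3000 regressions can be fitted in a single call; a sketch using the example above:
y <- matrix(rnorm(12000), ncol = 4)
x <- c(1/12, 3/12, 6/12, 1)
fit <- lm(t(y) ~ x)      # one multi-response fit instead of 3000 separate lm() calls
lm.y <- t(coef(fit))     # 3000 x 2 matrix: column 1 intercepts, column 2 slopes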
2011 Oct 19
2
Speed difference between df$a[1] and df[1,"a"]
I was surprised to find that df$a[1] is an order of magnitude faster than df[1,"a"]: > df <- data.frame(a=1:10) > system.time(replicate(100000, df$a[3])) user system elapsed 0.36 0.00 0.36 > system.time(replicate(100000, df[3,"a"])) user system elapsed 4.09 0.00 4.09 A priori, I'd have thought that combining the row and column
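A small sketch of the usual ways to dodge the `[.data.frame` dispatch that makes df[1,"a"] expensive (timings will vary by machine):
df <- data.frame(a = 1:10)
system.time(replicate(100000, df$a[3]))               # extract the column, then the element
system.time(replicate(100000, .subset2(df, "a")[3]))  # internal accessor, no S3 dispatch
system.time(replicate(100000, df[3, "a"]))            # full data-frame indexing, slowest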
2024 Apr 08
4
Exceptional slowness with read.csv
Greetings, I have a csv file of 76 fields and about 4 million records. I know that some of the records have errors - unmatched quotes, specifically. Reading the file with readLines and parsing the lines with read.csv(text = ...) is really slow. I know that the first 2459465 records are good. So I try this: > startTime <- Sys.time() > first_records <- read.csv(file_name, nrows
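A hedged sketch of the chunked read the post appears to be building toward; the skip offset and colClasses details below are assumptions, not from the original message:
startTime <- Sys.time()
first_records <- read.csv(file_name, nrows = 2459465)
classes <- sapply(first_records, class)        # reuse the inferred column types
next_records <- read.csv(file_name, header = FALSE, skip = 2459465 + 1,
                         nrows = 5, colClasses = classes,
                         col.names = names(first_records))
Sys.time() - startTime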
2010 Jul 30
2
(no subject)
Hello, I am new to R and trying to calculate the beta coefficient of a standard linear regression for a series of randomly generated numbers. I have created this loop, but it runs really slowly; is there a way to improve it? #number of simulations n.k<-999 #create the matrix for regression coefficients generated from #random data beta<-matrix(0,1,n.k+1) e<-matrix(0,tslength,n.k+1) for(k
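The loop itself is cut off above, but if each simulated series in e is regressed on a single regressor x, the slopes have a closed form and the loop collapses to one matrix operation. A sketch under that assumption (tslength and x are placeholders):
tslength <- 100
n.k <- 999
x <- rnorm(tslength)                                          # placeholder regressor
e <- matrix(rnorm(tslength * (n.k + 1)), tslength, n.k + 1)   # simulated series
xc <- x - mean(x)
beta <- crossprod(xc, e) / sum(xc^2)   # 1 x (n.k+1) matrix of slope estimates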
2008 Mar 10
2
write.table with row.names=FALSE unnecessarily slow?
write.table with large data frames takes quite a long time > system.time({ + write.table(df, '/tmp/dftest.txt', row.names=FALSE) + }, gcFirst=TRUE) user system elapsed 97.302 1.532 98.837 A reason is because dimnames is always called, causing 'anonymous' row names to be created as character vectors. Avoiding this in src/library/utils, along the lines of Index:
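Until something like that patch lands, a user-side workaround is to sidestep write.table entirely; a sketch using data.table::fwrite, with a synthetic df for illustration (the timings in the post are not reproduced here):
df <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
system.time(write.table(df, "/tmp/dftest.txt", row.names = FALSE))
system.time(data.table::fwrite(df, "/tmp/dftest_fwrite.txt"))   # C writer, no row-name cost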
2024 Apr 08
1
Exceptional slowness with read.csv
No idea, but have you tried using ?scan to read those next 5 rows? It might give you a better idea of the pathologies that are causing problems. For example, an unmatched quote might result in some huge number of characters trying to be read into a single element of a character variable. As your previous respondent said, resolving such problems can be a challenge. Cheers, Bert On Mon, Apr 8,
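A sketch of the ?scan idea, pulling the suspect records as raw lines so a runaway quoted field shows up as one enormous string; the exact skip count (header plus the known-good records) is an assumption:
bad <- scan(file_name, what = character(), sep = "\n", quote = "",
            skip = 2459466, nlines = 5)
nchar(bad)   # a huge element points at the record with the unmatched quote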
2010 Feb 12
1
paired wilcox test on each row of a large dataframe
Hi, I have to calculate the V statistic for each row of a large data frame (28,000 rows). I cannot use the multtest package for a paired Wilcoxon test. I have been using a for loop, which is slow. Is there a way to speed up the computation with another method, like using apply or tapply? My data set looks like this: 11573_MB 11911_MB 11966_MB 12091_MB 12168_MB 12420_MB................ cg00000292
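A sketch of the apply-based version, assuming each row splits into two paired groups of equal length (columns 1:6 versus 7:12 stand in for the real pairing, and dat for the 28,000-row data frame); apply mostly tidies the code here, since wilcox.test itself dominates the run time:
V <- apply(dat, 1, function(r)
  wilcox.test(r[1:6], r[7:12], paired = TRUE)$statistic)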
2024 Apr 08
2
Exceptional slowness with read.csv
Hi Dave, That's rather frustrating. I've found vroom (from the package vroom) to be helpful with large files like this. Does the following give you any better luck? vroom(file_name, delim = ",", skip = 2459465, n_max = 5) Of course, when you know you've got errors & the files are big like that it can take a bit of work resolving things. The command line tools awk
2007 Oct 10
2
slow load() in R2.6.0
I'm encountering excruciatingly slow load times for character vectors in R 2.6.0-- up to 30sec for a 15K file that contains a no-attributes character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1, repeated loads of the same set of files are near-instantaneous. The problem is proving tricky to reproduce consistently from scratch, so I have attached the 3 files used in the examples
2024 Apr 08
1
Exceptional slowness with read.csv
data.table's fread is also fast. Not sure about error handling. But I can merge 300 csvs with a total of 0.5m lines and 50 columns in a couple of minutes versus a lifetime with read.csv or readr::read_csv On Mon, 8 Apr 2024, 16:19 Stevie Pederson, <stephen.pederson.au at gmail.com> wrote: > Hi Dave, > > That's rather frustrating. I've found vroom (from the package
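A sketch of that fread-based merge; the directory and file pattern are illustrative:
library(data.table)
files <- list.files("csv_dir", pattern = "\\.csv$", full.names = TRUE)
merged <- rbindlist(lapply(files, fread), use.names = TRUE, fill = TRUE)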
2024 Apr 10
2
Exceptional slowness with read.csv
At 06:47 on 08/04/2024, Dave Dixon wrote: > Greetings, > > I have a csv file of 76 fields and about 4 million records. I know that > some of the records have errors - unmatched quotes, specifically. > Reading the file with readLines and parsing the lines with read.csv(text > = ...) is really slow. I know that the first 2459465 records are good. > So I try this: >
2007 Mar 02
5
extracting rows from a data frame by looping over the row names: performance issues
Hi, I have a big data frame: > mat <- matrix(rep(paste(letters, collapse=""), 5*300000), ncol=5) > dat <- as.data.frame(mat) and I need to do some computation on each row. Currently I'm doing this: > for (key in row.names(dat)) { row <- dat[key, ]; ... do some computation on row... } which could probably be considered a very natural (and R'ish) way of
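One commonly suggested rewrite: loop over integer positions instead of character row names, or stay with the matrix when the columns share a type, since matrix row extraction is far cheaper than data-frame rows:
for (i in seq_len(nrow(dat))) {
  row <- dat[i, ]    # positional indexing avoids the row-name lookup
  # ... do some computation on row ...
}
# or, since all columns here are character, work on mat directly:
for (i in seq_len(nrow(mat))) {
  row <- mat[i, ]
  # ... do some computation on row ...
}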
2024 Apr 08
2
Exceptional slowness with read.csv
I solved the mystery, but not the problem. The problem is that there's an unclosed quote somewhere in those 5 additional records I'm trying to access. So read.csv is reading million-character fields, and it's slow at that. That mystery is solved. However, the problem persists: how to fix what is obvious to the naked eye - a quote not adjacent to a comma - but that read.csv can't
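One hedged workaround while the quoting is broken: disable quote processing so the stray quote stays inside its own field, then repair it afterwards. Whether this is acceptable depends on whether legitimate fields contain quoted commas; the skip offset is an assumption:
next_records <- read.csv(file_name, header = FALSE, skip = 2459465 + 1,
                         nrows = 5, quote = "")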
2011 May 02
2
Lasso with Categorical Variables
Hi! This is my first time posting. I've read the general rules and guidelines, but please bear with me if I make some fatal error in posting. Anyway, I have a continuous response and 29 predictors made up of continuous variables and nominal and ordinal categorical variables. I'd like to do lasso on these, but I get an error. The way I am using "lars" doesn't allow for the
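A minimal sketch of the usual fix: expand the factors into dummy columns with model.matrix() so the predictor matrix is purely numeric. The object names below (dat, y) are placeholders, and glmnet is shown, though the same matrix can be handed to lars:
library(glmnet)
X <- model.matrix(y ~ ., data = dat)[, -1]   # dummy-code the categorical predictors
fit <- cv.glmnet(X, dat$y, alpha = 1)        # alpha = 1 gives the lasso penalty
coef(fit, s = "lambda.min")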
2024 Apr 10
1
Exceptional slowness with read.csv
That's basically what I did: 1. Get text lines using readLines 2. use tryCatch to parse each line using read.csv(text=...) 3. in the catch, use gregexpr to find any quotes not adjacent to a comma (gregexpr("[^,]\"[^,]",...) 4. escape any quotes found by adding a second quote (using str_sub from stringr) 6. parse the patched text using read.csv(text=...) 7. write out the parsed
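A rough sketch of that pipeline in one place; the gsub-based repair, the column check, and the exact regex are simplifications and assumptions, not the poster's exact code:
lines <- readLines(file_name)
parse_one <- function(txt, expected_cols = 76) {
  res <- tryCatch(read.csv(text = txt, header = FALSE),
                  error = function(e) NULL, warning = function(w) NULL)
  if (is.null(res) || ncol(res) != expected_cols) {
    fixed <- gsub('([^,])"([^,])', '\\1""\\2', txt)  # double the stray quote
    res <- read.csv(text = fixed, header = FALSE)
  }
  res
}
parsed <- lapply(lines[-1], parse_one)   # skip the header line
out <- do.call(rbind, parsed)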
2011 Sep 07
1
Very slow assignments
I'm creating an object of an S4 class that has two slots: ListExamples, which is a list, and idx, which is an integer (as in the code below). Then, I read a data.frame from a file with 10,000 (ten thousand) lines and 10 columns, do some pre-processing and, basically, store each line as an element of a list in the ListExamples slot of the S4 object. However, any kind of assignment operation (<-)
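Without the attached code it is hard to be precise, but the usual culprit is that every obj@ListExamples[[i]] <- ... assignment copies the whole S4 object. A sketch of the standard remedy, with names matching the slots described above and dat standing for the 10,000-line data frame:
examples <- vector("list", nrow(dat))
for (i in seq_len(nrow(dat))) {
  examples[[i]] <- dat[i, ]     # per-line pre-processing would go here
}
obj@ListExamples <- examples    # assign the slot once, outside the loop
obj@idx <- length(examples)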
2011 Nov 22
4
Data Frame Search Slow
Hey All, So - I promise to write a blog post on this topic and post it somewhere on the internet once I get to the bottom of this. Basically, the set-up to the problem is like this: 1. I have a data frame with dim (2547290, 4) 2. I need to make SQL-like lookups on the data frame. I have been using the following sort of syntax: a.dataframe[a.dataframe[[column_index]] %in% some_value, ] 3.
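A sketch of the keyed data.table approach that usually comes up for this problem: sort once on the lookup column, then each query is a binary search rather than a full %in% scan (lookup_column and some_value are placeholders):
library(data.table)
a.datatable <- as.data.table(a.dataframe)
setkey(a.datatable, lookup_column)      # one-off sort, builds the key
result <- a.datatable[J(some_value)]    # keyed lookup, binary search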