Displaying 20 results from an estimated 7000 matches similar to: "how to get how many lines there are in a file."
2013 Oct 04
2
Tab Separated File Reading Error
Hello,
I have a seemingly simple problem: a tab-delimited file cannot be read in.
> annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 5933 did not have 12 elements
However, all lines do have 12 columns.
> lines <-
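A common cause (an assumption on my part, not confirmed in the thread) is an unbalanced quote or a '#' character on an earlier line, which makes scan() merge lines so a later line appears short. A minimal diagnostic sketch, reusing the file name from the post:
count.fields("matched.txt", sep = "\t") -> n   # apparent field count per line
table(n)                                       # distribution of field counts
which(n != 12)                                 # candidate problem lines
# if quoting/comment characters were the culprit, disabling both should help:
annoTranscripts <- read.table("matched.txt", sep = "\t", quote = "",
                              comment.char = "", stringsAsFactors = FALSE)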
2005 Aug 05
6
Computing sums of the columns of an array
Hi,
I have a 5x731 array A, and I want to compute the sums of the columns.
Currently I do:
apply(A, 2, sum)
But it turns out, this is slow: 70% of my CPU time is spent here, even
though there are many complicated steps in my computation.
Is there a faster way?
Thanks,
Martin
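The usual answer (standard base R, not quoted from the thread): colSums() is implemented in C and avoids the per-column function calls that apply() makes:
A <- matrix(rnorm(5 * 731), nrow = 5)   # stand-in for the 5x731 array
s1 <- apply(A, 2, sum)
s2 <- colSums(A)                        # vectorized, typically much faster
all.equal(s1, s2)                       # same result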
2011 Feb 08
3
intervals {nlme} lower CI greater than upper CI !!!????
Hi folks...
check this out..
> GLU<-lme(gluc~rt*cd4+sex+age+rf+nadir+pharmac+factor(hcv)+factor(hbs)+
+ haartd+hivdur+factor(arv),
+ random= ~rt|id, na.action=na.omit)
> intervals(GLU)$fixed
               lower           est.          upper
(Intercept)  67.3467070345  7.362307e+01  7.989944e+01
rt            0.0148050160  6.249304e-02  1.101811e-01
cd4
2007 Dec 19
1
unexpected behavior from gzfile and unz
I get unexpected behavior from "readLines()" and
"scan()" depending on how the file is opened with
"gzfile" or "unz". More specifically:
> file <- gzfile("file.gz")
> readLines(file,1)
[1] "a\tb\tc"
> readLines(file,1)
[1] "a\tb\tc"
> close(file)
It seems that the stream is rewound between calls to
readLines.
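This matches documented connection semantics: a connection that has not been explicitly opened is opened and closed again by each readLines() call, so every call starts from the top of the file. A sketch of the usual fix, keeping the names from the post:
file <- gzfile("file.gz")
open(file, "r")        # keep the connection open across calls
readLines(file, 1)     # first line
readLines(file, 1)     # now the second line; the position is preserved
close(file)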
2004 Jun 30
2
Slow IO: was [R] naive question
I believe IO in R is slow because of the way it is implemented, not
because it has to do some extra work for the user.
I compared scan() with the 'what' argument set (which is, AFAIK, the
fastest way to read a CSV file) to equivalent C code. It turned out
to be 20-50 times slower.
I can see at least two main reasons why R's IO is so slow (I didn't
profile this though):
A) it
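For reference, the scan() idiom being benchmarked looks like this (a sketch; the file name and column types are assumptions, not from the post):
# read a two-column CSV with known types in one pass
dat <- scan("data.csv", sep = ",",
            what = list(id = integer(), value = numeric()),
            quiet = TRUE)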
2005 Feb 25
3
Loops and dataframes
Hi,
I am experiencing a long delay when using data frames inside loops and was
wondering if this is a bug or not.
Example code:
> st <- rep(1,100000)
> ed <- rep(2,100000)
> for(i in 1:length(st)) st[i] <- ed[i] # works fine
> df <- data.frame(start=st,end=ed)
> for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] #takes for ever
R: R 2.0.0 (2004-10-04)
OS: Linux, Fedora Core 2
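The standard explanation (not quoted from a reply): each df[i,1] <- df[i,2] goes through the data-frame replacement method and copies the object, so element-wise loops over data frames are effectively quadratic. Extracting the columns, working on plain vectors, and writing back once avoids this:
df <- data.frame(start = rep(1, 100000), end = rep(2, 100000))
start <- df$start                    # plain numeric vectors
ed <- df$end
for (i in seq_along(start)) start[i] <- ed[i]   # fast: no data-frame dispatch
df$start <- start                    # write back in a single step
# (here the whole loop is of course just df$start <- df$end)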
2003 Nov 10
3
Reading an upper triangular matrix
Hello!
I have data in the form of a symmetric distance matrix; in the file I
have recorded only the upper triangular part, with the diagonal. The
matrix is 21x21, and the file has row and column names, and some other
information. I am trying to read it with the following code (I tried
many variations on it, but all give the same error). The items in the
data file are delimited by white space.
(Part
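One base-R approach, with heavy caveats: the file name is hypothetical, the row/column names mentioned in the post would need extra handling, and it assumes the upper triangle (with diagonal) is stored row by row after one header line. Row-wise upper-triangle order equals column-wise lower-triangle order, hence the transpose trick:
n <- 21
vals <- scan("dist.txt", skip = 1)      # hypothetical plain numeric file
m <- matrix(0, n, n)
m[lower.tri(m, diag = TRUE)] <- vals    # column-major fill matches row-wise order
m <- t(m)                               # m now holds the upper triangle
m <- m + t(m) - diag(diag(m))           # symmetrize: copy upper into lower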
2005 Apr 15
5
Pearson correlation and p-value for matrix
Hi,
I was trying to evaluate the Pearson correlation and the p-values for an nxm matrix, where each row represents a vector. One way to do it would be to iterate through each row and find its correlation value (and the p-value) with respect to the other rows. Is there some function by which I can use the matrix as input? Ideally, the output would be an nxn matrix containing the p-values
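cor() already takes a matrix: cor(t(X)) gives the n x n Pearson matrix when the vectors are rows. Base R has no matrix version of the p-value, but it follows from the usual t statistic; a sketch (all names are illustrative):
X <- matrix(rnorm(5 * 20), nrow = 5)       # 5 vectors of length 20
r <- cor(t(X))                             # n x n correlation matrix
m <- ncol(X)                               # observations per vector
tstat <- r * sqrt((m - 2) / (1 - r^2))
p <- 2 * pt(-abs(tstat), df = m - 2)       # two-sided p-values
diag(p) <- NA                              # self-correlation has no p-value
The Hmisc package's rcorr() bundles the same computation, returning the correlations and p-values together.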
2004 May 01
5
skip lines on a connection
Hi,
I am looking for an efficient way of skipping big chunks of lines on a
connection (not necessarily at the beginning of the file). One way is to
use readLines(), e.g. readLines(con, 1e6), but a) this incurs the overhead of
construction of the return char vector and b) has a (fairly remote)
potential to blow up the memory.
Another way would be to use scan(), e.g.
scan(con, skip=1e6, nmax=0)
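One memory-bounded alternative (my sketch, not from the thread): discard lines in fixed-size chunks so no million-element vector is ever built:
con <- file("big.txt", open = "r")       # hypothetical file
left <- 1e6                              # lines still to skip
while (left > 0) {
  chunk <- readLines(con, n = min(left, 10000))
  if (length(chunk) == 0) break          # reached end of file early
  left <- left - length(chunk)
}
rest <- readLines(con)                   # continues after the skipped block
close(con)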
2008 Mar 10
1
crossprod is slower than t(AA)%*%BB
Dear R developers,
The background for this email is that I was helping a PhD student
improve the speed of her R code. I suggested replacing calls like
t(AA) %*% BB with crossprod(AA, BB), since I expected this to be faster. The
surprising result to me was that this change actually made her code
slower.
> ## Examples :
>
> AA <- matrix(rnorm(3000*1000),3000,1000)
> BB <-
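A self-contained way to reproduce the comparison (the first size follows the post; BB's definition is truncated there, so its shape is my assumption, and the timings depend heavily on the BLAS in use):
AA <- matrix(rnorm(3000 * 1000), 3000, 1000)
BB <- matrix(rnorm(3000 * 1000), 3000, 1000)   # assumed shape
system.time(r1 <- t(AA) %*% BB)
system.time(r2 <- crossprod(AA, BB))           # avoids the explicit transpose
all.equal(r1, r2)                              # identical results either way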
2014 Mar 12
3
Reading text
Hello everyone,
I would like to read the text located at
http://dl.dropboxusercontent.com/u/9601860/txt.txt
I have tried
txt <- 'http://dl.dropboxusercontent.com/u/9601860/txt.txt'
r <- scan(txt)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
#   invalid multibyte string at '<ff><fe>M'
r <- read.table(txt, header = FALSE)
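The '<ff><fe>' bytes are a UTF-16 little-endian byte order mark, so the file needs an explicit encoding on the connection. A sketch (the encoding is inferred from the BOM, not stated in the thread, and encoding names can vary slightly by platform):
txt <- 'http://dl.dropboxusercontent.com/u/9601860/txt.txt'
con <- file(txt, encoding = "UTF-16")   # re-encode the stream while reading
r <- readLines(con)
close(con)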
2005 Jan 24
1
Weighted.mean(x,wt) vs. t(x) %*% wt
What is the difference between the above two operations?
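The short answer, from the documented definitions rather than a quoted reply: weighted.mean(x, wt) divides by the total weight, while t(x) %*% wt is the raw inner product (and returns a 1x1 matrix):
x  <- c(1, 2, 3)
wt <- c(1, 1, 2)
weighted.mean(x, wt)           # sum(x * wt) / sum(wt) = 2.25
drop(t(x) %*% wt)              # sum(x * wt)           = 9
drop(t(x) %*% wt) / sum(wt)    # matches weighted.mean = 2.25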
2011 Feb 06
1
random interaction effect in lmer
Hi all, while modeling a random interaction effect in lmer I immediately receive the
following message:
> ldlM4<-lmer(ldl~rt*cd4+age+rf+pharmac+factor(hcv)+
+ hivdur+(rt:cd4|id),na.action=na.omit,REML=F)
Warning message:
In mer_finalize(ans) : false convergence (8)
I think the matter lies in the syntax, because I systematically receive the same
message even when changing the response...
PS:
2001 Dec 29
1
Slow 'read.table' in R 1.4.0 (PR#1232)
The 'read.table' function appears to be up to 10X slower in R 1.4.0 than in R
1.3.1 for some of the data sets I read in. I was comparing the source code
for the two versions and see that it was rewritten in R 1.4.0.
I think I found what part of the problem might be. I was comparing the
R 1.3.1 and R 1.4.0 code, and it appears that a statement is missing in some
of the code for R 1.4. This is
2004 Nov 23
2
sorting without order
Hello,
In order to increase the performance of a script, I'd like to sort very large vectors containing repeated integer values.
I'm not interested in having the values sorted, but only grouped.
I also need the equivalent of index.return from the standard "sort" function:
f(c(10,1,10,100,1,10))
=>
grouped: c(10,10,10,1,1,100)
ix: c(1,3,6,2,5,4)
is there a way
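One base-R way to get exactly that output (my sketch, not a reply from the thread): order() on match() keys groups values by first appearance and yields the index vector directly:
x <- c(10, 1, 10, 100, 1, 10)
ix <- order(match(x, unique(x)))   # stable: groups in first-appearance order
x[ix]                              # 10 10 10   1   1 100
ix                                 #  1  3  6   2   5   4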
2008 Jun 28
2
Parallel R
Hello,
The problem I'm working on now requires operating on big matrices.
I've noticed that there are some packages that allow running
commands in parallel. I've tried snow and NetWorkSpaces, without much
success (they are far slower than the normal functions).
My problem is very simple: it doesn't require any communication
between parallel tasks; only that it divides
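For a task with no inter-worker communication, the snow idiom of that era looks roughly like this (a sketch; the cluster size and worker function are assumptions). For cheap per-element work, the cost of shipping the matrix to the workers easily exceeds the computation itself, which would explain the observed slowdown:
library(snow)
cl <- makeCluster(4, type = "SOCK")   # four local workers
M <- matrix(rnorm(1000 * 1000), 1000)
res <- parApply(cl, M, 1, sum)        # rows split across the workers
stopCluster(cl)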
2010 Feb 08
5
Fast way to determine number of lines in a file
Hi all,
Is there a fast way to determine the number of lines in a file? I'm
looking for something like count.lines analogous to count.fields.
Hadley
--
http://had.co.nz/
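One common base-R answer (not necessarily the one the thread settled on): read in fixed-size chunks so memory stays bounded, and count as you go:
count.lines <- function(path, chunk = 65536L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    got <- length(readLines(con, n = chunk))
    if (got == 0L) break               # end of file
    n <- n + got
  }
  n
}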
2008 Apr 17
1
Couldn't (and shouldn't) is.unsorted() be faster?
Hi,
Couldn't is.unsorted() bail out immediately here (after comparing
the first 2 elements):
> x <- 20000000:1
> system.time(is.unsorted(x), gcFirst=TRUE)
user system elapsed
0.084 0.040 0.124
> x <- 200000000:1
> system.time(is.unsorted(x), gcFirst=TRUE)
user system elapsed
0.772 0.440 1.214
Thanks!
H.
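For illustration, the early-exit semantics being asked for, written in plain R (the loop itself is slow in R, so the real gain would come from doing this in C; the point is that one inversion is enough to answer):
unsorted_early <- function(x) {
  for (i in seq_len(length(x) - 1L))
    if (x[i] > x[i + 1L]) return(TRUE)   # bail out at the first inversion
  FALSE
}
unsorted_early(20000000:1)               # TRUE after a single comparison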
2002 Feb 22
1
Summary: read.table on Mac OS X, CARBON vs. DARWIN
Thanks a lot, James!!
The problem is fixed. In version 1.4.0 for Mac/darwin (the latest
available version for this system), the function read.table (which is
also called from read.delim etc.) has the bug you explained.
Inserting the line
nlines <- nlines+1
after
lines <- c(lines, line)
removes this bug.
M.
On Friday, February 22, 2002, at 02:33 PM, james.holtman at convergys.com
2007 Sep 06
2
problems in read.table
Dear R-users,
I have encountered the following problem every now and then, but I was
dealing with very small datasets before, so it wasn't a problem (I
just edited the dataset in an OpenOffice spreadsheet). This time I have to
deal with many large datasets containing commuting-flow data. I'd
appreciate it if anyone could give me a hint or clue to get out of this
problem.
I have a .dat file