Displaying 20 results from an estimated 10000 matches similar to: "Creating a custom connection to read from multiple files"
2004 Feb 24
7
<no subject>
Dear ladies and gentlemen,
I want to import a directory with about 400 files (.dat) in R. I know how to
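A minimal sketch of one common approach, using list.files() and lapply(); the directory path and read.table() arguments are placeholders and assume the files share one tabular layout:

## List every .dat file, read each one, and stack the results.
files <- list.files("path/to/dir", pattern = "\\.dat$", full.names = TRUE)
dat.list <- lapply(files, read.table, header = TRUE)
dat.all <- do.call(rbind, dat.list)   # one data frame with all rows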
2012 Apr 04
2
Trying to merge a new data set onto the bottom of an old data set. Both are zoo objects.
Here is the data I'm working with:
http://r.789695.n4.nabble.com/file/n4530888/new.txt new.txt
http://r.789695.n4.nabble.com/file/n4530888/old.txt old.txt
My code is here:
http://pastebin.com/9jjs6Ahr
I'm looking for a way to simply attach new.txt to the bottom of old.txt
through R; otherwise I'll just throw it into Excel to do some preprocessing.
I've looked into using merge,
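For zoo objects, rbind() is the usual way to append one series below another; a minimal sketch, assuming the two series share columns and have non-overlapping index values (which rbind.zoo requires):

library(zoo)
old <- zoo(1:3, as.Date("2012-01-01") + 0:2)
new <- zoo(4:5, as.Date("2012-01-04") + 0:1)
combined <- rbind(old, new)   # fails if the indexes overlap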
2005 Aug 05
6
Computing sums of the columns of an array
Hi,
I have a 5x731 array A, and I want to compute the sums of the columns.
Currently I do:
apply(A, 2, sum)
But it turns out, this is slow: 70% of my CPU time is spent here, even
though there are many complicated steps in my computation.
Is there a faster way?
Thanks,
Martin
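The stock answer is colSums(), which runs in compiled code and is much faster than apply(A, 2, sum) for this job:

A <- matrix(rnorm(5 * 731), nrow = 5)
all.equal(apply(A, 2, sum), colSums(A))   # TRUE, but colSums is far quicker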
2005 Apr 24
1
large dataset import, aggregation and reshape
Dear useRs
We have a data set (comma delimited) with 12 million rows and 5
columns (in fact many more, but we need only 4 of them): id, factor 'a'
(5 levels), factor 'b' (15 levels), date-stamp, numeric measurement. We
run R on SUSE Linux 9.1 with 2GB RAM (and a 3.5GB swap file).
On average we have 30 obs. per id. We want to aggregate (e.g. sum of the
measurements under
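A memory-conscious sketch for this kind of import and aggregation; the file name, column names, and classes below are assumptions, not the poster's actual layout:

## colClasses avoids type guessing and cuts the import's memory churn.
cc <- c("integer", "factor", "factor", "character", "numeric")
dat <- read.table("big.csv", sep = ",", colClasses = cc,
                  col.names = c("id", "a", "b", "date", "y"))
## Sum of the measurements within each id x a x b cell.
agg <- aggregate(dat$y, by = list(id = dat$id, a = dat$a, b = dat$b),
                 FUN = sum)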
2010 Jun 04
5
R Newbie, please help!
Hello Everyone,
I just started a new job & it requires heavy use of R to analyze datasets.
I have a data.table that looks like this. It is sorted by ID & Date, there
are about 150 different IDs & the dataset spans 3 million rows. The main
columns of concern are ID, date, and totret. What I need to do is to derive
daily returns for each ID from totret, which is simply totret at time
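A sketch of the by-group return calculation in data.table; the toy values are invented, and the lag logic assumes rows are already sorted by ID and date:

library(data.table)
dt <- data.table(ID = rep(c("A", "B"), each = 3),
                 Date = rep(1:3, 2),
                 totret = c(100, 101, 103, 50, 49, 51))
## Daily return: today's totret over yesterday's, within each ID.
dt[, ret := totret / c(NA, head(totret, -1)) - 1, by = ID]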
2008 Mar 10
1
crossprod is slower than t(AA)%*BB
Dear Rdevelopers
The background for this email is that I was helping a PhD student to
improve the speed of her R code. I suggested to replace calls like
t(AA)%*% BB by crossprod(AA,BB) since I expected this to be faster. The
surprising result to me was that this change actually made her code
slower.
> ## Examples :
>
> AA <- matrix(rnorm(3000*1000),3000,1000)
> BB <-
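A sketch of the comparison (the second matrix is given the same shape as AA, an assumption since the original message is truncated); whether crossprod() wins depends heavily on the BLAS your R build links against, which likely explains the surprise:

AA <- matrix(rnorm(3000 * 1000), 3000, 1000)
BB <- matrix(rnorm(3000 * 1000), 3000, 1000)
system.time(t(AA) %*% BB)
system.time(crossprod(AA, BB))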
2004 Dec 16
8
counting numbers without replicates in a vector
Hi,
I am just wondering if there is an easy way to count
in a numeric vector how many numbers don't have
replicates.
For example,
a=c(1,1,2,2,3,4,5), how can I know there are three
numbers (3, 4 and 5) without replicates?
Thank you!
Jun
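table() makes this a one-liner:

a <- c(1, 1, 2, 2, 3, 4, 5)
tab <- table(a)
sum(tab == 1)                     # 3 values occur exactly once
as.numeric(names(tab[tab == 1]))  # 3 4 5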
2008 Jun 28
2
Parallel R
Hello,
The problem I'm working on now requires operating on big matrices.
I've noticed that there are some packages that allow running some
commands in parallel. I've tried snow and NetWorkSpaces, without much
success (they are far slower than the normal functions).
My problem is very simple: it doesn't require any communication
between parallel tasks; only that it divides
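A sketch of the embarrassingly-parallel pattern with snow; the worker count and the toy task are placeholders. Parallelism only pays off when each task's compute time dominates the cost of shipping its data to the worker, which is why small tasks come out slower than the serial versions:

library(snow)
cl <- makeCluster(4, type = "SOCK")
blocks <- split(1:1000, rep(1:4, each = 250))   # toy work units
res <- parLapply(cl, blocks, function(ix) sum(sqrt(ix)))
stopCluster(cl)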
2008 Apr 17
1
Couldn't (and shouldn't) is.unsorted() be faster?
Hi,
Couldn't is.unsorted() bail out immediately here (after comparing
the first 2 elements):
> x <- 20000000:1
> system.time(is.unsorted(x), gcFirst=TRUE)
user system elapsed
0.084 0.040 0.124
> x <- 200000000:1
> system.time(is.unsorted(x), gcFirst=TRUE)
user system elapsed
0.772 0.440 1.214
Thanks!
H.
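Illustrative only: the early-exit logic the post asks for, written out in R. A loop like this is slow when interpreted, but the same logic at C level would let a reversed vector be rejected after the first comparison:

is_unsorted_early <- function(x) {
  for (i in seq_len(length(x) - 1L))
    if (x[i] > x[i + 1L]) return(TRUE)   # bail out at the first descent
  FALSE
}
is_unsorted_early(20000000:1)   # TRUE after a single comparison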
2005 Feb 25
3
Loops and dataframes
Hi,
I am experiencing a long delay when using data frames inside loops and was
wondering if this is a bug or not.
Example code:
> st <- rep(1,100000)
> ed <- rep(2,100000)
> for(i in 1:length(st)) st[i] <- ed[i] # works fine
> df <- data.frame(start=st,end=ed)
> for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] # takes forever
R: R 2.0.0 (2004-10-04)
OS: Linux, Fedora Core 2
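The delay comes from data-frame indexing, not the loop: each df[i,1] <- df[i,2] assignment goes through the data frame's replacement machinery. Work on plain vectors (or whole columns) instead; a sketch:

st <- rep(1, 100000); ed <- rep(2, 100000)
st <- ed            # the loop over vectors is vectorizable anyway
df <- data.frame(start = st, end = ed)
df$start <- df$end  # whole-column replacement is also fast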
2004 Dec 06
6
how to get how many lines there are in a file.
Hi all,
If I want to get the total number of lines in a big file without reading
the file's content into R as a matrix or data frame, are there any
methods or functions?
thanks in advance.
Regards
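A sketch that counts lines in chunks, so the whole file never sits in memory at once; the chunk size is arbitrary:

count_lines <- function(path, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    piece <- readLines(con, n = chunk)
    if (length(piece) == 0L) break
    n <- n + length(piece)
  }
  n
}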
2002 Apr 22
2
skipping specific rows in read.table
Hi,
We are considering organizing some of our ascii files with multiple "column
names" like so:
a.long.but.complete.name a.different.complex.name
short.name.1 short.name.2
1 7
2 8
3 9
[more data]
The basic idea is that we want to keep, in one location, both a long descriptive
name of each variable (in row 1) and a short convenient name (in row 2). I
could imagine keeping other
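One way to keep both name rows: read each header line with scan(), then read the data with skip = 2. The file name is a placeholder; a sketch:

long  <- scan("data.txt", what = "", nlines = 1)             # row 1
short <- scan("data.txt", what = "", skip = 1, nlines = 1)   # row 2
dat <- read.table("data.txt", skip = 2, col.names = short)
attr(dat, "long.names") <- long   # stash the descriptive names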
2005 Apr 15
5
Pearson correlation and p-value for matrix
Hi,
I was trying to evaluate the Pearson correlation and the p-values for an n x m matrix, where each row represents a vector. One way to do it would be to iterate through each row and find its correlation value (and the p-value) with respect to the other rows. Is there some function that can take the matrix as input? Ideally, the output would be an n x n matrix containing the p-values
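cor() handles all n x n correlations in one call (transpose first, since the vectors are rows), but base R has no matrix version of the p-values, so cor.test() runs pairwise; a sketch on toy data:

m <- matrix(rnorm(5 * 20), nrow = 5)    # 5 row-vectors of length 20
r <- cor(t(m))                          # 5 x 5 correlation matrix
n <- nrow(m)
p <- matrix(NA, n, n)
for (i in 1:(n - 1)) for (j in (i + 1):n)
  p[i, j] <- p[j, i] <- cor.test(m[i, ], m[j, ])$p.value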
2005 Jan 24
1
Weighted.mean(x,wt) vs. t(x) %*% wt
What is the difference between the above two operations ?
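The two differ by normalization: weighted.mean() divides the weighted sum by sum(wt), while t(x) %*% wt is the raw weighted sum:

x  <- c(1, 2, 3)
wt <- c(1, 1, 2)
weighted.mean(x, wt)   # sum(x * wt) / sum(wt) = 2.25
drop(t(x) %*% wt)      # sum(x * wt)           = 9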
2011 Nov 17
3
merging corpora and metadata
Greetings!
I lose all my metadata after concatenating corpora. This is an
example of what happens:
> meta(corpus.1)
MetaID cid fid selfirst selend fname
1 0 1 11 2169 2518 WCPD-2001-01-29-Pg217.scrb
2 0 1 14 9189 9702 WCPD-2003-01-13-Pg39.scrb
3 0 1 14 2109 2577 WCPD-2003-01-13-Pg39.scrb
....
....
17 0
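A hedged sketch: in tm versions of that era, c() on corpora took a recursive argument, and recursive = TRUE was reported to preserve per-document metadata; verify against your installed tm before relying on it:

library(tm)
c1 <- Corpus(VectorSource(c("first doc", "second doc")))
c2 <- Corpus(VectorSource("third doc"))
combined <- c(c1, c2, recursive = TRUE)   # recursive = TRUE: keep metadata
meta(combined[[1]])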
2008 Mar 10
2
write.table with row.names=FALSE unnecessarily slow?
write.table with large data frames takes quite a long time
> system.time({
+ write.table(df, '/tmp/dftest.txt', row.names=FALSE)
+ }, gcFirst=TRUE)
user system elapsed
97.302 1.532 98.837
One reason is that dimnames is always called, causing 'anonymous' row
names to be created as character vectors. Avoiding this in
src/library/utils, along the lines of
Index:
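A sketch of a common workaround from that era: for purely numeric data, MASS::write.matrix() sidesteps the row-name machinery described above (the sizes here are scaled down from the original report):

library(MASS)
m <- matrix(rnorm(1e6), ncol = 10)
system.time(write.table(as.data.frame(m), tempfile(), row.names = FALSE))
system.time(write.matrix(m, tempfile()))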
2009 Oct 27
4
automatically adjusting axis limits
Dear R users,
I am a newbie. Just switched from MATLAB. So thanks a lot for your
patience.
I have 50000 spectra collected in the field. Each spectrum has two
columns: wavelength (56 values) and the actual measurement.
Each measurement came in a different .txt file on disk (50000 files in
total). I wrote a script that reads every spectrum in a for loop and
constructs two variables:
Wavelength (56) and
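One way to fix the limits: scan all files once for the global y-range, then plot with that range. The directory, file pattern, and two-column layout are assumptions; a sketch:

files <- list.files("spectra", pattern = "\\.txt$", full.names = TRUE)
rng <- range(sapply(files, function(f) range(read.table(f)[, 2])))
plot(NA, xlim = c(1, 56), ylim = rng,
     xlab = "Wavelength index", ylab = "Measurement")
for (f in files) lines(read.table(f), col = "grey")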
2011 Dec 06
2
read.table performance
** Disclaimer: I'm looking for general suggestions **
I'm sorry, but I can't send out the file I'm using, so there is no
reproducible example.
I'm using read.table and it's taking over 30 seconds to read a tiny file.
The strange thing is that it takes roughly the same amount of time if the
file is 100 times larger.
After re-reviewing the data Import / Export manual I think
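The usual levers for read.table() speed, per that manual; the classes and row count below are placeholders:

dat <- read.table("tiny.txt",
                  header = TRUE,
                  colClasses = c("integer", "numeric", "character"),
                  nrows = 5000,        # an upper bound helps allocation
                  comment.char = "",   # skip comment scanning
                  quote = "")          # skip quote handling, if safe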
2009 Sep 15
1
R Memory Usage Concerns
Hello all,
To start with, these measurements are on Linux with R 2.9.2 (64-bit
build) and Python 2.6 (also 64-bit).
I've been investigating R for some log file analysis that I've been
doing. I'm coming at this from the angle of a programmer who has
primarily worked in Python. As I've been playing around with R, I've
noticed that R seems to use a *lot* of memory, especially
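Two quick ways to see what R is actually holding; the character vector is a stand-in for parsed log lines (each R string carries per-element overhead, which is often what surprises people coming from other languages):

x <- as.character(1:1e6)   # stand-in for parsed log lines
object.size(x)             # bytes held by this one object
gc()                       # totals for the whole R heap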
2004 Nov 23
2
sorting without order
Hello,
In order to increase the performance of a script, I'd like to sort very large vectors containing repeated integer values.
I'm not interested in having the values sorted, but only grouped.
I also need the equivalent of index.return from the standard sort() function:
f(c(10,1,10,100,1,10))
=>
grouped: c(10,10,10,1,1,100)
ix: c(1,3,6,2,5,4)
is there a way
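One way that reproduces exactly the output asked for: order() on the first-appearance rank groups equal values without fully sorting them:

x <- c(10, 1, 10, 100, 1, 10)
ix <- order(match(x, unique(x)))   # stable: keeps first-appearance order
x[ix]   # 10 10 10 1 1 100
ix      # 1 3 6 2 5 4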