thr3ads.net - similar to: "Processing 10^8 rows and 1^3 columns"

Displaying 20 results from an estimated 7000 matches similar to: "Processing 10^8 rows and 1^3 columns"

2008 Jul 08

Can R do this ?

I have a folder full of pngs and jpgs, and would like to consolidate them into a pdf with appropriate title and labels. Can this be done via R ?

2008 Jun 23

grouping values

I tried aggregate, apply etc, but can't get the right result. For example,

m <- cbind(c(LETTERS[1:5]), c("aa", "bb", "cc", "aa", "cc"))
     [,1] [,2]
[1,] "A"  "aa"
[2,] "B"  "bb"
[3,] "C"  "cc"
[4,] "D"  "aa"
[5,] "E"  "cc"

how to obtain

2008 Jun 19

Any simple way to subset a vector of strings that do contain a particular substring ?

For example, strings <- c("aaaa", "bbbb","ccba"). How to get "aaaa", "bbbb" that do not contain "ba" ?
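One possible approach, sketched with base R only: `grepl()` flags the elements that contain the substring, and negating the result keeps the rest. The variable name `keep` is introduced here for illustration.

```r
strings <- c("aaaa", "bbbb", "ccba")

# grepl() returns TRUE for every element that contains "ba";
# fixed = TRUE treats "ba" as a literal substring, not a regular expression.
keep <- strings[!grepl("ba", strings, fixed = TRUE)]
print(keep)
```

Dropping the `!` would select the elements that do contain the substring instead.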

2008 Jul 31

Identifying common prefixes from a vector of words, and delete those prefixes

For example, c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal"). How can I identify the common prefix is ".is.an.animal" and delete it to give c("dog", "cat", "rat") ? Thanks
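Note that the shared part in this example sits at the end of each word, so it is a common suffix rather than a prefix. A minimal base-R sketch under that reading follows; `common_suffix` is a hypothetical helper written for this illustration, not an existing function.

```r
words <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")

# Hypothetical helper: longest suffix shared by all elements of x.
common_suffix <- function(x) {
  # Reverse each word so the shared suffix becomes a shared prefix.
  rev_chars <- lapply(strsplit(x, ""), rev)
  n <- min(lengths(rev_chars))
  if (n == 0) return("")
  # One column per word, one row per position counted from the end.
  mat <- do.call(cbind, lapply(rev_chars, function(ch) ch[seq_len(n)]))
  same <- apply(mat, 1, function(r) all(r == r[1]))
  # Number of leading positions on which all words agree.
  len <- if (all(same)) n else which(!same)[1] - 1
  paste(rev(mat[seq_len(len), 1]), collapse = "")
}

suffix <- common_suffix(words)
# Cut the suffix off by length rather than by pattern, so no regex escaping is needed.
stems <- substr(words, 1, nchar(words) - nchar(suffix))
print(suffix)
print(stems)
```

For a true common prefix, the same idea applies without the two `rev()` calls.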

2008 Jul 15

counting number of "G" in "TCGGGGGACAATCGGTAACCCGTCT"

Any better solution than this ? sum(strsplit("TCGGGGGACAATCGGTAACCCGTCT", "")[[1]] == "G")
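Two base-R alternatives that avoid splitting the string into single characters, offered as a sketch rather than as a definitive "better" answer. The names `s`, `n_gsub` and `n_match` are illustrative.

```r
s <- "TCGGGGGACAATCGGTAACCCGTCT"

# Option 1: delete everything that is not "G" and measure what is left.
n_gsub <- nchar(gsub("[^G]", "", s))

# Option 2: collect all matches of "G" and count them.
n_match <- lengths(regmatches(s, gregexpr("G", s, fixed = TRUE)))

print(c(n_gsub, n_match))
```

Both are vectorised over `s`, so they also work on a vector of sequences.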

2008 Sep 13

Beautify R scripts in microsoft word

I am generating a report containing several R scripts in the appendix. Is there any way to "beautify" the R source code in Microsoft Word, similar to what we see in Tinn-R ? Thanks

2008 Jun 25

selecting values that are unique, instead of selecting unique values

unique(c(1:10,1)) gives 1:10 (i.e. unique values), is there any method to get only 2:10 (i.e. values that are unique) ?
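A minimal sketch of one way to keep only the values that occur exactly once: `duplicated()` is run from both ends so that every copy of a repeated value is flagged, not just the later ones. The names `x`, `dup` and `only_once` are illustrative.

```r
x <- c(1:10, 1)

# duplicated() alone marks only the second and later copies;
# combining it with fromLast = TRUE marks the first copy as well.
dup <- duplicated(x) | duplicated(x, fromLast = TRUE)
only_once <- x[!dup]
print(only_once)
```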

2008 Jul 14

Computing row means for sets of 2 columns

Is there a better or more efficient approach than this without the use of t() ?

> (m <- matrix(1:40, ncol=4))
      [,1] [,2] [,3] [,4]
 [1,]    1   11   21   31
 [2,]    2   12   22   32
 [3,]    3   13   23   33
 [4,]    4   14   24   34
 [5,]    5   15   25   35
 [6,]    6   16   26   36
 [7,]    7   17   27   37
 [8,]    8   18   28   38
 [9,]    9   19   29   39
[10,]   10   20   30   40
>
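The poster's own attempt is cut off in this snippet, so the following is only a sketch under the assumption that the "sets of 2 columns" are adjacent pairs (columns 1 and 2, columns 3 and 4, and so on). It uses plain matrix arithmetic and no `t()`. The names `odd` and `pair_means` are illustrative.

```r
m <- matrix(1:40, ncol = 4)

# Indices of the first column of each pair: 1, 3, ...
odd <- seq(1, ncol(m), by = 2)

# Add each pair element-wise and halve; drop = FALSE keeps matrix shape
# even when there is only one pair.
pair_means <- (m[, odd, drop = FALSE] + m[, odd + 1, drop = FALSE]) / 2
print(pair_means)
```

This assumes an even number of columns; an odd count would make `odd + 1` run past the last column.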

2009 Jan 06

Generating GUI for r-scripts

Hi, I have developed some scripts that basically ask for input files in tab-delimited format, do some processing, and output several pictures or csv files. Now I need a GUI to wrap on top of the scripts, so that end-users can select their input files, adjust some parameters for processing, and select output folders or filenames. Please advise me if there are any tools or projects suitable for

2010 Jul 28

sqldf 0.3-5 package or tcltk problem

This is my first post. I am running Mac OS X version 10.6.3. I am running R 2.11.0 GUI 1.33 64 bit. This may or may not be related to sqldf, but I experienced this problem while attempting to use an sqldf query. The same code runs with no problem on my Windows machine. Here is what happens:

> r=sqldf("select ... ")
Loading required package: tcltk
Loading Tcl/Tk interface ...

Then

2007 Sep 07

Automatic detachment of dependent packages

Dear All, When one loads certain packages, some other dependent packages are loaded as well. Is there some way of detaching them automatically when one detaches the first package loaded? For instance,

> library(sqldf)
Loading required package: RSQLite
Loading required package: DBI
Loading required package: gsubfn
Loading required package: proto

but

> detach(package:sqldf)
>
>

2008 Oct 30

how to convert data from long to wide format ?

Given a dataframe m

> m
  X Y  V3  V4
1 1 A 0.5 1.2
2 1 B 0.2 1.4
3 2 A 0.1 0.9

How do I convert m to this with V4 as the cell values ?

    A   B
1 1.2 1.4
2 0.9  NA
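One way to get this layout in base R, sketched under the assumption that each (X, Y) combination appears at most once: `tapply()` cross-tabulates V4 by X and Y and leaves NA in cells that have no data. The name `wide` is illustrative.

```r
m <- data.frame(X  = c(1, 1, 2),
                Y  = c("A", "B", "A"),
                V3 = c(0.5, 0.2, 0.1),
                V4 = c(1.2, 1.4, 0.9))

# Rows are the levels of X, columns the levels of Y.
# function(v) v[1] simply picks the single value in each cell;
# swap in mean or sum if a combination can occur more than once.
wide <- tapply(m$V4, list(m$X, m$Y), function(v) v[1])
print(wide)
```

The result is a matrix with row names taken from X; `as.data.frame(wide)` converts it if a data frame is needed.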

2008 Jun 24

insert new columns to a matrix

Instead of prepending or appending new columns to a matrix, how can I insert them into the matrix ? For example, I would like to insert 3 new columns after the 5th column of matrix m.
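A minimal sketch of the usual base-R idiom: split the matrix at the insertion point and `cbind()` the pieces back together around the new columns. The example matrix, `new_cols`, `pos` and `out` are illustrative stand-ins.

```r
m <- matrix(1:20, nrow = 2)                      # 2 x 10 example matrix
new_cols <- matrix(0, nrow = nrow(m), ncol = 3)  # the 3 columns to insert
pos <- 5                                         # insert after this column

# drop = FALSE keeps both pieces as matrices even if one has a single column.
out <- cbind(m[, seq_len(pos), drop = FALSE],
             new_cols,
             m[, seq_len(ncol(m) - pos) + pos, drop = FALSE])
print(dim(out))
```

Using `seq_len()` for both pieces means the same code also covers `pos = 0` (prepend) and `pos = ncol(m)` (append).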

2008 Dec 03

Speeding up casting a dataframe from long to wide format

Hi, I am casting a dataframe from long to wide format. The same code that works for a smaller dataframe takes a long time (more than two hours and still running) for a longer dataframe of 2495227 rows and ten different predictors. How can I make it more efficient ?

wer <- data.frame(Name=c(1:5, 4:5), Type=c(letters[1:5], letters[4:5]), Predictor=c("A", "A",

2008 Nov 26

Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

My two matrices are roughly the sizes of m1 and m2. I tried using two apply calls and cor.test to compute the correlation p.values. After more than an hour, the code is still running. Please help to make it more efficient.

m1 <- matrix(rnorm(100000), ncol=100)
m2 <- matrix(rnorm(10000000), ncol=100)
cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value
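A sketch of a vectorised alternative, assuming Pearson correlation and no missing values: `cor()` computes all row-by-row correlations in one call, and the two-sided p-value follows from the t statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which is what `cor.test()` uses for its default method. Small matrices are used here so the example runs quickly; the names `r`, `tstat` and `pvals` are illustrative.

```r
set.seed(1)
m1 <- matrix(rnorm(5 * 100), ncol = 100)   # 5 rows, 100 observations each
m2 <- matrix(rnorm(8 * 100), ncol = 100)   # 8 rows, 100 observations each

n <- ncol(m1)                              # observations per correlation

# cor() correlates columns, so transpose to correlate rows of m1 with rows of m2.
r <- cor(t(m1), t(m2))                     # 5 x 8 matrix of correlations

# Two-sided p-values from the t distribution.
tstat <- r * sqrt((n - 2) / (1 - r^2))
pvals <- 2 * pt(-abs(tstat), df = n - 2)
print(dim(pvals))
```

Here `pvals[i, j]` belongs to row i of m1 and row j of m2, which is the transpose of what the nested `apply()` in the post returns. At the sizes in the post the full result would hold 10^8 values, so processing m2 in blocks of rows may be needed to keep memory in check.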

2007 Sep 07

Delete query in sqldf?

Dear All, Is sqldf equipped with delete queries? I have tried delete queries but with no success. Thanks in advance, Paul

2008 Dec 07

How to force aggregate to exclude NA ?

The aggregate function does "almost" all that I need to summarize a dataset, except that I can't specify exclusion of NAs without a little bit of hassle.

> set.seed(143)
> m <- data.frame(A=sample(LETTERS[1:5], 20, T), B=sample(LETTERS[1:10], 20, T), C=sample(c(NA, 1:4), 20, T), D=sample(c(NA,1:4), 20, T))
> m
  A B  C  D
1 E I  1 NA
2 A C NA NA
3 D I NA  3
4 C I
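The rest of the post is cut off, so the exact summary wanted is unknown. As a sketch, assuming the goal is a per-group mean that ignores NA: extra arguments to `aggregate()` are passed on to the summary function, so `na.rm = TRUE` can be given directly. A small hand-made data frame stands in for the poster's data.

```r
# Illustrative data, not the data from the post.
m <- data.frame(A = c("a", "a", "b", "b"),
                C = c(1, NA, 3, 5),
                D = c(NA, 2, NA, 4))

# na.rm = TRUE is forwarded to mean() for every group and column.
agg <- aggregate(m[c("C", "D")], by = list(A = m$A), FUN = mean, na.rm = TRUE)
print(agg)
```

A group whose values are all NA yields NaN with this approach.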

2007 Jul 19

package NULL not found

In performing Rcmd check I am getting this output regarding using Argument '' and a NULL package not found and it stops with an error:

* using log directory 'C:/Rpkgs/sqldf.Rcheck'
* using ARGUMENT ' ' __ignored__ R version 2.5.1 (2007-06-27)
* checking for file 'sqldf/DESCRIPTION' ... OK
* this is package 'sqldf' version '0.1-0'
* checking package

2009 Jan 16

Value Lookup from File without Slurping

Dear all, I have a repository file (let's call it repo.txt) that contains two columns like this:

# tag value
AAA 0.2
AAT 0.3
AAC 0.02
AAG 0.02
ATA 0.3
ATT 0.7

Given another query vector

> qr <- c("AAC", "ATT")

I would like to find the corresponding value for each query above, yielding: 0.02 0.7

However, I want to avoid slurping whole repo.txt
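A minimal sketch of reading the file in chunks through a connection, so that only a bounded number of lines is in memory at a time. It assumes whitespace-separated columns and `#` comment lines as in the excerpt. `lookup_tags` and `chunk_size` are hypothetical names introduced here, and a temporary file stands in for repo.txt.

```r
# Hypothetical helper: look up values for `tags` without loading the whole file.
lookup_tags <- function(path, tags, chunk_size = 10000L) {
  found <- setNames(rep(NA_real_, length(tags)), tags)
  con <- file(path, open = "r")
  on.exit(close(con))
  repeat {
    lines <- readLines(con, n = chunk_size)      # next chunk only
    if (length(lines) == 0) break                # end of file
    lines <- lines[nzchar(lines) & !grepl("^#", lines)]
    if (length(lines) == 0) next
    fields <- strsplit(trimws(lines), "[[:space:]]+")
    tag <- vapply(fields, `[`, character(1), 1)
    val <- as.numeric(vapply(fields, `[`, character(1), 2))
    hit <- tag %in% tags
    found[tag[hit]] <- val[hit]
    if (!anyNA(found)) break                     # stop early once all are found
  }
  unname(found[tags])
}

# Stand-in for repo.txt, using the rows shown in the post.
repo <- tempfile(fileext = ".txt")
writeLines(c("# tag value", "AAA 0.2", "AAT 0.3", "AAC 0.02",
             "AAG 0.02", "ATA 0.3", "ATT 0.7"), repo)

qr <- c("AAC", "ATT")
res <- lookup_tags(repo, qr)
print(res)
```

Tags that never appear in the file come back as NA.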

2008 Jun 10

Fast method to compute average values of duplicated IDs

Hi, How do I collapse (average in the simplest case) the values of those duplicated ids (i.e., 2, 5, 6, 9) to give a table of unique ids ? t <- cbind(id=c(1:10, 2,5,6,9), value=rnorm(14))
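One possible base-R sketch: `rowsum()` computes grouped sums in compiled code, so dividing the grouped sums by the grouped counts gives per-id averages. The matrix is named `tab` here instead of `t` to avoid masking the transpose function; `sums`, `counts` and `out` are illustrative names.

```r
set.seed(1)
tab <- cbind(id = c(1:10, 2, 5, 6, 9), value = rnorm(14))

# Grouped sum of the values, and grouped count of rows, per id.
sums   <- rowsum(tab[, "value"], group = tab[, "id"])
counts <- rowsum(rep(1, nrow(tab)), group = tab[, "id"])

# One row per unique id, with the averaged value.
out <- cbind(id = as.numeric(rownames(sums)),
             value = sums[, 1] / counts[, 1])
print(out)
```

`tapply(tab[, "value"], tab[, "id"], mean)` gives the same averages more compactly and extends to other summary functions.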
