thr3ads.net - similar to: "Fast method to compute average values of duplicated IDs"

Displaying 20 results from an estimated 20000 matches similar to: "Fast method to compute average values of duplicated IDs"

Identifying common prefixes from a vector of words, and delete those prefixes

2008 Jul 31

For example, given c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal"), how can I identify that the common part is ".is.an.animal" (a suffix here) and delete it to give c("dog", "cat", "rat")? Thanks
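One possible approach to the question above, as a minimal sketch: reverse each string so the shared ending becomes a shared beginning, then count how many leading characters agree. The helper name strip_common_suffix is hypothetical and not from the thread.

```r
# Hypothetical helper (not from the thread): strip the longest suffix
# shared by every element of a character vector.
strip_common_suffix <- function(x) {
  # Reverse each string's characters so a shared suffix becomes a shared prefix.
  rev_chars <- lapply(strsplit(x, ""), rev)
  n <- min(lengths(rev_chars))
  k <- 0
  # Extend the match one character at a time while all strings agree.
  while (k < n) {
    nxt <- vapply(rev_chars, function(ch) ch[k + 1], character(1))
    if (length(unique(nxt)) > 1) break
    k <- k + 1
  }
  # Drop the last k characters from every string.
  substr(x, 1, nchar(x) - k)
}

words <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")
stripped <- strip_common_suffix(words)
print(stripped)
```

For a true common prefix, the same loop can be run on the unreversed characters and the first k characters dropped instead.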

Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008 Nov 26

My two matrices are roughly the sizes of m1 and m2 below. I tried nesting two apply() calls around cor.test() to compute the correlation p-values, but after more than an hour the code is still running. Please help me make it more efficient.

    m1 <- matrix(rnorm(100000), ncol=100)
    m2 <- matrix(rnorm(10000000), ncol=100)
    cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value
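A possible speed-up, sketched under the assumption that Pearson correlation with two-sided p-values is wanted and that there are no missing values: compute all row-by-row correlations with one cor() call and convert them to p-values through the t distribution. The small matrix sizes here are stand-ins chosen only so the check against cor.test() runs quickly.

```r
# Small stand-in matrices (the thread's m1 and m2 are far larger).
set.seed(1)
m1 <- matrix(rnorm(5 * 100), ncol = 100)
m2 <- matrix(rnorm(8 * 100), ncol = 100)

n <- ncol(m1)                                  # observations per row
r <- cor(t(m1), t(m2))                         # 5 x 8 matrix of correlations
t_stat <- r * sqrt((n - 2) / (1 - r^2))        # t statistic for each pair
p_fast <- 2 * pt(-abs(t_stat), df = n - 2)     # two-sided p-values

# Reference result from nested apply() and cor.test(), for comparison only.
p_slow <- apply(m2, 1, function(y) {
  apply(m1, 1, function(x) cor.test(x, y)$p.value)
})
print(all.equal(p_fast, p_slow, check.attributes = FALSE))
```

Note that p_fast has rows of m1 along its rows and rows of m2 along its columns; for the full-size m2 the result may need to be computed in chunks to fit in memory.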

Any simple way to subset a vector of strings that do contain a particular substring ?

2008 Jun 19

For example, strings <- c("aaaa", "bbbb", "ccba"). How do I get "aaaa" and "bbbb", the elements that do not contain "ba"?
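One way this could be done, as a short sketch: grepl() returns a logical vector marking the matches, which can be negated and used as an index. fixed = TRUE treats "ba" as a literal string rather than a regular expression.

```r
strings <- c("aaaa", "bbbb", "ccba")

# Keep only the elements that do NOT contain the literal substring "ba".
kept <- strings[!grepl("ba", strings, fixed = TRUE)]
print(kept)
```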

counting number of "G" in "TCGGGGGACAATCGGTAACCCGTCT"

2008 Jul 15

Is there any better solution than this? sum(strsplit("TCGGGGGACAATCGGTAACCCGTCT", "")[[1]] == "G")
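An alternative sketch that avoids splitting the string into single characters: delete everything that is not a "G" and measure what remains. Whether this is actually faster than the strsplit() version would need to be timed on real data.

```r
dna <- "TCGGGGGACAATCGGTAACCCGTCT"

# Remove every character other than "G", then count what is left.
n_g <- nchar(gsub("[^G]", "", dna))

# The strsplit() approach from the question, for comparison.
n_g_split <- sum(strsplit(dna, "")[[1]] == "G")
print(c(n_g, n_g_split))
```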

grouping values

2008 Jun 23

I tried aggregate, apply, etc., but can't get the right result. For example,

    m <- cbind(c(LETTERS[1:5]), c("aa", "bb", "cc", "aa", "cc"))
         [,1] [,2]
    [1,] "A"  "aa"
    [2,] "B"  "bb"
    [3,] "C"  "cc"
    [4,] "D"  "aa"
    [5,] "E"  "cc"

how to obtain

selecting values that are unique, instead of selecting unique values

2008 Jun 25

unique(c(1:10,1)) gives 1:10 (i.e. the unique values). Is there any method to get only 2:10 (i.e. the values that occur exactly once)?
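A sketch of one way to do this: duplicated() only flags the second and later occurrences, so scanning from both ends flags every copy of a repeated value, and the remaining elements are the ones that occur exactly once.

```r
x <- c(1:10, 1)

# TRUE for every element whose value occurs more than once anywhere in x.
is_repeated <- duplicated(x) | duplicated(x, fromLast = TRUE)

# Keep only the values that occur exactly once.
singletons <- x[!is_repeated]
print(singletons)
```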

Beautify R scripts in microsoft word

2008 Sep 13

I am generating a report containing several R scripts in the appendix. Is there any way to "beautify" the R source code in Microsoft Word, similar to what we see in Tinn-R? Thanks

Can R do this ?

2008 Jul 08

I have a folder full of PNGs and JPGs, and would like to consolidate them into a PDF with appropriate title and labels. Can this be done via R?

Generating GUI for r-scripts

2009 Jan 06

Hi, I have developed some scripts that basically ask for input files in tab-delimited format, do some processing, and output several pictures or CSV files. Now I need a GUI to wrap around the scripts, so that end users can select their input files, adjust some processing parameters, and select the output folder or filenames. Please advise me if there are any tools or projects suitable for

how to convert data from long to wide format ?

2008 Oct 30

Given a dataframe m

    > m
      X Y  V3  V4
    1 1 A 0.5 1.2
    2 1 B 0.2 1.4
    3 2 A 0.1 0.9

How do I convert m to this, with V4 as the cell values?

        A   B
    1 1.2 1.4
    2 0.9  NA
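One base-R possibility for the conversion asked about above, sketched under the assumption that each (X, Y) pair appears at most once: tapply() cross-tabulates V4 by X and Y and leaves NA where a combination is absent. The data frame is rebuilt here from the values shown in the question.

```r
# Data frame reconstructed from the values printed in the question.
m <- data.frame(X  = c(1, 1, 2),
                Y  = c("A", "B", "A"),
                V3 = c(0.5, 0.2, 0.1),
                V4 = c(1.2, 1.4, 0.9))

# Rows are the X values, columns are the Y values, cells hold V4.
# With one value per cell, sum() simply returns that value.
wide <- with(m, tapply(V4, list(X, Y), sum))
print(wide)
```

The result is a matrix with dimnames; as.data.frame(wide) turns it into a data frame if one is needed.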

Speeding up casting a dataframe from long to wide format

2008 Dec 03

Hi, I am casting a dataframe from long to wide format. The same code that works for a smaller dataframe takes a long time (more than two hours and still running) for a longer dataframe of 2495227 rows and ten different predictors. How can I make it more efficient? wer <- data.frame(Name=c(1:5, 4:5), Type=c(letters[1:5], letters[4:5]), Predictor=c("A", "A",

How to force aggregate to exclude NA ?

2008 Dec 07

The aggregate function does "almost" all that I need to summarize a dataset, except that I can't specify exclusion of NAs without a little bit of hassle.

    > set.seed(143)
    > m <- data.frame(A=sample(LETTERS[1:5], 20, T), B=sample(LETTERS[1:10], 20, T), C=sample(c(NA, 1:4), 20, T), D=sample(c(NA,1:4), 20, T))
    > m
      A B  C  D
    1 E I  1 NA
    2 A C NA NA
    3 D I NA  3
    4 C I
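A possible way around the hassle, as a sketch: aggregate() forwards extra arguments to FUN, so na.rm = TRUE can be passed straight through. The small data frame df below is made up for illustration and is not the data from the post.

```r
# Illustrative data (hypothetical, not the thread's data frame).
df <- data.frame(A = c("a", "a", "b", "b"),
                 C = c(1, NA, 3, 5),
                 D = c(NA, 2, NA, NA))

# Extra arguments after FUN are handed on to mean(), so NAs are dropped
# within each group before averaging.
res <- aggregate(df[c("C", "D")], by = list(A = df$A), FUN = mean, na.rm = TRUE)
print(res)
```

A group whose values are all NA comes back as NaN, since the mean of an empty set is undefined.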

insert new columns to a matrix

2008 Jun 24

Instead of prepending or appending new columns to a matrix, how can I insert them into the matrix? For example, I would like to insert 3 new columns after the 5th column of matrix m.
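One way to sketch this in base R: split the matrix at the insertion point and cbind() the pieces back together around the new columns. The 8-column matrix, the zero-filled new_cols, and the name pos are illustrative assumptions, and the sketch assumes pos is smaller than ncol(m).

```r
m <- matrix(1:40, ncol = 8)                       # example matrix, 5 x 8
new_cols <- matrix(0, nrow = nrow(m), ncol = 3)   # 3 columns to insert
pos <- 5                                          # insert after this column

# Left part, new columns, right part. drop = FALSE keeps each piece a matrix.
out <- cbind(m[, seq_len(pos), drop = FALSE],
             new_cols,
             m[, (pos + 1):ncol(m), drop = FALSE])
print(dim(out))
```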

How to preserve the numeric format and digits ?

2008 Jul 25

Instead of

    > m <- c(400000000, 50000000000)
    > paste("A", m, "B", sep="")
    [1] "A4e+08B" "A5e+10B"

I want "A400000000" and "A50000000000"
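A sketch of one option: format the numbers explicitly with sprintf() instead of letting paste() pick scientific notation. The "%.0f" format assumes the values are whole numbers; the trailing "B" is kept to mirror the paste() call in the question.

```r
m <- c(400000000, 50000000000)

# "%.0f" prints each number in fixed notation with no decimal places.
labels <- sprintf("A%.0fB", m)
print(labels)
```

format(m, scientific = FALSE, trim = TRUE) is another base-R route to the same digits.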

Prevent read.table from converting "+" and "-" to 0

2008 Nov 04

I am using read.table("data.txt", sep="\t") to read in a tab-delimited text file. However, two columns of data were read wrongly: read.table converts "+" and "-" in the two columns to 0. I have tried setting other parameters, but to no avail. TIA
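One thing that may help, as a sketch: declaring the column types with colClasses stops read.table() from guessing, so the sign columns stay as text. The three-column layout and the inline sample data are assumptions made for illustration; the real data.txt is not shown in the post.

```r
# Inline stand-in for the file (hypothetical content and column layout).
raw <- "chr1\t+\t10\nchr2\t-\t20"

# Force the first two columns to be read as character, the third as numeric.
d <- read.table(text = raw, sep = "\t",
                colClasses = c("character", "character", "numeric"))
print(d)
```

With a real file, the same colClasses argument can be passed alongside the file name in place of text = raw.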

How to optimize this codes ?

2008 Dec 04

How can I optimize the for-loop to be reasonably fast for sample.size=100000000? You may want to set sample.size=1000 to get an idea of what I am trying to achieve.

    set.seed(143)
    A <- matrix(sample(0:1, sample.size, TRUE), ncol=10, dimnames=list(NULL, LETTERS[1:10]))
    B <- list()
    for(i in 1:10) { B[[i]] <- apply(combn(LETTERS[1:10], i), 2, function(x) { sum(apply(data.frame(A[,x]), 1,

Reshape matrix from wide to long format

2008 Nov 25

I forgot the reshape equivalent for converting from wide to long format. Can someone help, as my matrix is very big? The following is just an example.

    > m <- matrix(1:20, nrow=4, dimnames=list(LETTERS[1:4], letters[1:5]))
    > m
      a b  c  d  e
    A 1 5  9 13 17
    B 2 6 10 14 18
    C 3 7 11 15 19
    D 4 8 12 16 20
    > as.data.frame(cbind(rep(rownames(m), ncol(m)), rep(colnames(m), each=nrow(m)),
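A base-R sketch that may serve here: treating the matrix as a table and converting it to a data frame yields one row per cell, in column-major order. The column names row, col, and value are chosen for illustration only.

```r
m <- matrix(1:20, nrow = 4, dimnames = list(LETTERS[1:4], letters[1:5]))

# One row per matrix cell: row name, column name, cell value.
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("row", "col", "value")
print(head(long))
```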

replacing segments of vector by their averages

2008 Jun 19

Given a numeric vector of length n, I need to find the segments whose values are >= 0.2, compute the average of each segment, and replace the original values in each segment with the corresponding average. For example, there are three segments that are >= 0.2; the average of the 1st segment is 0.3, the 2nd is 0.5, and the 3rd is 0.5333333 >
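A minimal sketch of one way to do this, using rle() to label consecutive runs and ave() to average within each run. The vector x below is made up for illustration, since the example data in the post is cut off.

```r
# Hypothetical input; the thread's own example vector is not shown in full.
x <- c(0.1, 0.2, 0.4, 0.1, 0.5, 0.6, 0.4, 0.1)

above <- x >= 0.2
runs <- rle(above)

# Give every consecutive run (above or below the cutoff) its own id.
seg_id <- rep(seq_along(runs$lengths), runs$lengths)

# Replace values in the >= 0.2 runs by their run mean; leave the rest alone.
out <- ifelse(above, ave(x, seg_id), x)
print(out)
```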

ADaCGH package crashes at mpiInit()

2008 Jun 12

I have successfully installed the ADaCGH package, and trying the example in SegmentPlotWrite did produce a lot of PNGs and HTML. When I tried the same example again this morning (after a long night of installation), ADaCGH crashed at mpiInit(), showing the error: Loading required package: Rmpi ELAN_EXCEPTION @ --: 6 (Initialisation error) elan_init: Can't get capability from environment Aborted I

Computing row means for sets of 2 columns

2008 Jul 14

Is there a better or more efficient approach than this without the use of t()?

    > (m <- matrix(1:40, ncol=4))
          [,1] [,2] [,3] [,4]
     [1,]    1   11   21   31
     [2,]    2   12   22   32
     [3,]    3   13   23   33
     [4,]    4   14   24   34
     [5,]    5   15   25   35
     [6,]    6   16   26   36
     [7,]    7   17   27   37
     [8,]    8   18   28   38
     [9,]    9   19   29   39
    [10,]   10   20   30   40
    >
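One t()-free possibility, sketched under the assumption that the matrix has an even number of columns and that columns 1-2, 3-4, and so on form the pairs: add the odd-numbered columns to their even-numbered neighbours and halve the result, which is fully vectorised.

```r
m <- matrix(1:40, ncol = 4)

# Indices of the first column in each pair: 1, 3, ...
odd <- seq(1, ncol(m), by = 2)

# Mean of each pair of adjacent columns, computed for all rows at once.
pair_means <- (m[, odd, drop = FALSE] + m[, odd + 1, drop = FALSE]) / 2
print(pair_means)
```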
