Displaying 20 results from an estimated 10000 matches similar to: "(Most efficient) way to make random sequences of random sequences"
2011 Mar 22
1
Best HMM package to generate random (protein) sequences?
Dear All,
I would like to generate random protein sequences using an HMM.
Has anybody done that before, or would you have any idea which package
is likely to be best for that?
The important facts are that the HMM will be fitted on ~3 million
sequential observations, with 20 different states (one for each amino
acid). I guess that 2-5 hidden states should be enough, and an order
of 3 would
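The excerpt is cut off, but before reaching for a dedicated package (depmixS4 and HMM on CRAN are often suggested for this), a plain first-order Markov chain sampler in base R is a useful baseline. This is a sketch on my part, not from the thread; the 4-letter alphabet is a toy stand-in for the 20 amino acids.

```r
set.seed(1)
aa <- LETTERS[1:4]  # toy alphabet; a real run would use the 20 amino-acid letters
# estimate a transition matrix from a training sequence (here random)
train <- sample(aa, 1000, replace = TRUE)
trans <- prop.table(table(head(train, -1), tail(train, -1)), margin = 1)
# sample a new sequence of length 50 from the fitted chain
seq.out <- character(50)
seq.out[1] <- sample(aa, 1)
for (i in 2:50)
  seq.out[i] <- sample(colnames(trans), 1, prob = trans[seq.out[i - 1], ])
```

Higher-order chains or hidden states need a real HMM package; this only captures first-order transition frequencies.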
2012 Dec 27
4
Finding (swapped) repetitions of numbers pairs across two columns
Hi,
I've had this problem for a while and tackled it in a quite dirty way,
so I'm wondering if a better solution exists:
If we have two vectors:
v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
How to remove one instance of the "3,1" / "1,3" double?
At the moment I'm using the following solution, which is quite horrible:
v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
ft <-
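The excerpt cuts off before the author's solution, but a compact base-R approach is to build an order-independent key with pmin/pmax, so "1,3" and "3,1" collapse to the same key, and then drop duplicates:

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)
key  <- paste(pmin(v1, v2), pmax(v1, v2))  # order-independent pair key
keep <- !duplicated(key)                   # drop repeats of swapped pairs
v1[keep]  # 0 1 2 4
v2[keep]  # 5 3 2 0
```

Only the second occurrence of a swapped pair is removed; all other rows survive untouched.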
2009 Aug 12
3
Random sampling while keeping distribution of nearest neighbor distances constant.
Dear All,
I cannot find a solution to the following problem although I imagine
that it is a classic, hence my email.
I have a vector V of X values lying between 1 and N.
I would like to get random samples of X values, also between
1 and N, but the important point is:
* I would like to keep the same distribution of distances between the X values *
For example let's say N=10 and
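One sketch of an answer (my assumption, not from the thread): permute the multiset of consecutive gaps and place the sequence at a random start inside 1..N. Since nearest-neighbor distances are derived from consecutive gaps, the gap multiset is preserved exactly:

```r
set.seed(1)
N <- 100
V <- sort(sample(1:N, 10))
gaps  <- sample(diff(V))                 # permute the consecutive gaps
start <- sample(1:(N - sum(gaps)), 1)    # random placement keeps values in 1..N
V.new <- cumsum(c(start, gaps))
```

Note this preserves the multiset of gaps, which is stronger than matching their distribution only approximately; relaxing that would need resampling gaps from their empirical distribution instead.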
2006 Sep 13
3
group bunch of lines in a data.frame, an additional requirement
Thanks for pointing me to "aggregate", that works fine!
There is one complication though: I have mixed types (numeric and character),
so the matrix is of the form:
A 1.0 200 ID1
A 3.0 800 ID1
A 2.0 200 ID1
B 0.5 20 ID2
B 0.9 50 ID2
C 5.0 70 ID1
One letter always has the same ID but one ID can be shared by many
letters (like ID1)
I just want to keep track of the ID, and get
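Since each letter maps to a single ID, the ID can simply be added as a second grouping variable in the aggregate formula; a minimal sketch with the data from the excerpt:

```r
df <- data.frame(letter = c("A", "A", "A", "B", "B", "C"),
                 x      = c(1, 3, 2, 0.5, 0.9, 5),
                 y      = c(200, 800, 200, 20, 50, 70),
                 id     = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"))
# group by letter AND id; the ID column is carried through untouched
agg <- aggregate(cbind(x, y) ~ letter + id, data = df, FUN = mean)
```

Because letter determines id, adding id to the grouping does not split any group; it just keeps the ID visible in the output.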
2007 Apr 20
1
A particular shuffling on a vector
Hello,
I was wondering if anyone can think of a straightforward way (without
loops) to do the following shuffling:
Let's imagine a vector:
c(1,1,1,2,2,3,3,3)
I would like to derive shuffled vectors __where the same digits are
never separated__, although they can be at both ends (periodicity).
So the following shuffled vectors are possible:
c(2,2,1,1,1,3,3,3)
c(2,1,1,1,3,3,3,2)
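A loop-free sketch (my suggestion, not from the thread): treat each block of equal digits as a unit via rle(), permute the blocks, then apply a random rotation so a block may wrap around both ends:

```r
set.seed(1)
v <- c(1, 1, 1, 2, 2, 3, 3, 3)
r <- rle(v)
p <- sample(length(r$values))            # permute the runs as units
s <- rep(r$values[p], r$lengths[p])      # rebuild the vector from permuted runs
k <- sample(length(s), 1)                # random rotation: allows wrap-around
s <- c(s[-seq_len(k)], s[seq_len(k)])
```

Rotation can only split the run that crosses the boundary, which is exactly the "both ends" case the question allows.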
2012 Dec 27
3
Retrieve indexes of the "first occurrence of numbers" in an effective manner
Hi,
That sounds simple but I cannot think of a really fast way of getting
the following:
c(1,1,2,2,3,3,4,4) would give c(1,3,5,7)
i.e., a function that returns the indexes of the first occurrences of numbers.
Note that numbers may have any order e.g., c(3,4,1,2,1,1,2,3,5), can
be very large, and the vectors are also very large (which prohibits
any loop).
The best I could think of is:
tmp =
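The excerpt cuts off, but the fully vectorized idiom here is duplicated(): the first occurrence of each value is precisely the non-duplicated positions.

```r
x <- c(3, 4, 1, 2, 1, 1, 2, 3, 5)
first.idx <- which(!duplicated(x))  # vectorized, no loop
first.idx                           # 1 2 3 4 9
```

duplicated() hashes internally, so this stays fast on very large vectors regardless of the order of the values.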
2012 Apr 19
3
How to "flatten" a multidimensional array into a dataframe?
Hi,
I have a three dimensional array, e.g.,
my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"),
d2=c("B1","B2","B3"), d3=c("C1","C2","C3","C4")) )
what I would like to get is then a dataframe:
d1 d2 d3 value
A1 B1 C1 0
A2 B1 C1 0
.
.
.
A2 B3 C4 0
I'm sure there is one function to do
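There is indeed a single base-R function for this: as.data.frame.table() flattens a named array into exactly the long format shown above.

```r
my.array <- array(0, dim = c(2, 3, 4),
                  dimnames = list(d1 = c("A1", "A2"),
                                  d2 = c("B1", "B2", "B3"),
                                  d3 = c("C1", "C2", "C3", "C4")))
# one row per cell; dimension names become columns
flat <- as.data.frame.table(my.array, responseName = "value")
```

The responseName argument names the value column (the default is "Freq").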
2012 Mar 12
3
Idea/package to "linearize a curve" along the diagonal?
Hi,
I am trying to normalize some data. First I fitted a principal curve
(using the LCPM package), but now I would like to apply a
transformation so that the curve becomes a "straight diagonal line" on
the plot. The data used to fit the curve would then be normalized by
applying the same transformation to it.
A simple solution could be to apply translations only (e.g., as done
after a
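One way to see the transformation (a sketch of mine using loess as a stand-in for the principal curve, not the thread's method): subtract the fitted curve from Y and add back the identity line, so the fitted trend maps onto the diagonal.

```r
set.seed(1)
X <- runif(300, 1, 10)
Y <- X^1.5 + rnorm(300, sd = 0.2)   # synthetic curved trend
fit <- loess(Y ~ X)
Y.norm <- Y - predict(fit) + X      # fitted curve is mapped onto y = x
```

This only straightens the trend vertically; a principal curve fit (as in the thread) would also account for horizontal scatter around the curve.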
2008 Aug 12
1
which(df$name=="A") takes ~1 second! (df is very large), but can it be sped up?
Dear All,
I have a large data frame ( 2700000 lines and 14 columns), and I would like to
extract the information in a particular way illustrated below:
Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
names col1
1 A 1
2 A 0
3 A 1
4 A
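The excerpt is cut off, but for repeated lookups by level, a common speed-up is to split the row indices by the factor once, after which each lookup is a cheap list access instead of a fresh scan of 2.7 million rows:

```r
set.seed(1)
df <- data.frame(names = factor(c(rep("A", 5), rep("B", 5))),
                 col1  = sample(c(0, 1), 10, replace = TRUE))
idx <- split(seq_len(nrow(df)), df$names)  # one pass over the column
idx[["A"]]                                 # 1 2 3 4 5
```

The one-time split costs about as much as a single which(), so it pays off from the second query onward.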
2012 May 11
1
How to re-order clusters of hclust output?
Hello,
The heatmap function conveniently has a "reorder.dendrogram" function
so that clusters follow a certain logic.
It seems that the hclust function doesn't have such a feature. I can use
the "reorder" function on the dendrogram obtained from hclust, but
this does not modify the hclust object itself.
I understand that the answer should be within the "heatmap"
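One possible route (my sketch, not a confirmed answer from the thread): reorder the dendrogram and convert it back to an hclust object with as.hclust(), which stats supports for binary dendrograms.

```r
hc   <- hclust(dist(USArrests[1:10, ]))
dend <- reorder(as.dendrogram(hc), 10:1)  # reorder leaves by the given weights
hc2  <- as.hclust(dend)                   # back to an hclust object
```

hc2$order now reflects the reordered leaves, while merge heights are unchanged.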
2012 Mar 11
1
Which non-parametric regression would allow fitting this type of data? (example given).
Hi,
I'm wondering which function would allow fitting this type of data:
tmp=rnorm(2000)
X.1 = 5+tmp
Y.1 = 5+ (5*tmp+rnorm(2000))
tmp=rnorm(100)
X.2 = 9+tmp
Y.2 = 40+ (1.5*tmp+rnorm(100))
X.3 = 7+ 0.5*runif(500)
Y.3 = 15+20*runif(500)
X = c(X.1,X.2,X.3)
Y = c(Y.1,Y.2,Y.3)
plot(X,Y)
The problem with loess is that distances for the "goodness of
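One pragmatic alternative (an assumption on my part, not from the thread): a binwise median, which tracks the crest of the point cloud and is insensitive to how many points pile up off-curve in each region:

```r
set.seed(1)
tmp <- rnorm(2000); X.1 <- 5 + tmp; Y.1 <- 5 + (5 * tmp + rnorm(2000))
tmp <- rnorm(100);  X.2 <- 9 + tmp; Y.2 <- 40 + (1.5 * tmp + rnorm(100))
X.3 <- 7 + 0.5 * runif(500); Y.3 <- 15 + 20 * runif(500)
X <- c(X.1, X.2, X.3); Y <- c(Y.1, Y.2, Y.3)
bins <- cut(X, breaks = 50)
med  <- tapply(Y, bins, median)  # robust binwise summary of the trend
```

The medians can then be smoothed or interpolated; a quantile regression (package quantreg) would be the model-based version of the same idea.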
2008 Oct 20
1
Mclust problem with mclust1Dplot: Error in to - from : non-numeric argument to binary operator
Dear list members,
I am using Mclust in order to deconvolute a distribution that I
believe is a sum of two gaussians.
First I can make a model:
> my.data.model = Mclust(my.data, modelNames=c("E"), warn=T, G=1:3)
But then, when I try to plot the result, I get the following error:
> mclust1Dplot(my.data.model, parameters = my.data.model$parameters, what = "density")
2010 Nov 16
1
problem with PDF/postcript, cannot change paper size: "‘mode(width)’ and ‘mode(height)’ differ between new and previous"
Hi,
The pdf function would not let me change the paper size and gives me
the following warning:
pdf("figure.pdf", width="6", height="10")
Warning message:
‘mode(width)’ and ‘mode(height)’ differ between new and previous
==> NOT changing ‘width’ & ‘height’
If I use the option paper = "a4r", it does not give me a warning
but still prints on a
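The warning itself points at the cause: width and height were passed as the strings "6" and "10", so their mode (character) differs from the numeric defaults and pdf() refuses to change them. Passing plain numbers fixes it:

```r
# width/height must be numeric, not quoted strings
out <- file.path(tempdir(), "figure.pdf")
pdf(out, width = 6, height = 10)
plot(1:10)
dev.off()
```

For paper = "a4r" and friends, width/height are capped at the paper size, which explains the separate behavior mentioned at the end of the excerpt.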
2006 Apr 18
1
Compare two Power law or Exponential distributions
Dear All,
I'd like to compare exponential or power-law distributions.
To do so, people are often referred to the ks.test. However,
I imagine ks.test wouldn't be as powerful as a test specifically
designed for a distribution type.
So my question is, is there a more specific test for each of
these distributions? (exponential or power-law)
Thanks for your hints!
E
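One standard alternative to the KS test (my sketch, following the usual maximum-likelihood approach rather than anything stated in the thread): fit each family by MLE and compare log-likelihoods; a formal comparison would use a likelihood-ratio/Vuong test as in Clauset, Shalizi & Newman's power-law methodology.

```r
set.seed(1)
x <- rexp(500, rate = 2)
xmin <- min(x)
# exponential MLE: rate = 1/mean(x)
ll.exp <- sum(dexp(x, rate = 1 / mean(x), log = TRUE))
# continuous power law (Pareto) MLE: alpha = 1 + n / sum(log(x/xmin))
alpha <- 1 + length(x) / sum(log(x / xmin))
ll.pl <- sum(log((alpha - 1) / xmin) - alpha * log(x / xmin))
c(exponential = ll.exp, power.law = ll.pl)
```

The family with the higher log-likelihood fits better; the Vuong test then tells you whether the difference is statistically meaningful.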
2012 Mar 10
1
How to improve the robustness of "loess"? - example included.
Hi,
I posted a message earlier entitled "How to fit a line through the
"Mountain crest" ..."
I figured loess is probably the best way, but it seems that the
problem is the robustness of the fit. Below I paste an example to
illustrate the problem:
tmp=rnorm(2000)
X.background = 5+tmp; Y.background = 5+ (10*tmp+rnorm(2000))
X.specific = 3.5+3*runif(1000);
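loess itself has a robustness option the excerpt may be after: family = "symmetric" switches from least squares to Tukey-biweight M-estimation, which downweights the off-crest points. A minimal sketch on the background data from the excerpt:

```r
set.seed(1)
tmp <- rnorm(2000)
X <- 5 + tmp
Y <- 5 + (10 * tmp + rnorm(2000))
# M-estimation (Tukey biweight) instead of plain least squares
fit <- loess(Y ~ X, family = "symmetric", degree = 1)
```

Lowering span and degree also helps the fit ignore a locally dense contaminating cluster, though no loess setting is robust to arbitrarily large contamination.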
2009 May 05
1
Find cyclically identical binary sequences
Dear R-helpers,
I need to generate all the binary sequences of length n (here n = 8)
that start with 1 and have no fewer than two of each digit, and are
not cyclic permutations of each other. Here is what I have done:
len <- 8
df <- as.data.frame(numeric(2^(len - 1)) %o% numeric(len))
require(partitions)
for (i in 1:2^(len - 1)) df[i, ] <- binary(i, dim = len)[[1]]
df <-
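The excerpt stops mid-solution; a base-R sketch without the partitions package is to enumerate all sequences, filter, and deduplicate cyclic rotations by a canonical (lexicographically minimal) rotation string:

```r
n <- 8
m  <- as.matrix(expand.grid(rep(list(0:1), n)))
ok <- m[, 1] == 1 & rowSums(m) >= 2 & rowSums(m) <= n - 2
cand <- m[ok, ]                     # start with 1, >= two of each digit
canon <- function(v)                # canonical form: smallest rotation as string
  min(sapply(0:(n - 1), function(k)
    paste(c(v[-seq_len(k)], v[seq_len(k)]), collapse = "")))
keys <- apply(cand, 1, canon)
res  <- cand[!duplicated(keys), ]   # one representative per cyclic class
```

For n = 8 this yields 32 sequences, matching the necklace count (36 binary necklaces of length 8 minus the four with fewer than two of either digit). Brute force over 2^8 is trivial; for much larger n a necklace-generation algorithm would be needed.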
2010 Jan 28
2
Constrained vector permutation
Hello,
I'm trying to permute a vector of positive integers with the constraint
that each element must be <= twice the element before it (i.e., for some
vector x, x[i] <= 2*x[i-1]), assuming the "0th" element is 1. Hence the
first element of the vector must always be 1 or 2.
Similarly, the 2nd must always be <= 4, the
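Since the constraint prunes branches early, a recursive backtracking enumerator is a natural fit; this is a sketch of mine, not the thread's solution (constrained_perms is a name I made up):

```r
constrained_perms <- function(x) {
  res <- list()
  recurse <- function(remaining, acc, prev) {
    if (length(remaining) == 0) {          # complete permutation found
      res[[length(res) + 1]] <<- acc
      return(invisible(NULL))
    }
    for (v in unique(remaining[remaining <= 2 * prev])) {
      i <- match(v, remaining)             # consume one copy of v
      recurse(remaining[-i], c(acc, v), v)
    }
  }
  recurse(x, numeric(0), 1)                # the "0th" element is assumed to be 1
  res
}
constrained_perms(c(1, 2, 3))  # list(c(1,2,3), c(2,3,1))
```

unique() avoids generating the same permutation twice when the input has repeated values; to sample one valid permutation at random rather than enumerate all, the same recursion can pick v at random and backtrack on dead ends.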
2006 Sep 12
1
Basic help needed: group bunch of lines in a list (matrix)
Hello,
I'd like to group the lines of a matrix so that:
A 1.0 200
A 3.0 800
A 2.0 200
B 0.5 20
B 0.9 50
C 5.0 70
Would give:
A 2.0 400
B 0.7 35
C 5.0 70
So all lines corresponding to a letter (level), become a single line
where all the values of each column are averaged.
I've done that with a loop but it doesn't sound right (it is very
slow). I imagine there is a sort of
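The loop-free tool here is aggregate() (the answer the later follow-up in this listing builds on); on the data shown above:

```r
m <- data.frame(letter = c("A", "A", "A", "B", "B", "C"),
                v1 = c(1, 3, 2, 0.5, 0.9, 5),
                v2 = c(200, 800, 200, 20, 50, 70))
# one row per letter, column means within each group
agg <- aggregate(cbind(v1, v2) ~ letter, data = m, FUN = mean)
```

This reproduces the desired output: A 2.0 400, B 0.7 35, C 5.0 70. rowsum() or tapply() would work too; aggregate keeps the result as a data frame.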
2008 Mar 19
1
Smoothing z-values according to their x, y positions
Dear All,
I'm sure this is not the first time this question comes up but I
couldn't find the keywords that would point me out to it - so
apologies if this is a re-post.
Basically I've got thousands of points, each depending on three variables:
x, y, and z.
if I do a plot(x,y, col=z), I get something very messy.
So I would like to smooth the values of z according to the values of
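loess accepts up to four predictors, so smoothing z over the (x, y) plane is a one-liner; a sketch on synthetic data (the surface and span are my choices for illustration):

```r
set.seed(1)
x <- runif(2000); y <- runif(2000)
z <- sin(4 * x) + cos(4 * y) + rnorm(2000, sd = 0.5)  # noisy surface
fit <- loess(z ~ x + y, span = 0.2)       # 2-D local smoothing of z
z.smooth <- predict(fit)                  # smoothed z at the original points
# plot(x, y, col = cut(z.smooth, 10)) now shows the underlying surface
```

For large point counts, interpolation on a grid (e.g., akima::interp or MASS::kde2d for densities) is a faster alternative to pointwise loess prediction.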
2008 Nov 18
1
Mathematica now working with Nvidia GPUs --> any plan for R?
Dear All,
I just read an announcement saying that Mathematica is launching a
version working with Nvidia GPUs. It is claimed that it'd make it
~10-100x faster!
http://www.physorg.com/news146247669.html
I was wondering if you are aware of any development going into this
direction with R?
Thanks for sharing your thoughts,
Best wishes,
Emmanuel