Displaying 20 results from an estimated 10000 matches similar to: "Retrieve indexes of the "first occurrence of numbers" in an effective manner"
2012 Mar 12
3
Idea/package to "linearize a curve" along the diagonal?
Hi,
I am trying to normalize some data. First I fitted a principal curve
(using the LPCM package), but now I would like to apply a
transformation so that the curve becomes a "straight diagonal line" on
the plot. The data used to fit the curve would then be normalized by
applying the same transformation to it.
A simple solution could be to apply translations only (e.g., as done
after a
2012 Dec 27
4
Finding (swapped) repetitions of numbers pairs across two columns
Hi,
I've had this problem for a while and tackled it in a quite dirty way,
so I'm wondering if a better solution exists:
If we have two vectors:
v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
How to remove one instance of the "3,1" / "1,3" double?
At the moment I'm using the following solution, which is quite horrible:
v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
ft <-
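The poster's own solution is cut off above. Here is a hedged alternative sketch that is not taken from the thread. Put each pair into a canonical order with pmin()/pmax(), so that "3,1" and "1,3" become the same row. Then drop the repeats with duplicated(), which keeps the first instance of each pair.

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)

# Canonical form of each pair: smaller value first, so (3,1) and (1,3) match
canon <- cbind(pmin(v1, v2), pmax(v1, v2))

# duplicated() on a matrix works row by row; keep the first instance of each pair
keep <- !duplicated(canon)

v1[keep]  # 0 1 2 4
v2[keep]  # 5 3 2 0
```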
2006 Sep 13
3
group bunch of lines in a data.frame, an additional requirement
Thanks for pointing me to "aggregate"; that works fine!
There is one complication though: I have mixed types (numerical and character),
So the matrix is of the form:
A 1.0 200 ID1
A 3.0 800 ID1
A 2.0 200 ID1
B 0.5 20 ID2
B 0.9 50 ID2
C 5.0 70 ID1
One letter always has the same ID but one ID can be shared by many
letters (like ID1)
I just want to keep track of the ID, and get
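Each letter always maps to a single ID, so one way to keep the ID through aggregate() is to use it as a second grouping variable. This is a minimal sketch under two assumptions. The column names (letter, x, y, id) are hypothetical, because the post shows only values. mean stands in for whatever summary function was already being used.

```r
# Hypothetical column names; the post only shows the values
df <- data.frame(
  letter = c("A", "A", "A", "B", "B", "C"),
  x      = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  y      = c(200, 800, 200, 20, 50, 70),
  id     = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1")
)

# Grouping by letter AND id keeps the id column; since each letter has one id,
# no letter's rows are split. mean is a placeholder summary function.
res <- aggregate(cbind(x, y) ~ letter + id, data = df, FUN = mean)
res
```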
2012 Apr 19
3
How to "flatten" a multidimensional array into a dataframe?
Hi,
I have a three dimensional array, e.g.,
my.array = array(0, dim=c(2,3,4), dimnames=list( d1=c("A1","A2"),
d2=c("B1","B2","B3"), d3=c("C1","C2","C3","C4")) )
what I would like to get is then a dataframe:
d1 d2 d3 value
A1 B1 C1 0
A2 B1 C1 0
.
.
.
A2 B3 C4 0
I'm sure there is one function to do
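One base-R route, offered as a sketch: convert the array with as.table() and then use the data-frame method for tables. It produces one row per cell, with the first dimension varying fastest, which matches the layout above. responseName sets the name of the value column.

```r
my.array <- array(0, dim = c(2, 3, 4),
                  dimnames = list(d1 = c("A1", "A2"),
                                  d2 = c("B1", "B2", "B3"),
                                  d3 = c("C1", "C2", "C3", "C4")))

# as.table() keeps the named dimnames (d1, d2, d3), which become the columns;
# the cell values go into a column called "value"
flat <- as.data.frame(as.table(my.array), responseName = "value")
head(flat, 3)
```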
2008 Aug 12
1
which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?
Dear All,
I have a large data frame (2,700,000 rows and 14 columns), and I would like to
extract the information in a particular way illustrated below:
Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
names col1
1 A 1
2 A 0
3 A 1
4 A
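Each which(df$names == "A") call compares all 2.7 million rows, so repeating it once per name is what costs time. One option, sketched here, is to split the row indices by name in a single pass and then look each name up. This assumes the lookups are repeated; a single lookup still has to scan the column once.

```r
set.seed(1)
col1  <- sample(c(0, 1), 10, replace = TRUE)
names <- factor(c(rep("A", 5), rep("B", 5)))
df    <- data.frame(names, col1)

# One pass over the data: a list of row indices for every level of 'names'
idx <- split(seq_len(nrow(df)), df$names)

idx[["A"]]           # same rows as which(df$names == "A")
df$col1[idx[["B"]]]  # col1 values for the "B" rows
```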
2007 Aug 21
2
(Most efficient) way to make random sequences of random sequences
Hi,
I was wondering what would be the (most efficient) way to generate
a sequence
of sequences, i mean:
if I have 1,2 and 3.
I'd like to generate a sequence of length N*3 (N ~ 1,000,000 or more)
Where random permutations of the sequence 1,2,3 follow each other.
e.g. 1,2,3,1,3,2,3,2,1
/!\ The catch is that the same number should never appear twice
in the same sub-sequence,
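Here is a vectorized sketch with no per-block loop, assuming each block is a permutation of 1..k. Order the positions first by block number and then by a random key. Each block of k positions is then shuffled independently of the others.

```r
N <- 1e6
k <- 3  # each sub-sequence is a permutation of 1..k

vals <- rep(seq_len(k), times = N)  # 1,2,3,1,2,3,...
grp  <- rep(seq_len(N), each = k)   # block number of each position

# Sort by block first, then by a random key: values only move within their
# own block, so no number can appear twice in the same sub-sequence
res <- vals[order(grp, runif(N * k))]
head(res, 9)
```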
2011 Mar 22
1
Best HMM package to generate random (protein) sequences?
Dear All,
I would like to generate random protein sequences using an HMM.
Has anybody done that before, or would you have any idea which package
is likely to be best for that?
The important facts are that the HMM will be fitted on ~3 million
sequential observations, with 20 different states (one for each amino
acid). I guess that 2-5 hidden states should be enough, and an order
of 3 would
2012 May 11
1
How to re-order clusters of hclust output?
Hello,
The heatmap function conveniently has a "reorder.dendrogram" function
so that clusters follow a certain logic.
It seems that the hclust function doesn't have such feature. I can use
the "reorder" function on the dendrogram obtained from hclust, but
this does not modify the hclust object itself.
I understand that the answer should be within the "heatmap"
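A sketch using only the stats package. Reorder the dendrogram as heatmap does, then convert it back with as.hclust(). The resulting hclust object carries the new leaf order. The USArrests subset and the row-mean weights are placeholders for illustration.

```r
hc <- hclust(dist(USArrests[1:10, ]))

# Weights that drive the branch rotation (hypothetically, row means)
wts <- rowMeans(USArrests[1:10, ])

dd <- reorder(as.dendrogram(hc), wts, agglo.FUN = mean)

# Convert back: the new hclust object stores the reordered leaf order
hc2 <- as.hclust(dd)
hc2$order
```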
2012 Mar 11
1
Which non-parametric regression would allow fitting this type of data? (example given).
Hi,
I'm wondering which function would allow fitting this type of data:
tmp=rnorm(2000)
X.1 = 5+tmp
Y.1 = 5+ (5*tmp+rnorm(2000))
tmp=rnorm(100)
X.2 = 9+tmp
Y.2 = 40+ (1.5*tmp+rnorm(100))
X.3 = 7+ 0.5*runif(500)
Y.3 = 15+20*runif(500)
X = c(X.1,X.2,X.3)
Y = c(Y.1,Y.2,Y.3)
plot(X,Y)
The problem with loess is that distances for the "goodness of
2011 Aug 14
1
FYI : XML 3.4.2 still breaks odfWeave 0.7.17
Dear list,
perusing the Gmane archive shows that the issue with XML 3.4.x still bugs
odfWeave users.
I just checked that the newer XML 3.4.2 version still gives the same
problem. Using it to weave a bit of documentation written with LibreOffice
3.3.3 (current in Debian testing) gives me a 590-page, 192 KB
document, whereas reinstalling XML 3.2.0 gives me a 4-page, 24 KB file
(not
2008 Oct 20
1
Mclust problem with mclust1Dplot: Error in to - from : non-numeric argument to binary operator
Dear list members,
I am using Mclust in order to deconvolute a distribution that I
believe is a sum of two Gaussians.
First I can make a model:
> my.data.model = Mclust(my.data, modelNames=c("E"), warn=T, G=1:3)
But then, when I try to plot the result, I get the following error:
> mclust1Dplot(my.data.model, parameters = my.data.model$parameters, what = "density")
2009 Aug 12
3
Random sampling while keeping distribution of nearest neighbor distances constant.
Dear All,
I cannot find a solution to the following problem although I imagine
that it is a classic, hence my email.
I have a vector V of X values between 1 and N.
I would like to get random samples of X values, also between
1 and N, but the important point is:
* I would like to keep the same distribution of distances between the X values *
For example let's say N=10 and
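Here is one common approach, sketched under stated assumptions. Shuffle the gaps between consecutive sorted values, then place the shuffled sequence at a random start that keeps every value within 1..N. permute_gaps() is a hypothetical helper. It preserves the exact multiset of gaps between neighboring values, and therefore the overall span. Each point's nearest-neighbor distance is the smaller of its two adjacent gaps, so that particular distribution can still change.

```r
# Hypothetical helper: shuffle the gaps between consecutive sorted values
# and start the new sequence at a random position that still fits in 1..N
permute_gaps <- function(V, N) {
  v <- sort(V)
  if (length(v) < 2) return(sample.int(N, length(v)))
  gaps  <- diff(v)
  span  <- sum(gaps)                 # max(v) - min(v), at most N - 1
  start <- sample.int(N - span, 1)   # guarantees start + span <= N
  # index-based shuffle avoids sample()'s special case for a length-1 input
  cumsum(c(start, gaps[sample.int(length(gaps))]))
}

set.seed(42)
V <- c(2, 3, 7, 10)
new <- permute_gaps(V, N = 20)
new
```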
2010 Nov 16
1
problem with PDF/PostScript, cannot change paper size: "‘mode(width)’ and ‘mode(height)’ differ between new and previous"
Hi,
The pdf function would not let me change the paper size and gives me
the following warning:
pdf("figure.pdf", width="6", height="10")
Warning message:
‘mode(width)’ and ‘mode(height)’ differ between new and previous
==> NOT changing ‘width’ & ‘height’
If I use the option paper = "a4r", it does not give me a warning
but still prints on a
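The warning refers to mode(width). The call above passes width and height as strings ("6", "10"), but pdf() expects numbers in inches. Here is a sketch with numeric values; the output path under tempdir() is only for illustration.

```r
out <- file.path(tempdir(), "figure.pdf")

# width and height as numbers (inches); passing "6" as a string triggers the
# 'mode(width)' warning and the values are ignored
pdf(out, width = 6, height = 10)
plot(1:10)
dev.off()
```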
2007 Apr 20
1
A particular shuffling on a vector
Hello,
I was wondering if anyone can think of a straightforward way (without
loops) to do the following shuffling:
Let's imagine a vector:
c(1,1,1,2,2,3,3,3)
I would like to derive shuffled vectors __where the same digits are
never separated__, although they can be at both ends (periodicity).
So the following shuffled vectors are possible:
c(2,2,1,1,1,3,3,3)
c(2,1,1,1,3,3,3,2)
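Here is a loop-free sketch, assuming each value forms one contiguous run in the input, as in the example. First shuffle the runs found by rle() and rebuild the vector with rep(). Then apply a random circular shift, so that a block may wrap around both ends. shuffle_blocks() is a hypothetical name.

```r
# Assumes each value occupies a single contiguous run in x, as in the example
shuffle_blocks <- function(x) {
  r <- rle(x)
  o <- sample.int(length(r$values))            # random order of the blocks
  y <- rep(r$values[o], times = r$lengths[o])
  k <- sample.int(length(y), 1) - 1            # random circular shift, 0..n-1
  # y[-seq_len(0)] would be empty, so handle the zero shift separately
  if (k == 0) y else c(y[-seq_len(k)], y[seq_len(k)])
}

set.seed(7)
s <- shuffle_blocks(c(1, 1, 1, 2, 2, 3, 3, 3))
s
```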
2011 May 18
1
matrix help (first occurrence of variable in column)
Dear R help,
Apologies for the less than informative subject line. I will do my
best to describe my problem.
Consider the following matrix:
mdat <- matrix(c(1,0,1,1,1,0), nrow = 2, ncol=3, byrow=TRUE,
dimnames = list(c("T1", "T2"),
c("sp.1", "sp.2", "sp.3")))
mdat
In my actual data I have time
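The rest of the question is cut off. Assuming the goal is the first row (time point) at which each column equals 1, here is a sketch with apply() and match(). match() returns NA for a column that never contains 1.

```r
mdat <- matrix(c(1, 0, 1, 1, 1, 0), nrow = 2, ncol = 3, byrow = TRUE,
               dimnames = list(c("T1", "T2"),
                               c("sp.1", "sp.2", "sp.3")))

# For each column, the index of the first row equal to 1 (NA if none)
first_row <- apply(mdat, 2, function(col) match(1, col))
rownames(mdat)[first_row]  # "T1" "T2" "T1"
```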
2006 Apr 18
1
Compare two Power law or Exponential distributions
Dear All,
I'd like to compare exponential or power-law distributions.
To do so, people are often referred to the ks.test. However,
I imagine ks.test wouldn't be as powerful as a test specifically
designed for a distribution type.
So my question is: is there a more specific test for each of
these distributions (exponential or power-law)?
Thanks for your hints!
E
2012 Sep 19
3
effective way to return only the first argument of "which()"
Hi,
I was looking for a function like "which()" that returns only the first index.
Compare:
x <- c(1,2,3,4,5,6)
y <- 4
which(x>y)
returns:
5,6
which(x>y)[1]
returns:
5
which(x>y)[1] is exactly what I need. I did use this, but the dataset
is too big (~18 million points).
That's why I need a more efficient way to get the first element of a
vector which is
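which(x > y)[1] always compares every element and builds the full index vector. Here is a hedged sketch of an early-exit alternative: scan the data in chunks and stop at the first chunk that contains a match. first_greater() and its default chunk size of 1e5 are hypothetical choices. How much time this saves depends on how early the first match occurs.

```r
# Hypothetical helper: index of the first element of x greater than y.
# Later chunks are never touched once a match is found; NA if none qualifies.
first_greater <- function(x, y, chunk = 1e5) {
  n <- length(x)
  start <- 1
  while (start <= n) {
    end <- min(start + chunk - 1, n)
    hit <- match(TRUE, x[start:end] > y)  # first TRUE within this chunk
    if (!is.na(hit)) return(start + hit - 1)
    start <- end + 1
  }
  NA_integer_
}

x <- c(1, 2, 3, 4, 5, 6)
first_greater(x, 4)  # 5, same as which(x > 4)[1]
```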
2012 Mar 10
1
How to improve the robustness of "loess"? - example included.
Hi,
I posted a message earlier entitled "How to fit a line through the
"Mountain crest" ..."
I figured loess is probably the best way, but it seems that the
problem is the robustness of the fit. Below I paste an example to
illustrate the problem:
tmp=rnorm(2000)
X.background = 5+tmp; Y.background = 5+ (10*tmp+rnorm(2000))
X.specific = 3.5+3*runif(1000);
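One option worth trying, though not necessarily what the thread concluded: loess() has a family argument. With family = "symmetric" it fits using a re-descending M estimator (Tukey's biweight), which down-weights points far from the current curve. In the sketch below the Y.specific values are made up, because the post's definition is cut off. No claim is made about how well either fit follows the crest on the real data.

```r
set.seed(1)
# Background cloud, as in the post
tmp <- rnorm(2000)
X.background <- 5 + tmp
Y.background <- 5 + (10 * tmp + rnorm(2000))

# X.specific as in the post; Y.specific is HYPOTHETICAL (the original is cut off)
X.specific <- 3.5 + 3 * runif(1000)
Y.specific <- 40 + 5 * runif(1000)

X <- c(X.background, X.specific)
Y <- c(Y.background, Y.specific)

fit_gauss  <- loess(Y ~ X)                        # default least-squares fit
fit_robust <- loess(Y ~ X, family = "symmetric")  # biweight re-weighting

grid <- data.frame(X = seq(3, 7, by = 0.5))
cbind(grid,
      gaussian  = predict(fit_gauss, grid),
      symmetric = predict(fit_robust, grid))
```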
2011 Jan 14
9
Selecting the first occurrence of a value after an occurrence of a different value
Hello everyone,
I am currently working with some data where I need to select the first
occurrence of a value after the occurrence of another value.
The data has two columns, one with a time and one with the occurrence of certain
events.
The column of data I want to select from looks like this (and each of these
events has a corresponding time in another column).
Stat71
OutMag
FirstResp
InMag
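Assume the task is to find, for each occurrence of one value, the first later occurrence of another. (OutMag followed by FirstResp is a hypothetical choice here.) Here is a vectorized sketch with which() and findInterval().

```r
# Hypothetical event column (the real data also has a time column)
ev <- c("Stat71", "OutMag", "FirstResp", "InMag",
        "OutMag", "Stat71", "FirstResp", "FirstResp")

trigger <- which(ev == "OutMag")     # positions of the first value
target  <- which(ev == "FirstResp")  # positions of the value to select

# findInterval() counts the targets at or before each trigger, so +1 indexes
# the first target strictly after it (NA when there is none)
first_after <- target[findInterval(trigger, target) + 1]
first_after  # 3 7
```

With a time column (hypothetically named time), time[first_after] gives the matching times. Two triggers that precede the same target both map to that target.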
2011 Sep 07
2
finding events in a time duration.
Hi,
Premise: I have a database which contains the list of events and their time
stamps (Unix time stamps).
What I want to do: I want to know the maximum number of occurrences of this
event in any time period of 7 days, or whether an event occurs more than "N" (say
5) times in a period of 7 days.
This time period is not fixed with "week
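Here is a sketch for a sliding (not calendar-week) window, assuming integer Unix time stamps in seconds. For each event, findInterval() counts the events that fall within 7 days of it. Any window can be shifted to start at its first event without losing events, so the largest of these counts is the maximum over all 7-day windows. The time stamps are made up.

```r
# Hypothetical Unix time stamps in seconds
stamps <- sort(c(0, 86400, 2 * 86400, 10 * 86400, 10 * 86400 + 10, 11 * 86400))
w <- 7 * 86400  # window length: 7 days in seconds
N <- 5

# Events in [stamps[i], stamps[i] + w): with integer seconds that is every
# stamp <= stamps[i] + w - 1, minus the i - 1 events that come before event i
counts <- findInterval(stamps + w - 1, stamps) - seq_along(stamps) + 1

max(counts)      # largest number of events in any 7-day window
any(counts > N)  # does some 7-day window hold more than N events?
```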