Displaying 20 results from an estimated 6000 matches similar to: "Hints for Data Mining"
2011 Sep 02
1
Hints for Data Clustering
Dear All,
I will be confronted (relatively soon) with the following problem:
given a set of known statistical indicators {s_i}, i=1,2...N, for N
countries I would like to be able to do some data clustering i.e.
determining the best way to partition the N countries according to their
known properties, encoded by the {s_i} set of indicators for those
countries.
Some properties of these
2004 Apr 18
2
lm with data=(means,sds,ns)
Hi Folks,
I am dealing with data which have been presented as
at each x_i, mean m_i of the y-values at x_i,
sd s_i of the y-values at x_i
number n_i of the y-values at x_i
and I want to linearly regress y on x.
There does not seem to be an option to 'lm' which can
deal with such data directly, though the regression
problem could be algebraically
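A minimal sketch of one way to handle such summarized data, assuming the coefficient estimates are what is wanted: weighted least squares on the means m_i with weights n_i gives the same coefficients as an ordinary fit to the raw y-values. The simulated data and all variable names below are illustrative assumptions, not taken from the post. Standard errors are not recovered this way, since they also need the within-group spread (the s_i).

```r
# Simulated raw data (assumption, for illustration only)
set.seed(1)
x <- rep(1:5, times = c(3, 4, 2, 5, 3))
y <- 2 + 0.5 * x + rnorm(length(x))

# Summaries as described in the post: mean and count of y at each x_i
d <- data.frame(
  xs = sort(unique(x)),
  m  = as.vector(tapply(y, x, mean)),
  n  = as.vector(tapply(y, x, length))
)

fit_raw   <- lm(y ~ x)                          # fit on the raw data
fit_means <- lm(m ~ xs, data = d, weights = n)  # fit on the summaries

# The two sets of coefficients agree
print(coef(fit_raw))
print(coef(fit_means))
```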
2009 Oct 01
1
Help for 3D Plotting Data on 'Irregular' Grid
Dear All,
Here is what I am trying to achieve: I would like to plot some data in 3D.
Usually, one has a matrix of the kind
y_1(x_1) , y_1(x_2).....y_1(x_i)
y_2(x_1) , y_2(x_2).....y_2(x_i)
...........................................
y_n(x_1) , y_n(x_2)......y_n(x_i)
where e.g. y_2(x_1) is the value of y at time 2 at point x_1 (see that
the grid in x is the same for the y values at all times).
2011 Oct 31
1
Question on estimating standard errors with noisy signals using the quantreg package
Dear all,
My question might be more of a statistics question than a question on R,
although it's on how to apply the 'quantreg' package. Please accept my
apologies if you believe I am strongly misusing this list.
To be very brief, the problem is that I have data on only a random draw, not
all of doctors' patients. I am interested in the, say, median number of
patients of
1998 Jun 24
1
SPAM: Important Legislative Alert (fwd)
this has serious ramifications for the "nt domains for unix" project.
luke.
---------- Forwarded message ----------
Date: Tue, 23 Jun 1998 13:25:57 -0500
From: Simple Nomad <thegnome@NMRC.ORG>
To: NTBUGTRAQ@LISTSERV.NTBUGTRAQ.COM
Subject: SPAM: Important Legislative Alert
June 23rd, 1998 - The World Intellectual Property Organization treaty has
already passed the US Senate and is
2002 Apr 09
3
expressions on graphs
Hello,
I am trying to get a time derivative on a plot title. I prefer to have
it in the form \dot{s_i}, but \partial s_i/\partial t would be O.K. In
the graphics demo I cannot find either a dot or a partial equivalent.
Thanks,
John.
--
==========================================
John Janmaat
Department of Economics
Acadia University, Wolfville, NS, B0P 1X0
(902)585-1461
All opinions stated
2017 Aug 28
5
"Improvement with the R code"
Hello,
I am trying to implement a formula
a_ij = (number of transitions from state S_i to S_j) / (number of transitions from state S_i)
The code I have written works with three states {1,2,3}, but if the number
of states becomes {1,2,3,4,......n} then the code will not work, so can someone
help me with this?
For and some rows of my data frame look like
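The formula above can be sketched for any number of states as follows. This assumes the data reduce to a single vector of integer state labels in time order; the function name `transition_matrix` and its arguments are hypothetical, and the original data frame layout is not known.

```r
# Hypothetical helper: estimate a_ij = (# transitions i -> j) / (# transitions out of i)
# `states` is assumed to be a vector of integer labels 1..n_states in time order.
transition_matrix <- function(states, n_states = max(states)) {
  from <- factor(head(states, -1), levels = seq_len(n_states))
  to   <- factor(tail(states, -1), levels = seq_len(n_states))
  counts <- unclass(table(from, to))   # n_states x n_states count matrix
  totals <- rowSums(counts)
  # Rows with no outgoing transitions are left as all zeros
  counts / ifelse(totals == 0, 1, totals)
}

P <- transition_matrix(c(1, 2, 3, 1, 2, 2, 3))
print(P)
```

Using factors with fixed levels keeps the matrix square even when some state never occurs as a source or a target.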
2007 Feb 17
1
Constraint maximum (likelihood) using nlm
Hi,
I'm trying to find the maximum (likelihood) of a function. Therefore,
I'm trying to minimize the negative likelihood function:
# params: vector containing values of mu and sigma
# params[1] - mu, params[2]- sigma
# dat: matrix of data pairs y_i and s_i
# dat[,1] - column of y_i , dat[,2] column of s_i
negll <- function(params,dat,constant=0)
{
for(i in 1:length(dat[,1]))
{
2010 Feb 06
1
Canberra distance
Hi the list,
According to what I know, the Canberra distance between X and Y is: sum[
(|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the function
'absolute value')
In the source code of the canberra distance in the file distance.c, we
find :
sum = fabs(x[i1] + x[i2]);
diff = fabs(x[i1] - x[i2]);
dev = diff/sum;
which corresponds to the formula: sum[ (|x_i - y_i|) /
2010 Nov 28
1
faster base::sequence
Hello,
Based on yesterday's R-help thread (help: program efficiency), and
following Bill's suggestions, it appeared that sequence:
> sequence
function (nvec)
unlist(lapply(nvec, seq_len))
<environment: namespace:base>
could benefit from being written in C to avoid unnecessary memory
allocations.
I made this version using inline:
require( inline )
sequence_c <- local( {
2000 Oct 26
1
competing risks survival analysis
I will have data in the following form:
Time resp type stim type
300 a A
200 b A
155 a B
250 b B
80 c A
1000 d B
...
c is left censored observation; d is right censored
This sort of problem is discussed in Chap 9 of Cox & Oakes Analysis of
Survival Data under the name
2017 Aug 28
0
"Improvement with the R code"
Hi,
I think you overthought this one a little bit, I don't know if this is the
kind of code you are expecting but I came up with something like that:
generate_transition_matrix <- function(data, n_states) {
#To be sure I imagine you should check n_states is right at this point
transitions <- matrix(0, n_states, n_states)
#we could improve a little bit here because at
2006 Dec 08
1
MAXIMIZATION WITH CONSTRAINTS
Dear R users,
I'm a graduate student, and in my master's thesis I must
obtain the values of the parameters x_i which maximize this
multinomial log-likelihood function
log(n!) - sum_{i=1}^4 log(n_i!) + sum_{i=1}^4 n_i log(x_i)
under the following constraints:
a) sum_i x_i = 1,
x_i >= 0,
b) x_1 <= x_2 + x_3 + x_4
c) x_2 <= x_3 + x_4
I have been using the
"ConstrOptim" R-function with the instructions
2007 May 08
5
Weighted least squares
Dear all,
I'm struggling with weighted least squares, where something that I had
assumed to be true appears not to be the case. Take the following
data set as an example:
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd=15)
I had expected that:
summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))
would be equivalent, but
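A small sketch comparing the two fits above, using the same simulated data with a fixed seed (the seed and the helper `se` are additions for illustration). With a constant weight the scale cancels, so the weighted fit matches the unweighted one in both coefficients and standard errors. Duplicating the rows keeps the coefficients but doubles the apparent sample size, which shrinks the standard errors.

```r
set.seed(42)  # seed added here so the comparison is reproducible
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

fit_plain <- lm(y ~ x, data = df)
fit_w     <- lm(y ~ x, data = df, weights = rep(2, 100))
fit_dup   <- lm(y ~ x, data = rbind(df, df))

# Hypothetical helper: extract coefficient standard errors
se <- function(fit) coef(summary(fit))[, "Std. Error"]

print(rbind(plain = coef(fit_plain), weighted = coef(fit_w), duplicated = coef(fit_dup)))
print(rbind(plain = se(fit_plain), weighted = se(fit_w), duplicated = se(fit_dup)))

# Ratio of standard errors: sqrt((n - 2) / (2n - 2)) with n = 100
print(se(fit_dup) / se(fit_w))
```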
2009 Sep 11
2
[PATCH] generator.ml: Fix string list memory leak
Parsed string lists are allocated by malloc, but were never freed.
---
src/generator.ml | 16 +++++++++++++++-
1 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/src/generator.ml b/src/generator.ml
index 7571f95..c72c329 100755
--- a/src/generator.ml
+++ b/src/generator.ml
@@ -6320,7 +6320,7 @@ and generate_fish_cmds () =
| OptString n
| FileIn n
|
2010 Sep 24
3
boundary check
Dear R,
I have a covariates matrix with 10 observations, e.g.
> X <- matrix(rnorm(50), 10, 5)
> X
[,1] [,2] [,3] [,4] [,5]
[1,] 0.24857135 0.30880745 -1.44118657 1.10229027 1.0526010
[2,] 1.24316806 0.36275370 -0.40096866 -0.24387888 -1.5324384
[3,] -0.33504014 0.42996246 0.03902479 -0.84778875 -2.4754644
[4,] 0.06710229 1.01950917
2007 Feb 01
3
Help with efficient double sum of max (X_i, Y_i) (X & Y vectors)
Greetings.
For R gurus this may be a no brainer, but I could not find pointers to
efficient computation of this beast in past help files.
Background - I wish to implement a Cramer-von Mises type test statistic
which involves double sums of max(X_i,Y_j) where X and Y are vectors of
differing length.
I am currently using ifelse pointwise in a vector, but have a nagging
suspicion that there is a
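One vectorized way to form such a double sum, as a sketch: `outer` with `pmax` builds the full matrix of max(X_i, Y_j) values, which is then summed. The function name `double_sum_max` is hypothetical. The matrix has length(x) * length(y) entries, so this assumes both vectors are small enough for it to fit in memory.

```r
# Hypothetical helper: sum over all i, j of max(x[i], y[j])
double_sum_max <- function(x, y) {
  sum(outer(x, y, pmax))  # pmax is applied elementwise to every (x_i, y_j) pair
}

x <- c(1, 4)
y <- c(2, 3, 5)
total <- double_sum_max(x, y)
print(total)
```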
2001 Mar 05
1
Canberra dist and double zeros
Canberra distance is defined in function `dist' (standard library `mva') as
sum(|x_i - y_i| / |x_i + y_i|)
Obviously this is undefined for cases where both x_i and y_i are zeros. Since
double zeros are common in many data sets, this is a nuisance. In our field
(from which the distance is coming), it is customary to remove double zeros:
contribution to distance is zero when both x_i
2001 Mar 05
1
Canberra dist and double zeros
Canberra distance is defined in function `dist' (standard library `mva') as
sum(|x_i - y_i| / |x_i + y_i|)
Obviously this is undefined for cases where both x_i and y_i are zeros. Since
double zeros are common in many data sets, this is a nuisance. In our field
(from which the distance is coming), it is customary to remove double zeros:
contribution to distance is zero when both x_i
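The double-zero convention described above can be sketched as follows. This uses the denominator |x_i + y_i| as quoted in the post and assumes non-negative data, so the denominator vanishes only when both values are zero. The function name `canberra_skip_zeros` is hypothetical and this is not the code of `dist` itself.

```r
# Hypothetical helper: Canberra distance where double zeros contribute 0
canberra_skip_zeros <- function(x, y) {
  keep <- !(x == 0 & y == 0)           # drop positions where both values are zero
  num  <- abs(x[keep] - y[keep])
  den  <- abs(x[keep] + y[keep])       # assumes non-negative data
  sum(num / den)
}

d_xy <- canberra_skip_zeros(c(1, 0, 3), c(3, 0, 1))
print(d_xy)
```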
2011 Jan 21
2
ordering a vector
Hi,
is there an R function that orders a matrix according to some criterion
based on the rows (or cols) of that matrix?
For example, let's say that my matrix S is composed of n rows S_1,
S_2,.., S_n and that I compute some real value g_i=g(S_i) for each
row.
Then I want to order this set of g_i (from smallest to largest) and
move the corresponding rows to their new positions.
Is it possible (apart
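A minimal sketch of the row ordering described above, using `apply` to compute g_i = g(S_i) and `order` to permute the rows. The helper name `order_rows_by` and the example matrix are assumptions for illustration; `g` can be any function that returns a single number per row.

```r
# Hypothetical helper: reorder the rows of S by increasing g(row)
order_rows_by <- function(S, g) {
  g_values <- apply(S, 1, g)            # one value g_i per row
  S[order(g_values), , drop = FALSE]    # rows moved to their sorted positions
}

S <- matrix(c(3, 1, 2, 0, 0, 0), nrow = 3)  # rows: (3,0), (1,0), (2,0)
sorted <- order_rows_by(S, sum)
print(sorted)
```

Passing `decreasing = TRUE` to `order` would give the reverse ordering, and `apply(S, 2, g)` with column indexing handles the column case.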