thr3ads.net - similar to: "Hints for Data Mining"

Displaying 20 results from an estimated 6000 matches similar to: "Hints for Data Mining"

2011 Sep 02

Hints for Data Clustering

Dear All, I will be confronted (relatively soon) with the following problem: given a set of known statistical indicators {s_i}, i = 1, 2, ..., N for N countries, I would like to be able to do some data clustering, i.e. determine the best way to partition the N countries according to their known properties, encoded by the {s_i} set of indicators for those countries. Some properties of these

2004 Apr 18

lm with data=(means,sds,ns)

Hi Folks, I am dealing with data which have been presented as, at each x_i:
- mean m_i of the y-values at x_i
- sd s_i of the y-values at x_i
- number n_i of the y-values at x_i
and I want to linearly regress y on x. There does not seem to be an option to 'lm' which can deal with such data directly, though the regression problem could be algebraically

2009 Oct 01

Help for 3D Plotting Data on 'Irregular' Grid

Dear All, Here is what I am trying to achieve: I would like to plot some data in 3D. Usually, one has a matrix of the kind

y_1(x_1), y_1(x_2), ..., y_1(x_i)
y_2(x_1), y_2(x_2), ..., y_2(x_i)
...
y_n(x_1), y_n(x_2), ..., y_n(x_i)

where e.g. y_2(x_1) is the value of y at time 2 at point x_1 (see that the grid in x is the same for the y values at all times).

2011 Oct 31

Question on estimating standard errors with noisy signals using the quantreg package

Dear all, My question might be more of a statistics question than a question on R, although it's on how to apply the 'quantreg' package. Please accept my apologies if you believe I am strongly misusing this list. To be very brief, the problem is that I have data on only a random draw, not all of doctors' patients. I am interested in the, say, median number of patients of

1998 Jun 24

SPAM: Important Legislative Alert (fwd)

this has serious ramifications for the "nt domains for unix" project. luke.

---------- Forwarded message ----------
Date: Tue, 23 Jun 1998 13:25:57 -0500
From: Simple Nomad <thegnome@NMRC.ORG>
To: NTBUGTRAQ@LISTSERV.NTBUGTRAQ.COM
Subject: SPAM: Important Legislative Alert

June 23rd, 1998 - The World Intellectual Property Organization treaty has already passed the US Senate and is

2002 Apr 09

expressions on graphs

Hello, I am trying to get a time derivative on a plot title. I prefer to have it in the form \dot{s_i}, but \partial s_i/\partial t would be O.K. In the graphics demo I cannot find either a dot or a partial equivalent. Thanks, John. -- ========================================== John Janmaat Department of Economics Acadia University, Wolfville, NS, B0P 1X0 (902)585-1461 All opinions stated

2017 Aug 28

"Improvement with the R code"

Hello, I am trying to implement the formula a_ij = (number of transitions from state S_i to S_j) / (number of transitions at state S_i). The code I have written works with three states {1, 2, 3}, but if the number of states becomes {1, 2, 3, 4, ..., n} then the code will not work, so can someone help me with this? For and some rows of my data frame look like

2007 Feb 17

Constraint maximum (likelihood) using nlm

Hi, I'm trying to find the maximum (likelihood) of a function. Therefore, I'm trying to minimize the negative likelihood function:

# params: vector containing values of mu and sigma
# params[1] - mu, params[2]- sigma
# dat: matrix of data pairs y_i and s_i
# dat[,1] - column of y_i , dat[,2] column of s_i
negll <- function(params,dat,constant=0) {
  for(i in 1:length(dat[,1])) {

2010 Feb 06

Canberra distance

Hi the list, According to what I know, the Canberra distance between X and Y is: sum[ (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the function 'absolute value'). In the source code of the Canberra distance in the file distance.c, we find:

sum = fabs(x[i1] + x[i2]);
diff = fabs(x[i1] - x[i2]);
dev = diff/sum;

which corresponds to the formula: sum[ (|x_i - y_i|) /

2010 Nov 28

faster base::sequence

Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence:

> sequence
function (nvec)
unlist(lapply(nvec, seq_len))
<environment: namespace:base>

could benefit from being written in C to avoid unnecessary memory allocations. I made this version using inline:

require( inline )
sequence_c <- local( {

2000 Oct 26

competing risks survival analysis

I will have data in the following form:

Time   resp type   stim type
300    a           A
200    b           A
155    a           B
250    b           B
80     c           A
1000   d           B
...

c is left censored observation; d is right censored. This sort of problem is discussed in Chap 9 of Cox & Oakes Analysis of Survival Data under the name

2017 Aug 28

"Improvement with the R code"

Hi, I think you overthought this one a little bit, I don't know if this is the kind of code you are expecting but I came up with something like that:

generate_transition_matrix <- function(data, n_states) {
  #To be sure I imagine you should check n_states is right at this point
  transitions <- matrix(0, n_states, n_states)
  #we could improve a little bit here because at

2006 Dec 08

MAXIMIZATION WITH CONSTRAINTS

Dear R users, I'm a graduate student and in my master's thesis I must obtain the values of the parameters x_i which maximize this multinomial log-likelihood function: log(n!) - sum_{i=1}^4 log(n_i!) + sum_{i=1}^4 n_i log(x_i) under the following constraints: a) sum_i x_i = 1, x_i >= 0, b) x_1 <= x_2 + x_3 + x_4, c) x_2 <= x_3 + x_4. I have been using the "ConstrOptim" R-function with the instructions

2007 May 08

Weighted least squares

Dear all, I'm struggling with weighted least squares, where something that I had assumed to be true appears not to be the case. Take the following data set as an example:

df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd=15)

I had expected that:

summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))

would be equivalent, but

2009 Sep 11

[PATCH] generator.ml: Fix string list memory leak

Parsed string lists are allocated by malloc, but were never freed.
---
 src/generator.ml | 16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/src/generator.ml b/src/generator.ml
index 7571f95..c72c329 100755
--- a/src/generator.ml
+++ b/src/generator.ml
@@ -6320,7 +6320,7 @@ and generate_fish_cmds () =
 | OptString n | FileIn n |

2010 Sep 24

boundary check

Dear R, I have a covariates matrix with 10 observations, e.g.

> X <- matrix(rnorm(50), 10, 5)
> X
            [,1]       [,2]        [,3]        [,4]       [,5]
[1,]  0.24857135 0.30880745 -1.44118657  1.10229027  1.0526010
[2,]  1.24316806 0.36275370 -0.40096866 -0.24387888 -1.5324384
[3,] -0.33504014 0.42996246  0.03902479 -0.84778875 -2.4754644
[4,]  0.06710229 1.01950917

2007 Feb 01

Help with efficient double sum of max (X_i, Y_i) (X & Y vectors)

Greetings. For R gurus this may be a no brainer, but I could not find pointers to efficient computation of this beast in past help files. Background - I wish to implement a Cramer-von Mises type test statistic which involves double sums of max(X_i,Y_j) where X and Y are vectors of differing length. I am currently using ifelse pointwise in a vector, but have a nagging suspicion that there is a

2001 Mar 05

Canberra dist and double zeros

Canberra distance is defined in function `dist' (standard library `mva') as sum(|x_i - y_i| / |x_i + y_i|) Obviously this is undefined for cases where both x_i and y_i are zeros. Since double zeros are common in many data sets, this is a nuisance. In our field (from which the distance is coming), it is customary to remove double zeros: contribution to distance is zero when both x_i

2011 Jan 21

ordering a vector

Hi, is there an R function that orders a matrix according to some criterion based on the rows (or cols) of that matrix? For example, let's say that my matrix S is composed of n rows S_1, S_2, ..., S_n and that I compute some real value g_i = g(S_i) for each row. Then I want to order this set of g_i (from smallest to largest) and move each corresponding row to its new position. Is it possible (apart