Displaying 20 results from an estimated 10000 matches similar to: "terms weight access"
2004 Dec 14
1
stopwords
Hi!
I would like to use the lists of stopwords provided with Xapian. Is
there a standard way to remove stopwords automatically, or should I
implement it myself in the indexer?
Regards,
Georges Dupret
2012 Nov 17
4
survfit & number of variables != number of variable names
This works ok:
> cox = coxph(surv ~ bucket*(today + accor + both) + activity, data = data)
> fit = survfit(cox, newdata=data[1:100,])
but using strata leads to problems:
> cox.s = coxph(surv ~ bucket*(today + accor + both) + strata(activity),
> data = data)
> fit.s = survfit(cox.s, newdata=data[1:100,])
Error in model.frame.default(data = data[1:100, ], formula = ~bucket + :
2007 Sep 12
2
k-means clustering
Dear list, first, apologies, as this is not strictly an R question but
a theoretical one.
I have read that the use of k-means clustering assumes sphericity of the data
distribution. Can anyone explain to me what this means? My statistical
background is too poor. Is it another kind of distribution, like
Gaussian or binomial? What happens if the distribution is not
spherical? Could you give me an
2007 Apr 22
2
distance method in kmeans
I am trying to cluster some binary data using k-means. The regular "kmeans" available from the stats package in R doesn't provide the option to change the distance method, so I was wondering whether there is any package that lets you specify the type of distance measure to be used in k-means clustering in R, especially distances like "Jaccard", which is good for binary data.
2013 Jul 17
5
Why last doesn't return an ActiveRecord::Relation
Hello,
Sorry if this has already been answered; I haven't found anything on it. I
would love to know why ActiveRecord::Base#last doesn't return an
ActiveRecord::Relation just like all or where, since an
ActiveRecord::Relation can act more or less like an array (as specified here
<https://github.com/rails/rails/commit/0a6833b6f701c8c8febadfe2f45e25df29493602>)?
Thanks, have
2012 May 02
1
coxph reference hazard rate
Hi,
In the following results I interpret exp(coef) as the factor that multiplies
the base hazard rate if the corresponding variable is TRUE. For example,
when the bucket is ks008 and fidelity <= 3, then the rate, compared to the
base rate h_0(t), is h(t) = 0.200 h_0(t). My question is then, to what case
does the base hazard rate correspond? I would expect the reference to be
the first
2005 Mar 31
2
Using kmeans given cluster centroids and data with NAs
Hello,
I have used the functions agnes and cutree to cluster my data (4977
objects x 22 variables) into 8 clusters. I would like to refine the
solution using a k-means or similar algorithm, setting the initial
cluster centres as the group means from agnes. However my data matrix
has NA's in it and the function kmeans does not appear to accept this?
> dim(centres)
[1] 8 22
> dim(data)
2008 Jun 18
3
Cluster on both categorical and numerical data
Hello there. Is there any function in R that can do clustering on a set of
data that has both categorical and numerical variables? Thanks.
siangli
2013 May 21
1
keep the centre fixed in K-means clustering
Dear R users
I have the matrix of the centres of some clusters, e.g. 20 clusters each
with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric
values.
I have collected new data (each with 100 numeric values) and would like to
keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data
fit in this grouping system, e.g. if the data is close to cluster 1
2003 May 24
1
predicting fuzzy cluster membership
Dear all,
I'm trying to obtain a fuzzy clustering with fanny from the cluster package,
using a given set of data. That worked just fine.
I have another separate sample of data from the same problem. For each case in
this new sample I would like to know its membership coefficients with
respect to the clustering obtained with the first dataset. In effect I want
to have a kind of prediction
2001 Sep 06
2
Array as time series?
Dear R-helpers,
I have 4-dimensional atmospheric data (x,y,z,t), which I want to analyse
on spatio-temporal diversities.
As far as I understand there only exists the possibility to construct
time series as two-dimensional matrices (mts).
For the moment, I hold it in different objects:
1. a four-dimensional array for the spatial related analyses
2. a two-dimensional mts time series, which was
2016 Mar 07
2
GSOC-2016 Project : Clustering of search results
On Mon, Mar 07, 2016 at 01:36:43AM +0530, Richhiey Thomas wrote:
> My questions are:
> 1) Can you direct me on how to convert this raw idea into a proposal in
> context to Xapian with more detail? What areas do I focus on?
Our GSoC guide has an application template
<https://trac.xapian.org/wiki/GSoCApplicationTemplate> which you
should use to structure your proposal. It has some
2007 Dec 05
1
Information criteria for kmeans
Hello,
how is, for example, the Schwarz criterion defined for kmeans? It should
be something like:
k <- 2
vars <- 4
nobs <- 100
dat <- rbind(matrix(rnorm(nobs, sd = 0.3), ncol = vars),
matrix(rnorm(nobs, mean = 1, sd = 0.3), ncol = vars))
colnames(dat) <- paste("var",1:4)
(cl <- kmeans(dat, k))
schwarz <- sum(cl$withinss)+ vars*k*log(nobs)
Thanks
2004 May 28
6
distance in the function kmeans
Hi,
I want to know which distance is used in the function kmeans
and whether we can change this distance.
Indeed, in the function pam, we can pass a distance matrix as a
parameter (by the line "pam<-pam(dist(matrixdata),k=7)") but
we can't do it in the function kmeans; we have to pass the
matrix of data directly ...
Thanks in advance,
Nicolas BOUGET
2013 Mar 19
1
Cluster analysis on weighted survey data with continuous and categorical variables
I am trying to perform cluster analysis on survey data where each respondent has answered several questions, some of which have categorical answers ("blue", "pink", "green", etc.) and some of which have scale answers (rating from 1 to 10, etc.). My problem is that certain age groups were over-sampled and I need to weight the data collected in order to accurately reflect the
2016 Apr 04
2
Using final sample weight in survey package
I have the final sample weight (expansion factor) from a socioeconomic
survey. I don't know the exact design used in the study (it is probably a
stratified two-stage design).
To illustrate my problem I will use the following dataset, which has a sample
weight (but the design is not specified), and incorporate the design with
svydesign and create some bootstrap replicates in order to be able to
2005 Apr 22
1
algorithm used in k-mean clustering
Hi,
I have used the kmeans function in R to produce some results for my analysis.
I would like to know the specific underlying algorithm used for the implementation
of the function kmeans in R. I tried looking for some documents but could not
find any.
I obtained the kmeans result for k ranging from 2 to 10. When I did this
initially it worked perfectly. When I tried running it again I got the error
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org>
wrote:
> On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote:
>
> K-Means or something related certainly seems like a viable approach,
> so what you'll need to do is to come up with a proposal of how you'd
> implement this in Xapian (either with reference to the previous work,
>
2016 Apr 04
0
Using final sample weight in survey package
hi, probably not.. if your survey dataset has a complex design (like
clusters/strata), you need to include them in the `svydesign` call.
coercing an incorrect survey design into a replicate-weighted design will
not fix the problem of failing to account for the sampling strategy
On Mon, Apr 4, 2016 at 12:01 AM, José Fernando Zea <jfzeac at gmail.com> wrote:
> I have the final sample
2005 Jun 16
1
Survey - Cluster Sampling
Dear WizaRds,
I am struggling to correctly compute a cluster sampling design. I want
to do one-stage clustering with different parametric changes:
Let M be the total number of clusters in the population, and m the
number sampled. Let N be the total of elements in the population and n
the number sampled. y are the values sampled. This is my example data:
clus1 <-