similar to: terms weight access

Displaying 20 results from an estimated 10000 matches similar to: "terms weight access"

2004 Dec 14
1
stopwords
Hi! I would like to use the lists of stopwords provided with Xapian. Is there a standard way to remove stopwords automatically, or should I implement it myself in the indexer? Regards, Georges Dupret
2012 Nov 17
4
survfit & number of variables != number of variable names
This works ok: > cox = coxph(surv ~ bucket*(today + accor + both) + activity, data = data) > fit = survfit(cox, newdata=data[1:100,]) but using strata leads to problems: > cox.s = coxph(surv ~ bucket*(today + accor + both) + strata(activity), > data = data) > fit.s = survfit(cox.s, newdata=data[1:100,]) Error in model.frame.default(data = data[1:100, ], formula = ~bucket + :
2007 Sep 12
2
k-means clustering
Dear list, first, apologies, as this is not strictly an R question but a theoretical one. I have read that the use of k-means clustering assumes sphericity of the data distribution. Can anyone explain to me what this means? My statistical background is too poor. Is it another kind of distribution, like Gaussian or binomial? What happens if the distribution is not spherical? Could you give me an
2007 Apr 22
2
distance method in kmeans
I am trying to cluster some binary data using k-means. As the regular "kmeans" available from the stats package in R doesn't provide an option to change the distance method, I was wondering whether there is any package available to specify the type of distance measure to be used in k-means clustering in R, especially distances like "Jaccard", which is good for binary data.
2013 Jul 17
5
Why last doesn't return an ActiveRecord::Relation
Hello, Sorry if this has already been answered; I haven't found anything on it. I would love to know why ActiveRecord::Base#last doesn't return an ActiveRecord::Relation just like all or where, since an ActiveRecord::Relation can act more or less like an array (as specified here <https://github.com/rails/rails/commit/0a6833b6f701c8c8febadfe2f45e25df29493602>)? Thanks, have
2012 May 02
1
coxph reference hazard rate
Hi, In the following results I interpret exp(coef) as the factor that multiplies the base hazard rate if the corresponding variable is TRUE. For example, when the bucket is ks008 and fidelity <= 3, then the rate, compared to the base rate h_0(t), is h(t) = 0.200 h_0(t). My question is then, to what case does the base hazard rate correspond? I would expect the reference to be the first
2005 Mar 31
2
Using kmeans given cluster centroids and data with NAs
Hello, I have used the functions agnes and cutree to cluster my data (4977 objects x 22 variables) into 8 clusters. I would like to refine the solution using a k-means or similar algorithm, setting the initial cluster centres as the group means from agnes. However my data matrix has NA's in it and the function kmeans does not appear to accept this? > dim(centres) [1] 8 22 > dim(data)
2008 Jun 18
3
Cluster on both categorical and numerical data
Hello there. Is there any function in R that can cluster a set of data that has both categorical and numerical variables? Thanks. siangli
2013 May 21
1
keep the centre fixed in K-means clustering
Dear R users, I have the matrix of the centres of some clusters, e.g. 20 clusters each with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric values. I have collected new data (each with 100 numeric values) and would like to keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data fit in this grouping system, e.g. if the data is close to cluster 1
2003 May 24
1
predicting fuzzy cluster membership
Dear all, I'm trying to obtain a fuzzy clustering with fanny from the cluster package, using a given set of data. That worked just fine. I have another separate sample of data from the same problem. For each case in this new sample I would like to know their membership coefficients with respect to the clustering obtained with the first dataset. In effect I want to have a kind of prediction
2001 Sep 06
2
Array as time series?
Dear R-helpers, I have 4-dimensional atmospheric data (x,y,z,t), which I want to analyse for spatio-temporal diversity. As far as I understand, time series can only be constructed as two-dimensional matrices (mts). For the moment, I hold the data in different objects: 1. a four-dimensional array for the spatially related analyses 2. a two-dimensional mts time series, which was
2016 Mar 07
2
GSOC-2016 Project : Clustering of search results
On Mon, Mar 07, 2016 at 01:36:43AM +0530, Richhiey Thomas wrote: > My questions are: > 1) Can you direct me on how to convert this raw idea into a proposal in > context to Xapian with more detail? What areas do I focus on? Our GSoC guide has an application template <https://trac.xapian.org/wiki/GSoCApplicationTemplate> which you should use to structure your proposal. It has some
2007 Dec 05
1
Information criteria for kmeans
Hello, how, for example, is the Schwarz criterion defined for kmeans? It should be something like: k <- 2 vars <- 4 nobs <- 100 dat <- rbind(matrix(rnorm(nobs, sd = 0.3), ncol = vars), matrix(rnorm(nobs, mean = 1, sd = 0.3), ncol = vars)) colnames(dat) <- paste("var",1:4) (cl <- kmeans(dat, k)) schwarz <- sum(cl$withinss)+ vars*k*log(nobs) Thanks
2004 May 28
6
distance in the function kmeans
Hi, I want to know which distance is used in the function kmeans and if we can change this distance. Indeed, in the function pam, we can pass a distance matrix as a parameter (by the line "pam<-pam(dist(matrixdata),k=7)" ) but we can't do it in the function kmeans; we have to pass the data matrix directly ... Thanks in advance, Nicolas BOUGET
2013 Mar 19
1
Cluster analysis on weighted survey data with continuous and categorical variables
I am trying to perform cluster analysis on survey data where each respondent has answered several questions, some of which have categorical answers ("blue" "pink" "green" etc) and some of which have scale answers (rating from 1 to 10 etc). My problem is that certain age groups were over-sampled and I need to weight the data collected in order to accurately reflect the
2016 Apr 04
2
Using final sample weight in survey package
I have the final sample weight (expansion factor) from a socioeconomic survey. I don't know the exact design used in the study (it is probably a stratified two-stage design). To illustrate my problem I will use the following dataset, which has a sample weight (but the design is not specified), and incorporate the design with svydesign and create some bootstrap replicates in order to be able to
2005 Apr 22
1
algorithm used in k-mean clustering
Hi, I have used the kmean function in R to produce some results for my analysis. I would like to know the specific underlying algorithm used for the implementation of the function kmean in R. I tried looking for some documents but could not find any. I obtained the kmean result for k ranging from 2 to 10. When I did this initially it worked perfectly. When I tried running again I get the error
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote: > On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote: > > K-Means or something related certainly seems like a viable approach, > so what you'll need to do is to come up with a proposal of how you'd > implement this in Xapian (either with reference to the previous work, >
2016 Apr 04
0
Using final sample weight in survey package
hi, probably not.. if your survey dataset has a complex design (like clusters/strata), you need to include them in the `svydesign` call. coercing an incorrect survey design into a replicate-weighted design will not fix the problem of failing to account for the sampling strategy On Mon, Apr 4, 2016 at 12:01 AM, José Fernando Zea <jfzeac at gmail.com> wrote: > I have the final sample
2005 Jun 16
1
Survey - Cluster Sampling
Dear WizaRds, I am struggling to compute correctly a cluster sampling design. I want to do one stage clustering with different parametric changes: Let M be the total number of clusters in the population, and m the number sampled. Let N be the total of elements in the population and n the number sampled. y are the values sampled. This is my example data: clus1 <-