similar to: Document clustering for R

Displaying 20 results from an estimated 3000 matches similar to: "Document clustering for R"

2016 May 05
2
GSoC 2016 - Introduction
Hello, Thanks James for the reply. That cleared a few things up. Apologies for the late reply; I have exams going on. I was going through the previous clustering API to understand how it worked, and it seems that the termlists used for the distance metrics are constructed with TF-IDF weighting and cosine similarity, which is very similar to the approach I would need
1999 Jan 20
2
dist function suggestion
On my R installation (0.62.4) there is no dist() function, so I attach one possibility. It provides
2016 Jul 27
2
K MEANS clustering
Hey Parth, Thanks for the reply. I am considering implementing a cosine distance metric too, along with Euclidean distance, because of the dimensionality issue that comes with K-Means and the Euclidean distance metric. That does help when we deal with sparse vectors for documents. The particular problem I'm having is representing centroids in an efficient way. For example, when we find the mean
2016 Jul 26
3
K MEANS clustering
Hello, I've been working on the KMeans clustering algorithm recently, and for the past week I have been stuck on a problem that I cannot find a solution to. Since we are representing documents as TF-IDF vectors, they are really sparse (a usual corpus can have around 5000 terms). So it gets really difficult to represent these sparse vectors in a way that would be
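The sparse-vector and centroid problem raised in this thread can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the Xapian implementation: documents are modelled as named numeric vectors (term names mapped to TF-IDF weights), and `cosine_sim`, `sparse_centroid`, `doc_a` and `doc_b` are hypothetical names invented for the example.

```r
# Sparse document vectors as named numeric vectors: names are terms,
# values are TF-IDF weights; terms that are absent are implicitly zero.
doc_a <- c(cluster = 0.8, xapian = 0.5)
doc_b <- c(cluster = 0.4, kmeans = 0.9)

# Cosine similarity only needs the terms that both vectors share.
cosine_sim <- function(x, y) {
  shared <- intersect(names(x), names(y))
  dot <- sum(x[shared] * y[shared])
  dot / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Centroid (hypothetical helper): sum the weights per term over all
# documents, then divide by the number of documents.
sparse_centroid <- function(docs) {
  all_weights <- unlist(unname(docs))
  totals <- tapply(all_weights, names(all_weights), sum)
  centroid <- as.vector(totals)
  names(centroid) <- names(totals)
  centroid / length(docs)
}

centroid <- sparse_centroid(list(doc_a, doc_b))
cosine_dist <- 1 - cosine_sim(doc_a, centroid)
```

Note that the centroid holds the union of the terms of its members, so it is denser than any single document; that growth is one plausible reason centroids are awkward to represent efficiently.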
2007 Sep 02
1
buglet in dist() ?
the first line of dist() says if (!is.na(pmatch(method, "euclidian"))) shouldn't that be "euclidean" ? --------------------- R version 2.5.1 (2007-06-27) i486-pc-linux-gnu locale:
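A short sketch of how `pmatch()` behaves may help in reading the quoted line. It treats its first argument as a candidate prefix of the table entries, so one possible reading (an assumption, not something the snippet confirms) is that the check exists to accept the misspelling and map it to the canonical name. `normalize_method` below is a hypothetical helper written for illustration, not R's source.

```r
# pmatch() treats its first argument as a possible prefix of the table entries.
exact   <- pmatch("euclidian", "euclidian")  # exact match
prefix  <- pmatch("euc", "euclidian")        # unambiguous prefix
correct <- pmatch("euclidean", "euclidian")  # not a prefix of the table entry

# Hypothetical helper (not R's source): map the misspelling, or any
# prefix of it, to the canonical spelling and leave other values alone.
normalize_method <- function(method) {
  if (!is.na(pmatch(method, "euclidian"))) method <- "euclidean"
  method
}
```

Under this reading, the correctly spelled "euclidean" simply falls through the check unchanged.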
2006 Apr 03
2
about arguments in "bclust"
Hi All, Just want to make sure: in function "bclust", do the following arguments each have only one option? Argument "dist.method" has one option, "Euclidian"; argument "hclust.method" has one option, "average"; argument "base.method" has one option, "kmeans". Thank you!
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote: > On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote: > > K-Means or something related certainly seems like a viable approach, > so what you'll need to do is to come up with a proposal of how you'd > implement this in Xapian (either with reference to the previous work, >
2009 Mar 29
1
[cluster package question] What is the "sum of the dissimilarities" in the pam command ?
Hello Martin Maechler and All, A simple question (I hope): How can I compute the "sum of the dissimilarities" that appears in the pam command (from the cluster package) ? Is it the "manhattan" distance (such as the one implemented by "dist") ? I am asking since I am running clustering on a dataset. I found 7 medoids with the pam command, and from it I have the
2016 May 01
2
GSoC 2016 - Introduction
Before going ahead with the tests as you mentioned above, I would just like to clarify a few higher-level things that I am still in doubt about. 1) As discussed during the IRC interview, it was suggested that I first implement a normal K-means clustering and then add the PSO module as functionality that can improve the quality of clustering, with speed as the trade-off.
2016 Mar 05
2
GSOC-2016 Project : Clustering of search results
Hello devs, I am Richhiey Thomas, in my third year of undergraduate studies in Computer Science at Mumbai University. I went through the project list for this year, and the project idea based on clustering caught my attention. I spoke to Assem Chelli on IRC, who guided me to the code and got me started. I started going through the code and have successfully built Xapian on my machine.
2011 Aug 08
3
Distance between a vector and matrix rows
I am trying to find the distance between a vector and each row of a dataframe. I am using the function "distancevector" in the package "hopach" as follows:
mydata<-as.data.frame(matrix(c(1,1,1,1,0,1,1,1,1,0),nrow=2))
  V1 V2 V3 V4 V5
1  1  1  0  1  1
2  1  1  1  1  0
vec <- c(1,1,1,1,1)
d2<-distancevector(mydata,vec,d="euclid")
The Euclidean distance
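As a cross-check for the call above, the row-wise Euclidean distance can also be computed in base R. This is a minimal sketch that reuses the data from the post; it is not hopach's implementation, and `diffs` and `d_base` are names introduced here for illustration.

```r
mydata <- as.data.frame(matrix(c(1,1,1,1,0,1,1,1,1,0), nrow = 2))
vec <- c(1, 1, 1, 1, 1)

# Subtract vec from every row (sweep over columns, default FUN is "-"),
# then square, sum across each row, and take the square root.
diffs <- sweep(as.matrix(mydata), 2, vec)
d_base <- sqrt(rowSums(diffs^2))
```

Each row of `mydata` differs from `vec` in exactly one position, so both distances come out as 1.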
2012 May 31
3
Quadrat counting with spatstat
I have photographs of plots that look like so: http://r.789695.n4.nabble.com/file/n4631960/Untitled.jpg I need to divide it up so each circle has an equal area surrounding it. So into 20 equal segments, each of which contains a circle. Quadratcount is not sufficient because if I divide it up into 36 equal quadrats, some quadrats do not contain one of the circles. I'm not even sure how to
2007 Apr 01
4
Abundance data ordination in R
An embedded text with no associated character set specified... Name: not available URL: https://stat.ethz.ch/pipermail/r-help/attachments/20070401/33921c2a/attachment.pl
2010 Jul 20
1
p-values pvclust maximum distance measure
Hi, I am new to clustering and was wondering why pvclust using "maximum" as distance measure nearly always results in p-values above 95%. I wrote an example programme which demonstrates this effect. I uploaded a PDF showing the results. Here is the code which produces the PDF file: s <-
2001 Dec 13
2
k-means with euclidian distance but no coordinates
Hi, I'm trying to build a thesaurus that will sensible values for rare words. I suspect the best algorithm to use is k-means although I'm not sure about that -- I would have preferred a k dimensional space with a binary cluster in each dimension so a word can belong to 0..k clusters, but I digress... I can measure the strength of correlation between words fairly easily by counting
2006 Apr 07
2
cclust causes R to crash when using manhattan kmeans
Dear R users, When I run the following code, R crashes:
require(cclust)
x <- matrix(c(0,0,0,1.5,1,-1), ncol=2, byrow=TRUE)
cclust(x, centers=x[2:3,], dist="manhattan", method="kmeans")
While this works:
cclust(x, centers=x[2:3,], dist="euclidean", method="kmeans")
I'm posting this here because I am not sure if it is a bug. I've been searching
2012 Aug 30
2
self-defined distance function to be computed on matrix
Hello, I have a self-defined function to be computed on each column in a matrix. The basic idea is to ignore the elements that have a value of 0 during computation. I should be able to write my own function, but it could be computationally expensive, so I'd love to ask if anyone has suggestions on how to implement it more efficiently. Thanks in advance. For example, there are three
2011 Apr 18
3
how to extract options for a function call
Hi, I'm having some difficulties formulating this question. But what I want, is to extract the options associated with a parameter for a function. e.g. method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN") in the optim function. So I would like to have a vector with c("Nelder-Mead", "BFGS", "CG",
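One way to approach this question, sketched here with base R only: `formals()` returns a function's argument list, and the default for `method` is stored as an unevaluated call that `eval()` turns into a character vector. The variable names `method_default` and `method_options` are introduced for the example, and the exact set of options returned depends on the installed R version.

```r
# formals() gives the argument list of optim(); the default value of
# `method` is stored as an unevaluated call to c(...).
method_default <- formals(stats::optim)$method

# Evaluating that call yields the character vector of allowed options.
method_options <- eval(method_default)
```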
2012 Oct 08
1
Any better way of optimizing time for calculating distances in the mentioned scenario??
Dear All, I'm dealing with a case where the 'manhattan' distance of each of 100 vectors is calculated from 10000 other vectors. To achieve this, the following 4 scenarios were tested: 1) scenario 1:
> x<-read.table("query.vec")
> v<-read.table("query.vec2")
> d<-matrix(nrow=nrow(v),ncol=nrow(x))
> for (i in 1:nrow(v)){
+ d[i,]<-
2008 Dec 17
1
bug (?!) in "pam()" clustering from fpc package ?
Hello all. I wish to run k-means with "manhattan" distance. Since this is not supported by the function "kmeans", I turned to the "pam" function in the "fpc" package. Yet, when I tried to have the algorithm run with different starting points, I found that pam ignores them and keeps starting the algorithm from the same starting points (medoids). For my