Displaying 20 results from an estimated 300 matches similar to: "dist function suggestion"
2007 Sep 02
1
buglet in dist() ?
the first line of dist() says
if (!is.na(pmatch(method, "euclidian")))
shouldn't that be "euclidean" ?
---------------------
R version 2.5.1 (2007-06-27)
i486-pc-linux-gnu
locale:
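One possible reading of that check is that it is deliberate: the misspelling is accepted and mapped to the canonical name before the usual lookup. The Python sketch below only illustrates that idea. `partial_match` is a hypothetical stand-in for prefix matching in the style of R's `pmatch`, and the `METHODS` list is an assumption, not a copy of the R source.

```python
def partial_match(arg, choices):
    """Index of the choice that `arg` matches exactly or as a unique prefix, else None."""
    if not arg:
        return None
    exact = [i for i, c in enumerate(choices) if c == arg]
    if exact:
        return exact[0]
    partial = [i for i, c in enumerate(choices) if c.startswith(arg)]
    return partial[0] if len(partial) == 1 else None


# Assumed list of method names, for illustration only.
METHODS = ["euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"]


def resolve_method(method):
    # Assumed special case: tolerate the "euclidian" spelling and its prefixes.
    if partial_match(method, ["euclidian"]) is not None:
        method = "euclidean"
    idx = partial_match(method, METHODS)
    if idx is None:
        raise ValueError("invalid or ambiguous distance method: %r" % method)
    return METHODS[idx]
```

Under these assumptions both spellings resolve to the same method, while an ambiguous prefix such as "m" is rejected.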
2016 May 05
2
GSoC 2016 - Introduction
Hello,
Thanks, James, for the reply. That cleared a few things up. Apologies for
replying late; I have exams going on.
I was going through the previous clustering API to understand how it worked,
and it seems that the approach for constructing the termlists which
are used for distance metrics uses TF-IDF weighting with cosine similarity,
which is very similar to the approach I would need
2010 Jul 20
1
p-values pvclust maximum distance measure
Hi,
I am new to clustering and was wondering why pvclust using "maximum"
as distance measure nearly always results in p-values above 95%.
I wrote an example programme which demonstrates this effect. I
uploaded a PDF showing the results
Here is the code which produces the PDF file:
-------------------------------------------------------------------------------------
s <-
2001 Dec 13
2
k-means with euclidian distance but no coordinates
Hi,
I'm trying to build a thesaurus that will give sensible values for rare words.
I suspect the best algorithm to use is k-means although I'm not sure about
that -- I would have preferred a k dimensional space with a binary cluster
in each dimension so a word can belong to 0..k clusters, but I digress...
I can measure the strength of correlation between words fairly easily by
counting
2005 Sep 12
4
Document clustering for R
I'm working on a project related to document clustering. I know that R
has clustering algorithms such as clara, but it only supports two distance
metrics: Euclidean and Manhattan, which are not very useful for
clustering documents. I was wondering how easy it would be to extend the
clustering package in R to support other distance metrics, such as
cosine distance, or if there was an API for
2016 Jul 26
3
K MEANS clustering
Hello,
I've been working on the KMeans clustering algorithm recently, and for the
past week I have been stuck on a problem that I'm not able to find a
solution to.
Since we are representing documents as TF-IDF vectors, they are really
sparse vectors (a usual corpus can have around 5000 terms). So it gets
really difficult to represent these sparse vectors in a way that would be
2011 Apr 18
3
how to extract options for a function call
Hi, I'm having some difficulty formulating this question.
What I want is to extract the options associated with a parameter of a
function.
e.g.
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN")
in the optim function.
So I would like to have a vector with
c("Nelder-Mead", "BFGS", "CG",
2006 Apr 03
2
about arguments in "bclust"
Hi All,
Just want to make sure: in function "bclust", do the following arguments
each have only one option?
argument "dist.method" has one option "Euclidian";
argument "hclust.method" has one option "average";
argument "base.method" has one option "kmeans".
Thank you!
2016 Jun 09
2
2nd week progress
Hello devs,
I have filled out the repo link on TRAC as suggested. I'll also keep the
journal updated on TRAC from now on.
I am almost done with defining all the base classes required for the
clusterer and have started coding the Euclidean distance metric. This
should be completed by tomorrow after which I'll be spending one day to
test and make sure everything functions as expected, so
2007 Apr 01
4
Abundance data ordination in R
An embedded text with no character set specified was attached...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20070401/33921c2a/attachment.pl
2016 Jul 27
2
K MEANS clustering
Hey Parth,
Thanks for the reply.
I am considering implementing a cosine distance metric too, along with
Euclidean distance, because of the dimensionality issue that arises with
K-Means and the Euclidean distance metric.
That does help when we deal with sparse vectors for documents. The
particular problem I'm having is representing centroids in an efficient way.
For example, when we find the mean
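A minimal Python sketch of one way to handle this, assuming each document is stored as a `{term: weight}` dictionary so that only non-zero TF-IDF weights take up space. The names `cosine_distance` and `sparse_centroid` are hypothetical and are not part of the Xapian API.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity for sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention chosen here for empty vectors
    return 1.0 - dot / (norm_a * norm_b)


def sparse_centroid(vectors):
    """Mean of sparse vectors; stores only terms that occur in some vector."""
    if not vectors:
        raise ValueError("cannot take the centroid of an empty cluster")
    sums = {}
    for v in vectors:
        for term, weight in v.items():
            sums[term] = sums.get(term, 0.0) + weight
    n = len(vectors)
    return {term: total / n for term, total in sums.items()}
```

A centroid built this way is usually denser than any single document, since it holds the union of the cluster's terms, so pruning very small weights may be worth considering.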
2013 May 21
1
keep the centre fixed in K-means clustering
Dear R users
I have the matrix of the centres of some clusters, e.g. 20 clusters each
with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric
values.
I have collected new data (each with 100 numeric values) and would like to
keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data
fit in this grouping system, e.g. if the data is close to cluster 1
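One way to do this, sketched in Python under the assumption that Euclidean distance is the measure of closeness, is to skip the k-means update step entirely and assign each new observation to its nearest existing centre. The function names here are hypothetical.

```python
import math


def nearest_centre(point, centres):
    """Return (index, distance) of the fixed centre closest to `point`."""
    best_index, best_dist = -1, math.inf
    for i, centre in enumerate(centres):
        d = math.dist(point, centre)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


def assign_to_fixed_centres(points, centres):
    """Label each point with the index of its nearest centre; centres never move."""
    return [nearest_centre(p, centres)[0] for p in points]
```

The returned distance can also be used to flag new observations that are far from every centre.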
2008 Sep 16
1
Spatial join – optimizing code
Hi,
A few days ago I asked about a spatial join on the minimum distance between 2 sets of points with coordinates and attributes in 2 different data frames.
Simon Knapp sent code to do it when calculating distance on a sphere using lat, long coordinates, and I've changed his code to use Euclidean distances since my data had UTM coordinates.
Typically one data frame has around 30 000 points
2008 Oct 09
2
vectorization instead of using loop
Dear all,
I sent this question 2 days ago and got a response from Sarah. Thanks for
that. But unfortunately, it did not really solve our problem. The main issue
is that we want to use our own (manipulated) covariance matrix in the
calculation of the mahalanobis distance. Does anyone know how to vectorize
the below code instead of using a loop (which slows it down)?
I'd really appreciate
2005 Oct 06
1
Compare two distance matrices
Hi all,
I am trying to compare two distance matrices with R. I would like to
create an XY plot of these matrices and do some linear regression on
it. But I am a bit new to R, so I have a few questions (I searched in
the documentation with no success).
The first problem is loading a distance matrix into R. This matrix is
the output of the Phylip program Protdist and looks like this:
5
2010 May 05
2
Using statistical test to distinguish two groups
Hi R friends,
I am posting this question even though I know that the nature of it is
closer to general stats than R. Please let me know if you are aware of
a list for general statistical questions:
I am looking for a simple method to distinguish two groups of data in
a long vector of numbers:
list <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,400,340,3,2,4,5,6,4,3,6,4,5,3)
I would like to
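A simple heuristic for this, sketched in Python, is to sort the values and split them at the largest gap between neighbours. This assumes the two groups are well separated, as in the example vector, and `split_at_largest_gap` is a hypothetical helper rather than a formal statistical test.

```python
def split_at_largest_gap(values):
    """Split values into (threshold, low, high) at the largest gap in sorted order."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to split")
    # Each entry pairs a gap width with the index of its lower neighbour.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, i = max(gaps)
    threshold = (ordered[i] + ordered[i + 1]) / 2
    low = [v for v in values if v <= threshold]
    high = [v for v in values if v > threshold]
    return threshold, low, high


values = [1, 2, 3, 2, 3, 2, 3, 4, 3, 2, 3, 4, 3, 2, 400, 340,
          3, 2, 4, 5, 6, 4, 3, 6, 4, 5, 3]
threshold, low, high = split_at_largest_gap(values)
```

For the vector above the split falls between 6 and 340, so `high` holds 400 and 340 and everything else lands in `low`.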
2010 Jun 25
1
best way to plot a evolution in time
Hi everyone,
I have the following question:
given three objects let's say:
a <- c( 2 , 5, 15, 16)
b <- c(1 ,1, 8 , 8)
c <- c(10, 10, 11, 11)
m<-matrix(c(a,b,c),byrow=T,nrow=3)
rownames(m)<-c("gene a", 'gene b','gene c')
m
gene.dist<-dist(m,method='euclidian')
gene.dist
Which is the best way to plot their evolution in time? Should I use a
2020 Feb 22
2
The AnghaBench collection of compilable programs
Dear LLVMers,
we, at UFMG, have been building a large collection of compilable
benchmarks. Today, we have one million C files, mined from open-source
repositories, that compile into LLVM bytecodes (and from there to
object files). To ensure compilation, we perform type inference on the
C programs. Type inference lets us replace missing dependencies.
The benchmarks are available at:
2016 Mar 06
3
GSOC-2016 Project : Clustering of search results
On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org>
wrote:
> On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote:
>
> K-Means or something related certainly seems like a viable approach,
> so what you'll need to do is to come up with a proposal of how you'd
> implement this in Xapian (either with reference to the previous work,
>
2016 May 01
2
GSoC 2016 - Introduction
Before going ahead with the tests as you mentioned above, I would just like
to clarify a few higher level things that I am still in doubt about.
1) As discussed during the IRC interview, it was suggested that I first
implement a normal K-means clustering and then add the PSO module as
functionality that can be used to improve the quality of clustering,
with speed as the trade-off.