thr3ads.net - similar to: "dist function suggestion"

Displaying 20 results from an estimated 300 matches similar to: "dist function suggestion"

2007 Sep 02

buglet in dist() ?

The first line of dist() says if (!is.na(pmatch(method, "euclidian"))). Shouldn't that be "euclidean"? R version 2.5.1 (2007-06-27) i486-pc-linux-gnu locale:

2016 May 05

GSoC 2016 - Introduction

Hello, Thanks James for the reply. That cleared a few things up. Apologies for replying late; I have exams going on. I was going through the previous clustering API to understand how it worked, and it seems that the approach for constructing the termlists used for distance metrics uses TF-IDF weighting with cosine similarity, which is very similar to the approach I would need
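
The TF-IDF weighting and cosine similarity mentioned in this post can be sketched as follows. This is a minimal illustration in Python, not the Xapian clustering API: the function names `tfidf_vectors` and `cosine_similarity`, the whitespace-free token lists, and the plain `tf * log(N / df)` weighting are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors ({term: weight}) for tokenized documents.

    Assumed weighting: (term count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, already split into terms.
docs = [["apple", "banana", "apple"], ["apple", "cherry"], ["dog", "cat"]]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares "apple", so above zero
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms
```

Storing each vector as a dict keeps only the non-zero terms, which matters when a corpus has thousands of terms and each document uses a few of them.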

2010 Jul 20

p-values pvclust maximum distance measure

Hi, I am new to clustering and was wondering why pvclust using "maximum" as distance measure nearly always results in p-values above 95%. I wrote an example programme which demonstrates this effect. I uploaded a PDF showing the results. Here is the code which produces the PDF file: s <-

2001 Dec 13

k-means with euclidian distance but no coordinates

Hi, I'm trying to build a thesaurus that will give sensible values for rare words. I suspect the best algorithm to use is k-means, although I'm not sure about that. I would have preferred a k-dimensional space with a binary cluster in each dimension, so a word can belong to 0..k clusters, but I digress... I can measure the strength of correlation between words fairly easily by counting

2005 Sep 12

Document clustering for R

I'm working on a project related to document clustering. I know that R has clustering algorithms such as clara, but it only supports two distance metrics, Euclidean and Manhattan, which are not very useful for clustering documents. I was wondering how easy it would be to extend the clustering package in R to support other distance metrics, such as cosine distance, or if there was an API for

2016 Jul 26

K MEANS clustering

Hello, I've been working on the KMeans clustering algorithm recently, and for the past week I have been stuck on a problem that I'm not able to find a solution to. Since we are representing documents as TF-IDF vectors, they are really sparse vectors (a usual corpus can have around 5000 terms). So it gets really difficult to represent these sparse vectors in a way that would be

2011 Apr 18

how to extract options for a function call

Hi, I'm having some difficulty formulating this question. What I want is to extract the options associated with a parameter of a function, e.g. method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN") in the optim function. So I would like to have a vector with c("Nelder-Mead", "BFGS", "CG",

2006 Apr 03

about arguments in "bclust"

Hi All, Just want to make sure: in function "bclust", do the following arguments each have only one option? Argument "dist.method" has one option, "Euclidian"; argument "hclust.method" has one option, "average"; argument "base.method" has one option, "kmeans". Thank you!

2016 Jun 09

2nd week progress

Hello devs, I have filled out the repo link on TRAC as suggested. I'll also keep the journal updated on TRAC from now on. I am almost done with defining all the base classes required for the clusterer and have started coding the Euclidean distance metric. This should be completed by tomorrow, after which I'll spend one day testing to make sure everything functions as expected, so

2007 Apr 01

Abundance data ordination in R

An embedded text with no associated character set specified... Name: not available Url: https://stat.ethz.ch/pipermail/r-help/attachments/20070401/33921c2a/attachment.pl

2016 Jul 27

K MEANS clustering

Hey Parth, Thanks for the reply. I am considering implementing a cosine distance metric too, along with Euclidean distance, because of the dimensionality issue that comes with K-Means and the Euclidean distance metric. That does help when we deal with sparse vectors for documents. The particular problem I'm having is representing centroids in an efficient way. For example, when we find the mean
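
One possible way to represent a centroid of sparse document vectors is to keep it sparse as well, storing only the terms that occur in at least one member of the cluster. The sketch below is an assumption about what such a representation could look like, not the approach the thread settled on; `sparse_centroid` and the dict-based vectors are hypothetical names and choices.

```python
from collections import defaultdict


def sparse_centroid(vectors):
    """Mean of sparse vectors stored as {term: weight} dicts.

    Terms missing from a vector count as zero, so each sum is divided by the
    total number of vectors, not by the number of vectors containing the term.
    """
    if not vectors:
        raise ValueError("cannot take the centroid of an empty cluster")
    sums = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            sums[term] += weight
    n = len(vectors)
    return {term: total / n for term, total in sums.items()}


# Hypothetical cluster of two document vectors.
cluster = [{"a": 2.0}, {"a": 4.0, "b": 1.0}]
print(sparse_centroid(cluster))
```

A centroid built this way grows with the number of distinct terms in the cluster, so it is denser than any single document but still far smaller than a full-vocabulary array.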

2013 May 21

keep the centre fixed in K-means clustering

Dear R users, I have the matrix of the centres of some clusters, e.g. 20 clusters each with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric values. I have collected new data (each with 100 numeric values) and would like to keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data fit in this grouping system, e.g. if the data is close to cluster 1
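
Fitting new data to fixed centres amounts to a single assignment step: each new point goes to the centre it is closest to, and the centres are never updated. A minimal sketch of that idea, assuming Euclidean distance and using hypothetical names (`nearest_center`, `assign_to_fixed_centers`) and toy two-dimensional data in place of the 100-dimensional matrix from the post:

```python
import math


def nearest_center(point, centers):
    """Return (index, distance) of the center closest to point (Euclidean)."""
    best = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return best, math.dist(point, centers[best])


def assign_to_fixed_centers(points, centers):
    """Assign every point to its nearest center; the centers stay unchanged."""
    return [nearest_center(p, centers)[0] for p in points]


# Hypothetical fixed centers and new observations.
centers = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, 1.0), (9.0, 8.0), (4.0, 4.0)]
print(assign_to_fixed_centers(new_points, centers))
```

The distance returned by `nearest_center` can also be inspected to judge how well a new point fits its assigned cluster.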

2008 Sep 16

Spatial join – optimizing code

Hi, A few days ago I asked about a spatial join on the minimum distance between 2 sets of points with coordinates and attributes in 2 different data frames. Simon Knapp sent code to do it when calculating distance on a sphere using lat, long coordinates, and I've changed his code to use Euclidean distances since my data had UTM coordinates. Typically one data frame has around 30 000 points

2008 Oct 09

vectorization instead of using loop

Dear all, I sent this question 2 days ago and got a response from Sarah. Thanks for that. But unfortunately, it did not really solve our problem. The main issue is that we want to use our own (manipulated) covariance matrix in the calculation of the Mahalanobis distance. Does anyone know how to vectorize the code below instead of using a loop (which slows it down)? I'd really appreciate

2005 Oct 06

Compare two distance matrices

Hi all, I am trying to compare two distance matrices with R. I would like to create an XY plot of these matrices and do some linear regression on it. But I am a bit new to R, so I have a few questions (I searched in the documentation with no success). The first problem is loading a distance matrix into R. This matrix is the output of the Phylip program Protdist and looks like this: 5

2010 May 05

Using statistical test to distinguish two groups

Hi R friends, I am posting this question even though I know that the nature of it is closer to general stats than R. Please let me know if you are aware of a list for general statistical questions: I am looking for a simple method to distinguish two groups of data in a long vector of numbers: list <- c(1,2,3,2,3,2,3,4,3,2,3,4,3,2,400,340,3,2,4,5,6,4,3,6,4,5,3) I would like to

2010 Jun 25

best way to plot a evolution in time

Hi everyone, I have the following question: given three objects, let's say: a <- c(2, 5, 15, 16); b <- c(1, 1, 8, 8); c <- c(10, 10, 11, 11); m <- matrix(c(a, b, c), byrow = TRUE, nrow = 3); rownames(m) <- c("gene a", "gene b", "gene c"); m; gene.dist <- dist(m, method = 'euclidian'); gene.dist which is the best way to plot their evolution in time? Should I use a

2020 Feb 22

The AnghaBench collection of compilable programs

Dear LLVMers, we, at UFMG, have been building a large collection of compilable benchmarks. Today, we have one million C files, mined from open-source repositories, that compile into LLVM bytecodes (and from there to object files). To ensure compilation, we perform type inference on the C programs. Type inference lets us replace missing dependencies. The benchmarks are available at:

2016 Mar 06

GSOC-2016 Project : Clustering of search results

On Sun, Mar 6, 2016 at 7:17 AM, James Aylett <james-xapian at tartarus.org> wrote: > On Sat, Mar 05, 2016 at 10:58:43PM +0530, Richhiey Thomas wrote: > > K-Means or something related certainly seems like a viable approach, > so what you'll need to do is to come up with a proposal of how you'd > implement this in Xapian (either with reference to the previous work, >

2016 May 01

GSoC 2016 - Introduction

Before going ahead with the tests as you mentioned above, I would just like to clarify a few higher-level things that I am still in doubt about. 1) As discussed during the IRC interview, it was suggested that I first implement a normal K-means clustering and then add the PSO module as functionality that can be used to improve the quality of clustering, with speed as the trade-off.
