Displaying 20 results from an estimated 45 matches for "jaccard".
2010 Dec 28
3
Jaccard dissimilarity matrix for PCA
Hi
I have a large dataset, containing a wide range of binary variables.
I would like first of all to compute a jaccard matrix, then do a PCA on this
matrix, so that I finally can do a hierarchical clustering on the principal
components.
My problem is, that I don't know how to compute the jaccard dissimilarity
matrix in R? Which package to use, and so on...
Can anybody help me?
Alternatively I'm searching for...
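A minimal sketch of the workflow asked about above, assuming base R only. The toy matrix and the object names (`x`, `d`, `pco`, `hc`, `groups`) are made up for illustration. For 0/1 data, `dist(..., method = "binary")` gives the Jaccard dissimilarity, and `cmdscale()` performs principal coordinates analysis, the distance-based analogue of PCA, which is assumed here to be what "PCA on this matrix" means.

```r
# Toy presence/absence data: 5 observations (rows) x 4 binary variables.
# The values are invented for illustration only.
x <- matrix(c(1, 0, 1, 1,
              1, 1, 0, 0,
              0, 1, 1, 0,
              1, 0, 1, 0,
              0, 0, 1, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("obs", 1:5), paste0("v", 1:4)))

# Jaccard dissimilarity between rows (base R calls this method "binary").
d <- dist(x, method = "binary")

# Principal coordinates analysis (classical MDS) on the dissimilarities,
# keeping the first two axes.
pco <- cmdscale(d, k = 2)

# Hierarchical clustering on the retained coordinates.
hc <- hclust(dist(pco), method = "average")
groups <- cutree(hc, k = 2)
```

Clustering directly on `d` with `hclust(d)` is also possible; going through the coordinates only matters if the dimension reduction step is wanted.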
2011 May 17
1
simprof test using jaccard distance
Dear All,
I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for pres/abs community data. Here is the core of the function:
> simprof
function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average",
method.distance = "euclidean", method.transform = "identity",...
2009 Mar 25
1
how to calculate Jaccard Coefficient
Does anyone have a good method for calculating Jaccard coefficients now that the dissimilarity() function is no longer an option?
Wen Gu
John Jay College of Criminal Justice, 445 West 59 Street, New York, NY 10029
wgu@gc.cuny.edu
2012 Dec 06
1
clustering of binary data
...function, by using method.hclust="ward"
and method.dist="binary". Altogether it works (clusters and significance
obtained). However, I'm not convinced by the distance matrix. Associations
between variables are indeed different from results obtained in PAST by
using Ward on a Jaccard matrix (that should be ok for binary data).
Moreover, when I try to obtain a Jaccard matrix in R from my data, by using
the Vegan package
mydistance<-vegdist(t(data),method="jaccard")
I receive the following error message:
Error in rowSums(x, na.rm = TRUE) : 'x' must be num...
2007 Jun 25
2
manipulate a matrix
I have read everything I can find on how to manipulate a results matrix in R and I have to admit I'm stumped. I have set up a process to extract a dataset from ArcGIS to compute a similarity index (Jaccards) in Vegan. The dataset is fairly simple, but large, and consists of rows = sample area, and columns = elements. I've been able to view the results in R, but I want to get the results out to a database and a matrix that is 6000-rows x 6000-columns can be very difficult to manipulate in Window...
2009 Nov 03
1
hierarchical clustering with Jaccard index
hi,
I want to do hierarchical clustering with the Jaccard index. I tried using the vegan package to compute the index and the hclust function for the hierarchical clustering. During clustering it shows the error message "invalid distance method". I would be grateful if anyone could tell me how to fix the error.
Thanks in advance,
kind regards,
Ms.Karunambigai M
PhD Scholar
Dept. of
2005 Nov 10
2
error in rowSums:'x' must be numeric
Dear All,
It's Eszter again from Hungary. I could not solve my problem from
yesterday, so I still have to ask for your help.
I have a binary dataset of vegetation samples and species as a comma
separated file. I would like to calculate the Jaccard distance of the
dataset. I have the following error message:
Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric
In addition: Warning message:
results may be meaningless because input data have negative entries
in: vegdist(t2, method = "jaccard", binary = FALSE, diag...
2013 Feb 08
1
vegdist Error in double(N * (N - 1)/2) : vector size specified is too large
...o bello <caro.bello58@gmail.com>
To: r-help@r-project.org
Cc:
Date: Fri, 8 Feb 2013 15:18:40 -0800 (PST)
Subject: vegdist Error in double(N * (N - 1)/2) : vector size
specified is too large
Hi
I have some problems with the vegdist function. I want to calculate a
distance matrix with jaccard. I have binary data.
The problem is that I have a matrix of 138037 rows (sites) and 89 columns
(species). My script is:
rm(list=ls(all=T))
gc() ## to clear anything left hidden in memory
memory.limit(size = 100000) # size is in MB, so this raises the limit to
about 100 GB in case RAM runs out
DF...
2013 Feb 18
1
questions hash functions
...5 1 0 0 0
1. How is it possible to compute the minhash signature for each column if
we use the following
three hash functions: h1(x) = 2x + 1 mod 6; h2(x) = 3x + 2 mod 6;
h3(x) = 5x + 2 mod 6.
2. Which of these hash functions are true permutations?
3. How close are the estimated Jaccard similarities for the six pairs of columns
to the true Jaccard similarities?
Thank you!
Tania
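A hedged sketch of how the three hash functions quoted above could be checked in R. The snippet's own data matrix is cut off, so the characteristic matrix `m` below (columns `S1`, `S2`) is hypothetical, as are all object names. Rows are assumed to be numbered 0 to 5.

```r
rows <- 0:5

# The three hash functions as stated in the question.
h <- list(
  h1 = function(x) (2 * x + 1) %% 6,
  h2 = function(x) (3 * x + 2) %% 6,
  h3 = function(x) (5 * x + 2) %% 6
)

# 6 x 3 matrix: hashed row numbers, one column per hash function.
hashed <- sapply(h, function(f) f(rows))

# A hash is a true permutation if it maps 0..5 onto 0..5 without repeats.
is_permutation <- apply(hashed, 2, function(v) setequal(v, rows))

# Hypothetical characteristic matrix (NOT the one from the question).
m <- cbind(S1 = c(0, 1, 0, 0, 1, 0),
           S2 = c(1, 0, 0, 1, 0, 0))

# Minhash signature: for each column, the minimum hash value over the rows
# where that column has a 1. Result is 3 x 2 (hash functions x columns).
signature <- sapply(seq_len(ncol(m)), function(j) {
  apply(hashed[m[, j] == 1, , drop = FALSE], 2, min)
})
colnames(signature) <- colnames(m)
```

With these definitions only `h3` passes the permutation check, because `h1` and `h2` each map the six rows onto fewer than six distinct values.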
2008 Jul 18
1
manipulate a matrix2
Building upon Jim's answer below (Thanks Jim, that helped a lot), I need
to pick up where this thread left off. I'm using Vegan to calculate the
Jaccard's Index and the Row.Names and column names are represented in my
matrix as seen here.
[,3] [,5] [,6] [,9] [,11]
[3,] 0 6 11 16 21
[5,] 2 0 12 17 22
[6,] 3 8 0 18 23
[9,] 4 9 14 0 24
[11,] 5 10 15 20 0
When I use the...
2008 Apr 10
1
adonis (vegan package) and subsetted factors
...y two of them. So I started with:
> CoastNear = subset(gel_data, Habitat != "I")
The resulting data.frame has three levels for Habitat, but only two of
those levels have any records. Then I run:
> adonis(CoastNear[,5:118]~Habitat, data = CoastNear,permutations=1000,
+ method='jaccard')
Call:
adonis(formula = CoastNear[, 5:118] ~ Habitat, data = CoastNear,
permutations = 1000, method = "jaccard")
Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
Habitat 2.0000000 0.0092966 0.0046483 2.0549327 0.0707 0.005
Residuals 54.0000000 0.12214...
2007 Mar 02
0
Dice dissimilarity output and 'phylo' function in R
...s:
NULL
Rooted; includes branch lengths".
So I guess this explains why the consensus function
does not work.
Another thing I noticed in the output from the
'dissimilarity' function is that when I compared the
distances computed in R with that from NTSYS or SAS,
for example dice and jaccard coefficients I realised
that the dice distances are very different while the
jaccard distances are the same as those from these
other software packages.
The code I used for a small example is shown below:
samptest4<- scan (file = "samp-test4.txt")
samptest4<- matrix(data = samptest4...
2007 Apr 22
2
distance method in kmeans
...sing k-means. As the regular "kmeans" available from the stats package in R doesn't provide the option to change the distance method, I was wondering whether there is any package available to specify the type of distance measure to be used in k-means clustering in R, especially distances like "Jaccard", which is good for binary data.
Thanks chandra
2009 Oct 29
2
similarity measure for binary data
I am doing hierarchical clustering with the cluster package. I could not find similarity measures like the matching coefficient, the Jaccard coefficient, and Sokal and Sneath. Could anyone please tell me of a package with similarity measures for binary data?
kind regards,
Ms.Karunambigai M
PhD Scholar
Dept. of Biostatistics
NIMHANS
Bangalore
India
2013 Jul 18
1
binary distance measure of the "dist" function in the "stats" package
Dear all:
I want to ask a question about the "binary" distance measure. As far as I
know, there are many binary distance measures, e.g., binary Jaccard distance,
binary Euclidean distance, and binary Bray-Curtis distance, etc. It is even
more confusing because many have more than one name. So, I want to know
the definite name of the binary distance measure of the "dist"
function
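A small check, sketched under the assumption of two made-up 0/1 vectors `a` and `b`, showing that the "binary" method of `dist()` matches the Jaccard distance computed by hand as one minus the ratio of shared presences to total presences:

```r
# Two invented presence/absence vectors.
a <- c(1, 1, 0, 1, 0)
b <- c(1, 0, 0, 1, 1)

# Jaccard distance by hand: 1 - |A and B| / |A or B|.
both   <- sum(a == 1 & b == 1)   # positions where both are 1
either <- sum(a == 1 | b == 1)   # positions where at least one is 1
jaccard_dist <- 1 - both / either

# The same quantity from stats::dist with method = "binary".
d <- as.numeric(dist(rbind(a, b), method = "binary"))

same <- isTRUE(all.equal(d, jaccard_dist))
```

Positions where both vectors are 0 are ignored by both computations, which is what distinguishes this measure from simple matching.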
2004 May 11
1
stability measures for hierarchical clustering
...tstrapping for
clustering (using sample and generating a consensus tree with a web based
tool CONSENSE) but I wondered if there have been any advances on the
"bootstrapping clustering" front?
In terms of finding stability in sub sections of the clustering I'm thinking
of modifying the jaccard function from prabclus to look at pairwise
similarities in different cluster partitions of sub-samples of the data,
with high similarity being indicative of stability.
I wondered if anyone has already looked at stability measures for clustering
(particularly those which interface with hclust), and...
2005 Nov 09
2
how to convert strings back to values?
...transpose the dataset, the original values become strings
(instead of 0,1,0,0,1 I have "0","1","0","0","1"). The distance matrix
cannot be counted from the transposed dataset, I have 2 error
messages:
<Warning in vegdist(tdf1, method = "jaccard", binary = FALSE, diag =
FALSE, : results may be meaningless because input data have
negative entries>
<Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric>
I do not understand the first, since I have only 1 and 0 in the dataset. I
guess I have the second becau...
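One common cause of this symptom, assumed here because the snippet is cut off, is a label column that turns the whole transposed table into character data. A minimal sketch with an invented data frame `df` (the column names `site`, `sp1` to `sp3` are hypothetical) shows two ways to get a numeric matrix back:

```r
# Invented example: one label column plus three 0/1 species columns.
df <- data.frame(site = c("s1", "s2", "s3"),
                 sp1 = c(0, 1, 1),
                 sp2 = c(1, 1, 0),
                 sp3 = c(0, 0, 1))

# Transposing the whole data frame yields a character matrix,
# because the label column forces every cell to be a string.
t_bad <- t(df)

# Option 1: move the labels to row names before transposing.
m <- as.matrix(df[, -1])
rownames(m) <- df$site
tm <- t(m)                      # numeric matrix, species x sites

# Option 2: convert an already-transposed character matrix back to numbers.
chr <- t_bad[-1, , drop = FALSE]            # drop the label row
num <- matrix(as.numeric(chr), nrow = nrow(chr),
              dimnames = list(rownames(chr), df$site))
```

Either `tm` or `num` can then be passed to a distance function that requires numeric input.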
2015 Dec 23
2
Cannot allocate vector of size
...0.2Mb
2) I removed everything unnecessary from the beta.pair function, leaving it as:
library(betapart) # to load the betapart.core() function
beta.pair <- function(x, index.family="sorensen"){
  # test for a valid index
  index.family <- match.arg(index.family, c('jaccard','sorensen'))
  # test for pre-existing betapart objects
  if (! inherits(x, "betapart")){
    x <- betapart.core(x)
  }
  # run the analysis given the index
  switch(index.family,
    sorensen = {
      beta.sim &...
2024 Nov 25
1
Problems using the textreuse package
...and I want to use it to compare two
PDF files.
I have been unable to load the files in order to use the functions
TextReuseCorpus() or TextReuseTextDocument().
In the package documentation the files are loaded from
Does anyone know how to do this?
I did manage to compute the Jaccard similarity using this package,
but to do so I used the following code.
library(pdftools)
library(textreuse)
text1 <- pdf_text("uno.pdf")
text2 <- pdf_text("dos.pdf")
full_text1 <- paste(text1, collapse = " ")
full_text2 <- paste(text2, collapse = &...
2003 Aug 12
1
classification with quantitative variables
Hi all,
I want to conduct a cluster analysis with quantitative variables.
More precisely, it concerns binary and non-ordered categorical
variables. For such data, various
similarity measures have been proposed, such as the Jaccard index or the
simple matching index.
So, is there a package such as mva or multiv in the case of quantitative
variables?
Could you point me to reviews, papers or technical reports dealing
with this problem?
Regards,
Olivier
--
-------------------------------------------------------------
M...