search for: jaccard

Displaying 20 results from an estimated 45 matches for "jaccard".

2010 Dec 28
3
Jaccard dissimilarity matrix for PCA
Hi, I have a large dataset containing a wide range of binary variables. I would like first to compute a Jaccard matrix, then do a PCA on this matrix, so that I can finally do a hierarchical clustering on the principal components. My problem is that I don't know how to compute the Jaccard dissimilarity matrix in R: which package to use, and so on. Can anybody help me? Alternatively I'm search for...
2011 May 17
1
simprof test using jaccard distance
Dear All, I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for pres/abs community data. Here is the core of the function: > simprof function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "euclidean", method.transform = "identity",...
2009 Mar 25
1
how to calculate Jaccard Coefficient
Does anyone have a good method for calculating Jaccard coefficients now that the dissimilarity() function is no longer an option? Wen Gu, John Jay College of Criminal Justice
2012 Dec 06
1
clustering of binary data
...function, by using method.hclust="ward" and method.dist="binary". Altogether it works (clusters and significance obtained). However, I'm not convinced by the distance matrix. Associations between variables are indeed different from the results obtained in PAST by using Ward on a Jaccard matrix (which should be OK for binary data). Moreover, when I try to obtain a Jaccard matrix in R from my data, by using the vegan package mydistance<-vegdist(t(data),method="jaccard") I receive the following error message: Error in rowSums(x, na.rm = TRUE) : 'x' must be num...
2007 Jun 25
2
manipulate a matrix
I have read everything I can find on how to manipulate a results matrix in R and I have to admit I'm stumped. I have set up a process to extract a dataset from ArcGIS to compute a similarity index (Jaccard's) in vegan. The dataset is fairly simple, but large, and consists of rows = sample areas and columns = elements. I've been able to view the results in R, but I want to get the results out to a database, and a matrix that is 6000 rows x 6000 columns can be very difficult to manipulate in Window...
2009 Nov 03
1
hierarchical clustering with Jaccard index
Hi, I want to do hierarchical clustering with the Jaccard index. I tried using the vegan package to compute the index and the hclust function for the hierarchical clustering. During clustering it shows the error message "invalid distance method". I would be grateful if anyone could tell me how to fix the error. Thanks in advance, kind regards, Ms. Karunambigai M, PhD Scholar, Dept. of
2005 Nov 10
2
error in rowSums:'x' must be numeric
Dear All, It's Eszter again from Hungary. I could not solve my problem from yesterday, so I still have to ask for your help. I have a binary dataset of vegetation samples and species as a comma separated file. I would like to calculate the Jaccard distance of the dataset. I have the following error message: Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric In addition: Warning message: results may be meaningless because input data have negative entries in: vegdist(t2, method = "jaccard", binary = FALSE, diag...
2013 Feb 08
1
vegdist Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy grande
... Hi, I have some problems with the vegdist function. I want to calculate a distance matrix with Jaccard. I have binary data. The problem is that I have a matrix of 138037 rows (sites) and 89 columns (species). My script is: rm(list=ls(all=T)) gc() ## to clear anything left hidden in memory memory.limit(size = 100000) # it gives 1 Tera from HDD in case ram memory is over DF...
2013 Feb 18
1
questions hash functions
...5 1 0 0 0 1. How is it possible to compute the minhash signature for each column if we use the following three hash functions: h1(x) = 2x + 1 mod 6; h2(x) = 3x + 2 mod 6; h3(x) = 5x + 2 mod 6. 2. Which of these hash functions are true permutations? 3. How close are the estimated Jaccard similarities for the six pairs of columns to the true Jaccard similarities? Thank you! Tania
2008 Jul 18
1
manipulate a matrix2
Building upon Jim's answer below (thanks Jim, that helped a lot), I need to pick up where this thread left off. I'm using vegan to calculate Jaccard's index, and the Row.Names and column names are represented in my matrix as seen here.
      [,3] [,5] [,6] [,9] [,11]
 [3,]    0    6   11   16    21
 [5,]    2    0   12   17    22
 [6,]    3    8    0   18    23
 [9,]    4    9   14    0    24
[11,]    5   10   15   20     0
When I use the...
2008 Apr 10
1
adonis (vegan package) and subsetted factors
...y two of them. So I started with: > CoastNear = subset(gel_data, Habitat != "I") The resulting data.frame has three levels for Habitat, but only two of those levels have any records. Then I run: > adonis(CoastNear[,5:118]~Habitat, data = CoastNear,permutations=1000, + method='jaccard') Call: adonis(formula = CoastNear[, 5:118] ~ Habitat, data = CoastNear, permutations = 1000, method = "jaccard") Df SumsOfSqs MeanSqs F.Model R2 Pr(>F) Habitat 2.0000000 0.0092966 0.0046483 2.0549327 0.0707 0.005 Residuals 54.0000000 0.12214...
2007 Mar 02
0
Dice dissimilarity output and 'phylo' function in R
...s: NULL Rooted; includes branch lengths". So I guess this explains why the consensus function does not work. Another thing I noticed in the output from the 'dissimilarity' function: when I compared the distances computed in R with those from NTSYS or SAS, for example the Dice and Jaccard coefficients, I realised that the Dice distances are very different, while the Jaccard distances are the same as those from the other software. The code I used for a small example is shown below: samptest4<- scan (file = "samp-test4.txt") samptest4<- matrix(data = samptest4...
2007 Apr 22
2
distance method in kmeans
...sing k-means. The regular "kmeans" available from the stats package in R doesn't provide an option to change the distance method, so I was wondering whether there is any package that lets you specify the type of distance measure to be used in k-means clustering in R, especially distances like "Jaccard", which is good for binary data. Thanks, chandra
2009 Oct 29
2
similarity measure for binary data
I am doing hierarchical clustering with the cluster package. I could not find similarity measures like the matching coefficient, the Jaccard coefficient, and Sokal and Sneath. Could anyone please tell me of a package with similarity measures for binary data? Kind regards, Ms. Karunambigai M, PhD Scholar, Dept. of Biostatistics, NIMHANS, Bangalore, India
2013 Jul 18
1
binary distance measure of the "dist" function in the "stats" package
Dear all: I want to ask a question about the "binary" distance measure. As far as I know, there are many binary distance measures, e.g. binary Jaccard distance, binary Euclidean distance, and binary Bray-Curtis distance. It is even more confusing because many have more than one name. So I want to know what the definite name of the binary distance measure of the "dist" function
2004 May 11
1
stability measures for hierarchical clustering
...tstrapping for clustering (using sample and generating a consensus tree with a web based tool CONSENSE) but I wondered if there have been any advances on the "bootstrapping clustering" front? In terms of finding stability in sub-sections of the clustering, I'm thinking of modifying the jaccard function from prabclus to look at pairwise similarities in different cluster partitions of sub-samples of the data, with high similarity being indicative of stability. I wondered if anyone has already looked at stability measures for clustering (particularly those which interface with hclust), and...
2005 Nov 09
2
how to convert strings back to values?
...transpose the dataset, the original values become strings (instead of 0,1,0,0,1 I have "0","1","0","0","1"). The distance matrix cannot be counted from the transposed dataset, I have 2 error messages: <Warning in vegdist(tdf1, method = "jaccard", binary = FALSE, diag = FALSE, : results may be meaningless because input data have negative entries> <Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric> I do not understand the first, since I have only 1 and 0 in the dataset. I guess I have the second becau...
2015 Dec 23
2
Cannot allocate vector of size
...0.2Mb 2) I have removed everything that is not needed from the beta.pair function, leaving it as: library (betapart) # so that the betapart.core() function is loaded beta.pair <- function(x, index.family="sorensen"){ # test for a valid index index.family <- match.arg(index.family, c('jaccard','sorensen')) # test for pre-existing betapart objects if (! inherits(x, "betapart")){ x <- betapart.core(x) } # run the analysis given the index switch(index.family, sorensen = { beta.sim &...
2024 Nov 25
1
Problems using the textreuse package
...and I want to use it to compare two PDF files. I have been unable to load the files in order to use the TextReuseCorpus() or TextReuseTextDocument() functions. In the package documentation the files are loaded from ... Does anyone know how this is done? I have managed to compute the Jaccard similarity using this package, but to do so I used the following code. library(pdftools) library(textreuse) text1 <- pdf_text("uno.pdf") text2 <- pdf_text("dos.pdf") full_text1 <- paste(text1, collapse = " ") full_text2 <- paste(text2, collapse = &...
2003 Aug 12
1
classification with quantitative variables
Hi all, I want to conduct a cluster analysis with quantitative variables. More precisely, it concerns binary and non-ordered categorical variables. For such data, various similarity measures have been proposed, such as the Jaccard index or the simple matching index. So, is there a package such as mva or multiv for the case of quantitative variables? Could you point me to reviews, papers or technical reports dealing with this problem? Regards, Olivier