thr3ads.net - search: "jaccard"

Displaying 20 results from an estimated 45 matches for "jaccard".

2010 Dec 28

Jaccard dissimilarity matrix for PCA

Hi, I have a large dataset containing a wide range of binary variables. I would first like to compute a Jaccard matrix, then do a PCA on this matrix, so that I can finally do a hierarchical clustering on the principal components. My problem is that I don't know how to compute the Jaccard dissimilarity matrix in R. Which package should I use, and so on? Can anybody help me? Alternatively I'm search for...
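
A minimal base-R sketch of the workflow asked about above, under stated assumptions: the toy matrix `x` is hypothetical, and classical scaling (`cmdscale()`, i.e. principal coordinates) is swapped in for PCA because the input is a dissimilarity matrix rather than raw variables. `dist(method = "binary")` is used for the Jaccard dissimilarity.

```r
# Toy data (hypothetical): 5 observations by 6 binary variables.
x <- matrix(c(1, 1, 0, 0, 0, 0,
              1, 1, 1, 0, 0, 0,
              0, 0, 1, 1, 0, 0,
              0, 0, 0, 1, 1, 1,
              0, 0, 0, 0, 1, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("r", 1:5), paste0("v", 1:6)))

# Jaccard dissimilarity between rows; stats::dist() calls this method "binary".
d <- dist(x, method = "binary")

# Classical scaling (principal coordinates) derives ordination axes from d.
pco <- cmdscale(d, k = 2)

# Hierarchical clustering on the ordination axes, then cut into two groups.
hc <- hclust(dist(pco), method = "average")
groups <- cutree(hc, k = 2)
```

Whether to cluster on the ordination axes or directly on `d` is a design choice; `hclust(d)` also works without the intermediate step.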

2011 May 17

simprof test using jaccard distance

Dear All, I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for pres/abs community data. Here is the core of the function: > simprof function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "euclidean", method.transform = "identity",...

2009 Mar 25

how to calculate Jaccard Coefficient

Does anyone have a good method for calculating Jaccard coefficients now that the dissimilarity() function is no longer an option? Wen Gu, John Jay College of Criminal Justice, 445 West 59 Street, New York, NY 10029, wgu@gc.cuny.edu

2012 Dec 06

clustering of binary data

...function, by using method.hclust="ward" and method.dist="binary". Altogether it works (clusters and significance obtained). However, I'm not convinced by the distance matrix. Associations between variables are indeed different from results obtained in PAST by using Ward on a Jaccard matrix (that should be ok for binary data). Moreover, when I try to obtain a Jaccard matrix in R from my data, by using the vegan package, mydistance <- vegdist(t(data), method = "jaccard"), I receive the following error message: Error in rowSums(x, na.rm = TRUE) : 'x' must be num...

2007 Jun 25

manipulate a matrix

I have read everything I can find on how to manipulate a results matrix in R and I have to admit I'm stumped. I have set up a process to extract a dataset from ArcGIS to compute a similarity index (Jaccard's) in vegan. The dataset is fairly simple, but large, and consists of rows = sample areas and columns = elements. I've been able to view the results in R, but I want to get the results out to a database, and a matrix that is 6000 rows x 6000 columns can be very difficult to manipulate in Window...
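
One way to get a large pairwise matrix into a database-friendly shape is a long table with one row per pair. The sketch below is an illustration only: the toy matrix `x`, the site names and the column names `site_a`, `site_b` and `dissimilarity` are hypothetical, and `dist(method = "binary")` stands in for the vegan call used in the post.

```r
# Toy presence/absence data (hypothetical): 4 sample areas by 3 elements.
x <- matrix(c(1, 1, 0,
              1, 0, 1,
              0, 1, 1,
              1, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("site", 1:4), NULL))

d <- dist(x, method = "binary")   # Jaccard dissimilarity between rows
m <- as.matrix(d)                 # full square matrix, row and column names kept

# Keep each unordered pair once by indexing the lower triangle.
idx <- which(lower.tri(m), arr.ind = TRUE)
pairs <- data.frame(site_a = rownames(m)[idx[, "row"]],
                    site_b = colnames(m)[idx[, "col"]],
                    dissimilarity = m[idx],
                    stringsAsFactors = FALSE)

# The long table can then be exported, for example:
# write.csv(pairs, "jaccard_pairs.csv", row.names = FALSE)
```

For 6000 sample areas this yields 6000 * 5999 / 2 = 17,997,000 rows, which is usually easier to load into a database than a 6000-column table.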

2009 Nov 03

hierarchical clustering with Jaccard index

Hi, I want to do hierarchical clustering with the Jaccard index. I tried using the vegan package to compute the index and the hclust function for the hierarchical clustering. While doing the clustering, it shows the error message "invalid distance method". I would be grateful if anyone could tell me how to rectify the error. Thanks in advance, kind regards, Ms.Karunambigai M PhD Scholar Dept. of
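
The post does not show the failing call, so the following is only a guess at one plausible cause: `stats::dist()` has no method named "jaccard", and asking for it stops with an error. The sketch uses hypothetical toy data, captures that error, then builds the dissimilarities with `method = "binary"` and passes the resulting `dist` object to `hclust()`.

```r
# Toy presence/absence data (hypothetical).
x <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 0, 1, 1,
              0, 1, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), NULL))

# Requesting an unknown method fails; capture the message instead of stopping.
err <- tryCatch(dist(x, method = "jaccard"),
                error = function(e) conditionMessage(e))

# "binary" is the name stats::dist() uses for the Jaccard dissimilarity.
d <- dist(x, method = "binary")

# hclust() takes the dist object itself, not the name of a distance.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
```

With vegan installed, `vegdist(x, method = "jaccard", binary = TRUE)` also returns a `dist` object that can be passed to `hclust()` in the same way.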

2005 Nov 10

error in rowSums:'x' must be numeric

Dear All, It's Eszter again from Hungary. I could not solve my problem from yesterday, so I still have to ask for your help. I have a binary dataset of vegetation samples and species as a comma-separated file. I would like to calculate the Jaccard distance of the dataset. I have the following error message: Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric In addition: Warning message: results may be meaningless because input data have negative entries in: vegdist(t2, method = "jaccard", binary = FALSE, diag...

2013 Feb 08

vegdist Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy grande

...o bello <caro.bello58@gmail.com> To: r-help@r-project.org Cc: Date: Fri, 8 Feb 2013 15:18:40 -0800 (PST) Subject: vegdist Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy grande Hi, I have some problems with the vegdist function. I want to calculate a distance matrix with Jaccard. I have binary data. The problem is that I have a matrix of 138037 rows (sites) and 89 columns (species). My script is: rm(list=ls(all=T)) gc() ## to clear anything still hidden in memory memory.limit(size = 100000) # it gives 1 Tera from HDD in case ram memory is over DF...
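
The error comes from allocating `double(N * (N - 1)/2)`, so the required size can be estimated directly. This is plain arithmetic using the row count given in the post; the variable names are illustrative.

```r
n <- 138037                  # number of sites, as stated in the post
n_pairs <- n * (n - 1) / 2   # entries in the lower triangle of the distance matrix
bytes <- n_pairs * 8         # each double takes 8 bytes
gib <- bytes / 1024^3        # size in GiB
```

This gives about 9.5 billion pairwise distances, roughly 71 GiB for the result alone, so the full matrix is unlikely to fit in memory on a typical workstation.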

2013 Feb 18

questions hash functions

...5 1 0 0 0 1. How is it possible to compute the minhash signature for each column if we use the following three hash functions: h1(x) = 2x + 1 mod 6; h2(x) = 3x + 2 mod 6; h3(x) = 5x + 2 mod 6? 2. Which of these hash functions are true permutations? 3. How close are the estimated Jaccard similarities for the six pairs of columns to the true Jaccard similarities? Thank you! Tania
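
A hedged sketch of the mechanics, assuming rows are numbered 0 to 5 (consistent with the "mod 6" in the hash functions). The matrix in the post is truncated, so the two-column matrix `m` below is a hypothetical stand-in rather than the original data. A hash function is a true permutation here if it maps {0, ..., 5} onto itself; the minhash of a column is the smallest hash value over the rows where that column holds a 1.

```r
rows <- 0:5   # assumed row numbering

h1 <- function(x) (2 * x + 1) %% 6
h2 <- function(x) (3 * x + 2) %% 6
h3 <- function(x) (5 * x + 2) %% 6
hashes <- list(h1 = h1, h2 = h2, h3 = h3)

# A true permutation hits every row number exactly once.
is_perm <- sapply(hashes, function(h) setequal(h(rows), rows))

# Minhash signature: one row per hash function, one column per set.
minhash <- function(m, hashes) {
  t(sapply(hashes, function(h) {
    apply(m, 2, function(col) min(h(rows[col == 1])))
  }))
}

# Hypothetical stand-in for the truncated matrix in the post.
m <- cbind(A = c(1, 0, 0, 1, 0, 0),
           B = c(0, 1, 0, 1, 0, 1))
sig <- minhash(m, hashes)

# Estimated similarity: share of signature rows that agree.
est_sim <- mean(sig[, "A"] == sig[, "B"])
# True Jaccard similarity: |intersection| / |union|.
true_sim <- sum(m[, "A"] & m[, "B"]) / sum(m[, "A"] | m[, "B"])
```

Under these assumptions only h3 is a true permutation; for the stand-in columns the estimate is 1/3 against a true similarity of 1/4.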

2008 Jul 18

manipulate a matrix2

Building upon Jim's answer below (Thanks Jim, that helped a lot), I need to pick up where this thread left off. I'm using vegan to calculate the Jaccard's Index and the Row.Names and column names are represented in my matrix as seen here.

      [,3] [,5] [,6] [,9] [,11]
 [3,]    0    6   11   16    21
 [5,]    2    0   12   17    22
 [6,]    3    8    0   18    23
 [9,]    4    9   14    0    24
[11,]    5   10   15   20     0

When I use the...

2008 Apr 10

adonis (vegan package) and subsetted factors

...y two of them. So I started with:

> CoastNear = subset(gel_data, Habitat != "I")

The resulting data.frame has three levels for Habitat, but only two of those levels have any records. Then I run:

> adonis(CoastNear[,5:118]~Habitat, data = CoastNear,permutations=1000,
+ method='jaccard')

Call:
adonis(formula = CoastNear[, 5:118] ~ Habitat, data = CoastNear, permutations = 1000, method = "jaccard")

                  Df SumsOfSqs   MeanSqs   F.Model     R2 Pr(>F)
Habitat    2.0000000 0.0092966 0.0046483 2.0549327 0.0707  0.005
Residuals 54.0000000 0.12214...

2007 Mar 02

Dice dissimilarity output and 'phylo' function in R

...s: NULL Rooted; includes branch lengths". So I guess this explains why the consensus function does not work. Another thing I noticed in the output from the 'dissimilarity' function: when I compared the distances computed in R with those from NTSYS or SAS, for example Dice and Jaccard coefficients, I realised that the Dice distances are very different, while the Jaccard distances are the same as those from the other software. The code I used for a small example is shown below: samptest4 <- scan(file = "samp-test4.txt") samptest4 <- matrix(data = samptest4...

2007 Apr 22

distance method in kmeans

...sing k-means. The regular "kmeans" available from the stats package in R doesn't provide an option to change the distance method, so I was wondering whether there is any package that lets you specify the type of distance measure used in k-means clustering in R, especially distances like "Jaccard", which is good for binary data. Thanks, chandra

2009 Oct 29

similarity measure for binary data

I am doing hierarchical clustering with the cluster package. I could not find similarity measures like the matching coefficient, the Jaccard coefficient, and Sokal and Sneath. Could anyone please tell me which package has similarity measures for binary data? Kind regards, Ms.Karunambigai M, PhD Scholar, Dept. of Biostatistics, NIMHANS, Bangalore, India

2013 Jul 18

binary distance measure of the "dist" function in the "stats" package

Dear all: I want to ask a question about the "binary" distance measure. As far as I know, there are many binary distance measures, e.g. binary Jaccard distance, binary Euclidean distance, binary Bray-Curtis distance, etc. It is even more confusing because many have more than one name. So I want to know the definitive name of the binary distance measure of the "dist" function
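
A small check, on two hypothetical vectors, that `dist(method = "binary")` matches the Jaccard distance and differs from the simple matching distance. The variable names are illustrative.

```r
u <- c(1, 1, 0, 0, 0, 0)
v <- c(1, 0, 1, 0, 0, 0)

# Distance reported by stats::dist() for the "binary" method.
d_binary <- as.numeric(dist(rbind(u, v), method = "binary"))

# Jaccard distance by hand: 1 - |intersection| / |union|; joint zeros are ignored.
jaccard_dist <- 1 - sum(u & v) / sum(u | v)

# Simple matching distance by hand: share of positions that differ, joint zeros counted.
simple_matching_dist <- mean(u != v)
```

Here `d_binary` and `jaccard_dist` are both 2/3, while `simple_matching_dist` is 1/3, because the three positions where both vectors are 0 only enter the simple matching calculation.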

2004 May 11

stability measures for hierarchical clustering

...tstrapping for clustering (using sample and generating a consensus tree with a web-based tool, CONSENSE) but I wondered if there have been any advances on the "bootstrapping clustering" front? In terms of finding stability in subsections of the clustering, I'm thinking of modifying the jaccard function from prabclus to look at pairwise similarities in different cluster partitions of sub-samples of the data, with high similarity being indicative of stability. I wondered if anyone has already looked at stability measures for clustering (particularly those which interface with hclust), and...

2005 Nov 09

how to convert strings back to values?

...transpose the dataset, the original values become strings (instead of 0,1,0,0,1 I have "0","1","0","0","1"). The distance matrix cannot be counted from the transposed dataset, I have 2 error messages: <Warning in vegdist(tdf1, method = "jaccard", binary = FALSE, diag = FALSE, : results may be meaningless because input data have negative entries> <Error in rowSums(x, prod(dn), p, na.rm) : 'x' must be numeric> I do not understand the first, since I have only 1 and 0 in the dataset. I guess I have the second becau...
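
The post is truncated, so the cause is an assumption: a common way to end up with "0" and "1" strings is transposing a data frame that still contains a text column (for example sample names), which turns the whole result into a character matrix. The sketch below uses a hypothetical data frame `df1` to show the effect and two ways around it.

```r
# Hypothetical data frame: one text column plus binary species columns.
df1 <- data.frame(sample = c("s1", "s2", "s3"),
                  sp_a = c(0, 1, 0),
                  sp_b = c(1, 1, 0),
                  sp_c = c(0, 0, 1),
                  stringsAsFactors = FALSE)

# Transposing everything coerces all values to character.
bad <- t(df1)

# Option 1: move the names to row names, keep only numeric columns, then transpose.
m <- as.matrix(df1[, -1])
rownames(m) <- df1$sample
tdf1 <- t(m)

# Option 2: repair an existing character matrix by dropping the name row
# and converting the storage mode back to double.
fixed <- bad[-1, , drop = FALSE]
storage.mode(fixed) <- "double"

# Either numeric matrix can now be passed to a distance function.
d <- dist(tdf1, method = "binary")
```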

2015 Dec 23

Cannot allocate vector of size

...0.2Mb 2) I have removed everything that is not needed from the beta.pair function, leaving it as: library (betapart) # to load the betapart.core() function beta.pair <- function(x, index.family="sorensen"){ # test for a valid index index.family <- match.arg(index.family, c('jaccard','sorensen')) # test for pre-existing betapart objects if (! inherits(x, "betapart")){ x <- betapart.core(x) } # run the analysis given the index switch(index.family, sorensen = { beta.sim &...

2024 Nov 25

Problems using the textreuse package

...and I want to use it to compare two PDF files. I have been unable to load the files in order to use the TextReuseCorpus() or TextReuseTextDocument() functions. In the package documentation the files are loaded from ... Does anyone know how this is done? I have managed to compute the Jaccard similarity using this package, but to do so I used the following code. library(pdftools) library(textreuse) text1 <- pdf_text("uno.pdf") text2 <- pdf_text("dos.pdf") full_text1 <- paste(text1, collapse = " ") full_text2 <- paste(text2, collapse = &...
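
For reference, the quantity being computed can be sketched in base R without relying on textreuse's own API. The helper names `tokens()` and `jaccard_sim()` and the two example strings are hypothetical; a real comparison would start from the text extracted from the PDFs.

```r
# Split a string into its set of distinct lowercase word tokens.
tokens <- function(txt) {
  tk <- strsplit(tolower(txt), "[^[:alnum:]]+")[[1]]
  unique(tk[nzchar(tk)])   # drop empty strings left by leading separators
}

# Jaccard similarity of two token sets: |intersection| / |union|.
jaccard_sim <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical stand-ins for the text extracted from the two PDFs.
full_text1 <- "The quick brown fox"
full_text2 <- "The lazy brown dog"

sim <- jaccard_sim(tokens(full_text1), tokens(full_text2))
```

The two example strings share 2 of 6 distinct words, so `sim` is 1/3. Word-level tokens are one choice among several; n-gram shingles would give a stricter comparison.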

2003 Aug 12

classification with quantitative variables

Hi all, I want to conduct a cluster analysis with quantitative variables. More precisely, it concerns binary and non-ordered categorical variables. For such data, various similarity measures have been proposed, such as the Jaccard index or the simple matching index. So, is there a package such as mva or multiv for the case of quantitative variables? Could you point me to reviews, papers or technical reports dealing with this problem? Regards, Olivier
