similar to: how to group a large list of strings into categories based on string similarity?

Displaying 20 results from an estimated 100 matches similar to: "how to group a large list of strings into categories based on string similarity?"

2010 Apr 27
2
ShortRead with BWA
Dear folks, Please welcome a newbie both to R and the mailing list :). I am currently working on a sequencing project, and heard about R as well as some of its packages for next gen sequencing, and decided to give it a try. Starting with ShortRead, I found a document (http://www.bioconductor.org/packages/2.5/bioc/vignettes/ShortRead/inst/doc/ShortRead_and_HilbertVis.pdf) which does mention
2014 Nov 18
1
ShortRead::FastqStreamer and parallelization
Hi, I understand ShortRead::FastqStreamer will read chunks in parallel depending on the value of ShortRead:::.set_omp_threads. I see this discussed here: https://stat.ethz.ch/pipermail/bioc-devel/2013-May/004355.html and nowhere else. It probably should be documented in ShortRead. Possibly this has already changed, as I am still using R 3.1.0. I thought I'd check. Oh, and, in my
2009 Aug 27
1
R package install problem
Dear R-Help, I would be most grateful if you could inspect the attached install file. I would like to be able to use ShortRead to generate QA reports for Genome Analyzer output data. OS: Linux CentOS 5.3 on an HP ProLiant server. While installing the R package, after configuration flagged some missing modules/libraries, the install process will not perform the 'make' step.
2010 Oct 09
1
A competition to create a recommendation engine for R packages
Hello everyone. There is a new competition, outlined on the blog dataists<http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/>, inviting us to analyse statistics of the use of R packages (collected from 52 R users), to create a R-package suggestion engine for ourselves. Since I noticed several bloggers already wrote about it (as I have detailed
2010 Sep 14
2
Multiple CPU HowTo in Linux?
Hello all, I upgraded my R workstation, and to my dismay, only one core appears to be used during intensive computation of a bioconductor function. What I have now is two dual-core Xeon 5160 CPUs and 10 GB RAM. When I fully load it, top reports about 25% user, 75% idle and 0.98 short-term load. The archives gave nothing helpful besides mention of snow. I thought of posting to HPC, but this system
2010 Jul 21
3
String processing - is there a better way
I have a two-part question. Part 1) I am trying to remove characters in a string based on the position of a key character in another string. I have a solution that works but it requires a for-loop. A vectorized way of doing this has eluded me. CleanRead <- function(x, y) { if (!is.character(x)) x <- as.character(x) if (!is.character(y)) y <- as.character(y)
2010 Feb 24
1
build, data and vignettes
Based on some testing it seems to me that if I have a package with a dataset in /data, a Sweave vignette in inst/doc (but no associated pdf file), the vignette loads the data in /data through data(dataset), and I run R CMD build, R will try to build the pdf version of the vignette, but will be unable to find the dataset in data because the package is not yet installed. However, if I do
2012 Oct 26
0
parallel::pvec FUN types differ when v is a list; code simplifications?
In pvec(list(1, 2), FUN, mc.cores=2) FUN sees integer() arguments whereas pvec(list(1, 2, 3), FUN, mc.cores=2) FUN sees list() arguments; the latter seems consistent with pvec's description. This came up in a complicated Bioconductor thread about generics and parallel evaluation https://stat.ethz.ch/pipermail/bioc-devel/2012-October/003745.html One relevant point is that a
2012 Oct 30
0
map similarity spatial autocorrelation in R
Hi, I have two global raster maps, each of the same variable but from different sources. The values range from 0 to 5 in whole numbers. Is there a statistical test in R that can quantify the similarity of the spatial patterns (i.e., highs and lows)? Thanks,
2013 May 15
0
fast time series similarity (iSAX, UCR DTW, UCR ED) implementations for R?
Hello. I'm looking for a fast way to group by similarity many (5-10k) long (2-10k points) time series. Using PAM on distance matrix obtained via as.dist(1-abs(cor(data))) produces usable results but it's rather slow and doesn't catch slightly shifted time series. DTW implementation from package 'dtw' is orders of magnitude slower even with global window constraints which
2007 Sep 23
0
Beginners question about Percentage similarity in R?
I have been reading a paper in which the authors took values from Sorensen's dissimilarity index and values from a percentage similarity index and applied G-testing to the table of values. This was carried out to assess the differences in spider faunas (Stratton and Uetz, 1979). I like the method but have been trying to work out what function in R to use to get the percentage similarity. I have
2005 Jun 06
1
Similarity between variables
Hi, I would like to know the similarity between variables, but I don't know exactly how to begin, or from what data frame or matrix! I have a matrix where the rows are 'Good', 'Medium', 'Bad' and the columns are my criteria! What function and package should I use? Thanks a lot, Sabine
2006 Apr 23
1
converting similarity matrix formats
Dear all, I am using a program that generates similarity matrices in the following non-redundant pairwise format. a b 0.4 a c 0.5 a d 0.3 b c 0.9 b d 0.6 c d 0.2 matrix(c('a','a','a','b','b','c','b','c','d','c','d','d',.4,.5,.3,.9,.6,.2),byrow=F,nrow=6) I would like to convert this to a
2006 Aug 13
1
Gower Similarity Coefficient
I'm interested in clustering my data using the Gower Similarity Coefficient, and I was wondering if R is capable of using that metric. Timothy Rye
2006 Dec 11
1
similarity test with R
>x=c(3.05176E-05,0.000457764,0.003204346,0.0138855,0.04165649,0.09164429,0.1527405,0.1963806,0.1963806,0.1527405,0.09164429,0.04165649,0.0138855,0.003204346,0.000457764,3.05176E-05) >y=c(0.0000306,0.0004566,0.0031985,0.0139083,0.0415539,0.0917678,0.1528134,0.1962831,0.1962994,0.1527996,0.0917336,0.0415497,0.0139308,0.0031917,0.0004529,0.0000301) I tried chisq.test, t-test, prop.test, etc,
2008 Jun 27
1
Similarity matching with probabilities
Hello, It's just a strange coincidence that someone posted just very recently a question about matching. I know there are several match functions in the base package (such as match, pmatch, charmatch, and gsub etc.) but I can't seem to use them wisely to be able to get what I need. Suppose I have the following strings: "tets" "estt" "rtes7"
2009 Oct 29
2
similarity measure for binary data
I am doing hierarchical clustering with the cluster package. I could not find similarity measures like the matching coefficient, Jaccard coefficient, and Sokal and Sneath. Could anyone please tell me of a package with similarity measures for binary data? Kind regards, Ms. Karunambigai M, PhD Scholar, Dept. of Biostatistics, NIMHANS, Bangalore, India
2010 Mar 26
1
similarity measure
Hi all, I am doing hierarchical clustering using similarity measures for binary data using package ade4 and the hclust function. For method = 8 and method = 9 of dist.binary, I am getting NA values. Hence, the hclust function gives the error: Error in hclust(d8, method = "ward") : NA/NaN/Inf in foreign function call (arg 11). I think the fact that due to zero in the denominator of the
2011 Dec 04
1
similarity matrix
Hello R-users, I've got a file with individuals as columns and the clusters they occur in as rows. I want a similarity matrix which tells me how many times each individual occurs with another. My eventual goal is to make Venn diagrams from the occurrence of my individuals. So I have this: cluster ind1 ind2 ind3 etc. 1 0 1 2 2 3 0 1 3
2012 Apr 19
1
Compare String Similarity
Dear All, I need to estimate the level of similarity of two strings. For example: string1 <- c("depending","audience","research", "school"); string2 <- c("audience","push","drama","button","depending"); The words in string may occur in different order though. What function would you recommend to use