thr3ads.net - similar to: "how to group a large list of strings into categories based on string similarity?"

Displaying 20 results from an estimated 100 matches similar to: "how to group a large list of strings into categories based on string similarity?"

ShortRead with BWA

2010 Apr 27

Dear folks, Please welcome a newbie both to R and the mailing list :). I am currently working on a sequencing project, and heard about R as well as some of its packages for next gen sequencing, and decided to give it a try. Starting with ShortRead, I found a document (http://www.bioconductor.org/packages/2.5/bioc/vignettes/ShortRead/inst/doc/ShortRead_and_HilbertVis.pdf) which does mention

ShortRead::FastqStreamer and parallelization

2014 Nov 18

Hi, I understand ShortRead::FastqStreamer will read chunks in parallel depending on the value of ShortRead:::.set_omp_threads. I see this discussed here: https://stat.ethz.ch/pipermail/bioc-devel/2013-May/004355.html and nowhere else. It probably should be documented in ShortRead. Possibly this has already changed, as I am still using R 3.1.0. I thought I'd check. Oh, and, in my

R package install problem

2009 Aug 27

Dear R-Help, I would be most grateful if you could inspect the attached install file. I would like to be able to use ShortRead to generate QA reports for Genome Analyzer output data. OS: Linux CentOS 5.3 on an HP ProLiant server. While installing the "R" package, after configuration flagged some missing modules/libraries, the install process will not perform the 'make' step.

A competition to create a recommendation engine for R packages

2010 Oct 09

Hello everyone. There is a new competition, outlined on the blog dataists <http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/>, inviting us to analyse statistics on the use of R packages (collected from 52 R users) to create an R-package suggestion engine for ourselves. Since I noticed several bloggers already wrote about it (as I have detailed

Multiple CPU HowTo in Linux?

2010 Sep 14

Hello all, I upgraded my R workstation, and to my dismay, only one core appears to be used during intensive computation of a bioconductor function. What I have now is two dual-core Xeon 5160 CPUs and 10 GB RAM. When I fully load it, top reports about 25% user, 75% idle and 0.98 short-term load. The archives gave nothing helpful besides mention of snow. I thought of posting to HPC, but this system

String processing - is there a better way

2010 Jul 21

I have a two-part question. Part 1) I am trying to remove characters in a string based on the position of a key character in another string. I have a solution that works but it requires a for-loop. A vectorized way of doing this has eluded me. CleanRead <- function(x, y) { if (!is.character(x)) x <- as.character(x) if (!is.character(y)) y <- as.character(y)

build, data and vignettes

2010 Feb 24

Based on some testing, it seems to me that if I have a package with a dataset in /data and a Sweave vignette in inst/doc (but no associated pdf file), the vignette loads the data in /data through data(dataset), and I do an R CMD build, then R will try to build the pdf version of the vignette but will be unable to find the dataset in data because the package is not yet installed. However, if I do

parallel::pvec FUN types differ when v is a list; code simplifications?

2012 Oct 26

In pvec(list(1, 2), FUN, mc.cores=2) FUN sees integer() arguments whereas pvec(list(1, 2, 3), FUN, mc.cores=2) FUN sees list() arguments; the latter seems consistent with pvec's description. This came up in a complicated Bioconductor thread about generics and parallel evaluation https://stat.ethz.ch/pipermail/bioc-devel/2012-October/003745.html One relevant point is that a

map similarity spatial autocorrelation in R

2012 Oct 30

Hi, I have two global raster maps, each of the same variable but from different sources. The values range from 0 to 5 in whole numbers. Is there a statistical test in R that can quantify the similarity of the spatial patterns (i.e., highs and lows)? Thanks,

fast time series similarity (iSAX, UCR DTW, UCR ED) implementations for R?

2013 May 15

Hello. I'm looking for a fast way to group by similarity many (5-10k) long (2-10k points) time series. Using PAM on distance matrix obtained via as.dist(1-abs(cor(data))) produces usable results but it's rather slow and doesn't catch slightly shifted time series. DTW implementation from package 'dtw' is orders of magnitude slower even with global window constraints which

Beginners question about Percentage similarity in R?

2007 Sep 23

I have been reading a paper in which the authors took values from Sorensen's dissimilarity index and values from a percentage similarity index and applied G-testing to the table of values. This was carried out to assess the differences in spider faunas (Stratton and Uetz, 1979). I like the method but have been trying to work out what function in R to use to get the percentage similarity. I have

Similarity between variables

2005 Jun 06

Hi, I would like to know the similarity between variables, but I don't know exactly how to begin, or from what data frame or matrix! I have a matrix where in the rows I have 'Good', 'Medium', 'Bad' and in the columns I have my criteria! What function and package should I use? Thanks a lot, Sabine

converting similarity matrix formats

2006 Apr 23

Dear all, I am using a program that generates similarity matrices in the following non-redundant pairwise format.

a b 0.4
a c 0.5
a d 0.3
b c 0.9
b d 0.6
c d 0.2

matrix(c('a','a','a','b','b','c','b','c','d','c','d','d',.4,.5,.3,.9,.6,.2),byrow=F,nrow=6)

I would like to convert this to a

Gower Similarity Coefficient

2006 Aug 13

I'm interested in clustering my data using the Gower Similarity Coefficient, and I was wondering if R is capable of using that metric. Timothy Rye

similarity test with R

2006 Dec 11

> x=c(3.05176E-05,0.000457764,0.003204346,0.0138855,0.04165649,0.09164429,0.1527405,0.1963806,0.1963806,0.1527405,0.09164429,0.04165649,0.0138855,0.003204346,0.000457764,3.05176E-05)
> y=c(0.0000306,0.0004566,0.0031985,0.0139083,0.0415539,0.0917678,0.1528134,0.1962831,0.1962994,0.1527996,0.0917336,0.0415497,0.0139308,0.0031917,0.0004529,0.0000301)

I tried chisq.test, t-test, prop.test, etc,

Similarity matching with probabilities

2008 Jun 27

Hello, It's just a strange coincidence that someone very recently posted a question about matching. I know there are several match functions in the base package (such as match, pmatch, charmatch, and gsub, etc.) but I can't seem to use them wisely to get what I need. Suppose I have the following strings: "tets" "estt" "rtes7"

similarity measure for binary data

2009 Oct 29

I am doing hierarchical clustering with the cluster package. I could not find similarity measures like the matching coefficient, the Jaccard coefficient, or Sokal and Sneath. Could anyone please tell me of a package with similarity measures for binary data? Kind regards, Ms. Karunambigai M, PhD Scholar, Dept. of Biostatistics, NIMHANS, Bangalore, India

similarity measure

2010 Mar 26

Hi all, I am doing hierarchical clustering using similarity measures for binary data, using package ade4 and the hclust function. For method = 8 and method = 9 of dist.binary, I am getting NA values. Hence, the hclust function gives the error: Error in hclust(d8, method = "ward") : NA/NaN/Inf in foreign function call (arg 11). I think the fact that due to zero in the denominator of the

similarity matrix

2011 Dec 04

Hello R-users, I've got a file with individuals as columns and the clusters in which they occur as rows. I want a similarity matrix which tells me how many times each individual occurs with another. My eventual goal is to make Venn diagrams from the occurrence of my individuals. So I have this:

cluster ind1 ind2 ind3 etc.
1 0 1 2
2 3 0 1
3

Compare String Similarity

2012 Apr 19

Dear All, I need to estimate the level of similarity of two strings. For example: string1 <- c("depending","audience","research", "school"); string2 <- c("audience","push","drama","button","depending"); The words in string may occur in different order though. What function would you recommend to use
