Hi,

I have a matrix of 154 elements by 66241 sub-elements. The elements are character strings; the sub-elements are simply substrings of a certain length. For each element, I counted the occurrences of each sub-element by scanning the string. I thus have a matrix of numerical values (between 0 and the maximum number of occurrences).

On the other hand, I computed distances and a hierarchical clustering of all elements using another, information-content-based methodology.

I would like to test, for a cluster of elements (e.g. elements 1 to 10 versus 11 to 154), whether the counts of each sub-element (66241 in total) are significantly higher in the cluster than in the remaining elements. I could test them one by one like this:

    sub1 <- c(0,2,0,6,3,2,5,4,3,...
    sub1_C <- c(sub1[1],sub1[2],sub1[3],...
    sub1_O <- c(sub1[11],sub1[12],sub1[13],...
    t.test(sub1_C, sub1_O, alternative = "greater", mu = 0,
           paired = FALSE, var.equal = FALSE, conf.level = 0.95)

QUESTION 1: How could this be done in BATCH for all sub-elements, loading the data into a table, matrix or data.frame, and testing the count means of cluster (1-10) versus cluster (11-154)? The elements (clusters) to test are not ordered (e.g. elements 1,15,4,7,9,11,12 against 2,150,40,...).

QUESTION 2: Can anyone think of better statistics to use in such a context [STRING CONTENT ANALYSIS]? I thought of using Bayesian-type analyses, but don't know how.

Thanks for any hints.

Regards,
François

PS. Please provide "for newbies" details.
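
For illustration, here is a minimal sketch of the kind of batch run asked about in QUESTION 1. It is not a definitive answer. It assumes the counts sit in a numeric matrix with one row per element and one column per sub-element. The names `counts`, `cluster`, `others`, `p_values` and `p_adjusted` are hypothetical, and the data are simulated stand-ins (50 columns instead of 66241). The multiple-testing correction at the end is an addition that is not part of the one-by-one test above.

```r
# Hypothetical stand-in data: 154 elements (rows) by 50 sub-elements (columns).
# The real matrix would have 66241 columns; these counts are simulated.
set.seed(1)
counts <- matrix(rpois(154 * 50, lambda = 3), nrow = 154, ncol = 50,
                 dimnames = list(NULL, paste0("sub", 1:50)))

# Row indices of the cluster to test; they need not be contiguous or ordered.
cluster <- c(1, 15, 4, 7, 9, 11, 12)
others  <- setdiff(seq_len(nrow(counts)), cluster)

# One Welch t-test per column (sub-element), keeping only the p-value.
# t.test() stops with an error when both groups are constant (for example,
# a sub-element that never occurs), so such columns are recorded as NA.
p_values <- apply(counts, 2, function(x) {
  tryCatch(
    t.test(x[cluster], x[others], alternative = "greater",
           paired = FALSE, var.equal = FALSE)$p.value,
    error = function(e) NA_real_
  )
})

# With thousands of tests, a correction for multiple testing is usually
# advisable; Benjamini-Hochberg is one common choice, not the only one.
p_adjusted <- p.adjust(p_values, method = "BH")

# Sub-elements with the smallest adjusted p-values.
head(sort(p_adjusted))
```

With real data, `counts` would be read from a file instead of simulated (for example with `as.matrix(read.table(...))`), and `cluster` would be replaced by the row numbers of whichever cluster is being tested.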