Dear all,

I am working with taxonomic data, represented as a list of classes, orders, families, genera and finally species.

> class(mydata)
[1] "data.frame"
> mode(mydata)
[1] "list"
> names(mydata)
[1] "tclass"   "torder"   "tfamily"  "tgenus"   "tspecies"
> length(mydata$tclass)
[1] 161590

The first 10 rows look like the following:

> mydata[1:10,]
        tclass        torder            tfamily       tgenus
1  Chlorophyta Chlorophyceae     Dunaliellaceae Collodictyon
2  Chlorophyta Chlorophyceae     Dunaliellaceae Collodictyon
3  Chlorophyta Chlorophyceae     Dunaliellaceae Collodictyon
4  Chlorophyta Chlorophyceae     Dunaliellaceae   Dunaliella
5  Chlorophyta Chlorophyceae     Dunaliellaceae   Dunaliella
6  Chlorophyta Chlorophyceae     Dunaliellaceae   Dunaliella
7  Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
8  Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
9  Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
10 Chlorophyta Chlorophyceae Chlamydomonadaceae Brachiomonas
                    tspecies
1    Collodictyontriciliatum
2       Collodictyonciliatum
3   Collodictyonsemiciliatum
4           Dunaliellasalina
5         Dunaliellabardawil
6      Dunaliellatertiolecta
7      Brachiomonassubmarina
8        Brachiomonassimplex
9  Brachiomonasellipsoidalis
10      Brachiomonaswestiana

In total I have 115 (unique) classes, containing 733 orders, containing 16 185 families, etc.

What I am trying to do is to obtain a subtree represented by, let's say, n1 random classes, containing n2 random orders (but restricted to those that belong to the classes chosen earlier), containing n3 random families, etc., all the way down to species, where the number of species will be n5. So the elements I choose at each subsequent level are constrained by the elements already chosen at the level above. If I randomly choose, let's say, 3 classes A, B and C, I want to restrict the randomly chosen orders (let's say a1, a2, a3, b1, b2) to only those that belong to the classes already chosen. Similarly, I need to restrict the list of families to those that belong to the chosen orders, which in turn are known to belong to classes A, B and C.
So I want to obtain a subtree spanning all taxonomic levels, with a randomly defined number of elements at each taxonomic level, but in such a way that I do not end up with orphaned nodes, i.e. species without classes.

I have been trying to use 'sample' as follows:

tcla <- sample(tclass, 10, replace=T)    # I pick 10 random elements, but I want it to be a random number
torder1 <- torder[tclass==tcla]          # I match list of orders with those that belong to classes defined earlier
tord <- sample(torder1, 10, replace=T)   # pick 10 orders from classes that are already chosen

etc., all the way down to species level.

The problem with this approach is that I may obtain branches without any leaves. How can I get rid of those branches? In the end I want to repeat this procedure, let's say, 1000 times, each time obtaining a different number of elements at each taxonomic level.

Sorry for this long-winded post, I hope it is clear what I am trying to do. I would appreciate any tips!

Thanks,
Olga
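
One possible way to avoid leafless branches is sketched below. It is an illustration under stated assumptions, not a tested solution for the real data. It relies on the fact that every row of the data frame is a complete class-to-species path, so filtering rows level by level and reading the subtree off the surviving rows cannot leave a class or order without species. The function name `sample_subtree`, its arguments and the toy data are all hypothetical. Note that it uses `%in%` for the membership test; the `tclass==tcla` comparison in the attempt above recycles `tcla` element by element rather than testing membership. It also samples without replacement, so the counts at the upper levels may end up smaller than requested once pruned branches are dropped.

```r
# Hypothetical helper: draw one random subtree from a taxonomy data frame.
# tax    - data frame with one row per species, columns ordered top to bottom
# n      - maximum number of taxa to keep at each level (one value per column)
# levels - column names, from the highest rank down to species
sample_subtree <- function(tax, n, levels = names(tax)) {
  stopifnot(length(n) == length(levels))
  sub <- tax
  for (i in seq_along(levels)) {
    col <- levels[i]
    # Only taxa that survived the previous levels are candidates
    candidates <- unique(as.character(sub[[col]]))
    k <- min(n[i], length(candidates))
    keep <- sample(candidates, k)
    # Membership test; keeps only rows under the taxa just chosen
    sub <- sub[sub[[col]] %in% keep, , drop = FALSE]
  }
  # Each remaining row is a full path, so any branch that lost all of
  # its descendants is no longer present at any level.
  unique(sub)
}

# Toy data standing in for mydata (assumed values, not the real taxonomy)
toy <- data.frame(
  tclass   = c("A",  "A",  "A",  "B",  "B",  "C"),
  torder   = c("a1", "a1", "a2", "b1", "b2", "c1"),
  tfamily  = c("f1", "f2", "f3", "f4", "f5", "f6"),
  tgenus   = c("g1", "g2", "g3", "g4", "g5", "g6"),
  tspecies = c("s1", "s2", "s3", "s4", "s5", "s6"),
  stringsAsFactors = FALSE
)

set.seed(1)
one <- sample_subtree(toy, n = c(2, 2, 2, 2, 2))

# Repeat 1000 times, drawing a fresh random size for every level each time
runs <- replicate(
  1000,
  sample_subtree(toy, n = sample.int(3, 5, replace = TRUE)),
  simplify = FALSE
)
```

The taxa actually present at a given level of one draw can then be read with, for example, `unique(one$torder)`. For the real data the call would presumably be `sample_subtree(mydata, n = c(n1, n2, n3, n4, n5))`.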