Hi,

You could use:

dat1 <- structure(list(
  Haplotype = c("H1", "H1", "H1", "H2", "H2", "H2", "H3", "H3", "H3",
                "H4", "H4", "H4", "H4", "H4", "H4"),
  Frequency = c(0.8278, 0.02248, 0.1494, 0.8238, 0.02248, 0.1497,
                0.1497, 0.02248, 0.8244, 0.628, 0.02248, 0.1483,
                0.1637, 0.01081, 0.01798)),
  .Names = c("Haplotype", "Frequency"),
  class = "data.frame", row.names = c(NA, -15L))

# SIC per haplotype, using log base 2
with(dat1, tapply(Frequency, list(Haplotype), function(x) sum(pi*x*log2(1/(pi*x)))))

# the same with the natural log
#with(dat1, tapply(Frequency, list(Haplotype), function(x) sum(pi*x*log(1/(pi*x)))))

#or
sapply(split(dat1[,-1], dat1$Haplotype), function(x) sum(pi*x*log2(1/(pi*x))))

A.K.


Hi all.

I am seeking help in writing an R loop to calculate Shannon's information content (SIC) for every unique haplotype. The data has the haplotypes in column 1 and the frequency of each haplotype in column 2. As you can see in the example data with just 4 unique haplotypes, each haplotype occurs a different number of times, with a frequency corresponding to each occurrence. The frequencies for each haplotype H* sum to 1.

The equation for SIC is

  Σi (πhi*log(1/(πhi)))

where πhi is the frequency of the hi haplotype.

Haplotype  Frequency
H1         0.8278
H1         0.02248
H1         0.1494
H2         0.8238
H2         0.02248
H2         0.1497
H3         0.1497
H3         0.02248
H3         0.8244
H4         0.628
H4         0.02248
H4         0.1483
H4         0.1637
H4         0.01081
H4         0.01798

In this example, the SIC for H1 would be

  (π*0.8278*log(1/(π*0.8278))) + (π*0.02248*log(1/(π*0.02248))) + (π*0.1494*log(1/(π*0.1494)))

and the final output should give 4 SIC values, one for each unique haplotype.

I believe lapply() is the right way forward, but my R skills are too elementary for me to know what to do next.

Thank you for any help.
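

A note on the reply at the top of this thread: its code multiplies every frequency by R's built-in constant pi, following the worked example for H1. If the symbol in the SIC equation is instead meant to be the frequency itself, which is how the question defines it, the factor of pi drops out and the formula is the ordinary Shannon entropy. Below is a minimal sketch under that assumption. The helper name shannon_bits is hypothetical, and log base 2 (bits) is an assumed choice; swap in log() for natural-log units.

```r
# Example data from the question: haplotype labels and their frequencies.
dat1 <- data.frame(
  Haplotype = rep(c("H1", "H2", "H3", "H4"), times = c(3, 3, 3, 6)),
  Frequency = c(0.8278, 0.02248, 0.1494,
                0.8238, 0.02248, 0.1497,
                0.1497, 0.02248, 0.8244,
                0.628, 0.02248, 0.1483, 0.1637, 0.01081, 0.01798)
)

# Hypothetical helper: sum of p * log2(1/p) over a vector of frequencies.
# Assumes every frequency is strictly positive; a zero would produce NaN.
shannon_bits <- function(p) sum(p * log2(1 / p))

# Apply the helper to the frequencies of each unique haplotype.
sic <- tapply(dat1$Frequency, dat1$Haplotype, shannon_bits)
print(sic)
```

The result is a named vector with one SIC value per haplotype. No explicit loop or lapply() is needed here, because tapply() already splits Frequency by Haplotype and applies the function to each group.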