If you want a function automating Karl's code, here it is. It returns an object of S3 class "htest", R's standard for hypothesis tests functions. The returned object can then be subset in the usual ways, ht$statistic, ht$parameter, ht$p.value, etc. library(QSutils) hutcheson.test <- function(x1, x2){ dataname1 <- deparse(substitute(x1)) dataname2 <- deparse(substitute(x2)) method <- "Hutcheson's t-test for Shannon diversity equality" alternative <- "the diversities of the two samples are not equal" h1 <- Shannon(x1) varh1 <- ShannonVar(x1) n1 <- sum(x1) h2 <- Shannon(x2) varh2 <- ShannonVar(x2) n2 <- sum(x2) degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2) tstat <- (h1 - h2)/sqrt(varh1 + varh2) p.value <- 2*pt(-abs(tstat), df = degfree) ht <- list( statistic = c(t = tstat), parameter = c(df = degfree), p.value = p.value, alternative = alternative, method = method, data.name = paste(dataname1, dataname2, sep = ", ") ) class(ht) <- "htest" ht } earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) hutcheson.test(earlier, later) With the data you provided: bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) bird_1959 <- c(0,0,14,59,26,68,0) bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) hutcheson.test(bird_1956, bird_1957) Note that like David said earlier, there might be better ways to interpret Shannon's diversity index. If h is the sample's diversity, exp(h) gives the number of equally-common species with equivalent diversity. s1 <- Shannon(earlier) s2 <- Shannon(later) c(earlier = s1, later = s2) exp(c(earlier = s1, later = s2)) # Both round to 3 eq_common <- rep(1, 3) # Can be 1 specimen or any other number Shannon(eq_common) # Slightly greater than the samples' diversity round(exp(sapply(birds, Shannon))) # Your data #------------------------------------- Earlier Karl wrote [1] that Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess that explains the minor numerical differences obtained with the code above and the published variances. I don't believe the published variances were computed with the published variance estimator. The code below computes the variances like QSutils and with formula (4) in Hutcheson's paper. The latter does not give the same results. var_est <- function(n){ s <- length(n) N <- sum(n) p <- n/N i <- p != 0 inv.p <- numeric(s) inv.p[i] <- 1/p[i] log.p <- numeric(s) log.p[i] <- log(p[i]) # term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N term2 <- (s - 1)/(2*N^2) # numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * log.p) denom3 <- 6*N^3 term3 <- numer3/denom3 list( Bioc = term1 + term2, Hutch = term1 + term2 + term3 ) } Vh1 <- var_est(earlier) Vh1 all.equal(ShannonVar(earlier), Vh1$Bioc) ShannonVar(earlier) - Vh1$Bioc # FAQ 7.31 Vh2 <- var_est(later) Vh2 identical(ShannonVar(later), Vh2$Bioc) # TRUE [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html Hope this helps, Rui Barradas ?s 09:38 de 10/09/20, Luigi Marongiu escreveu:> Update: > I can see that you used the function Shannon from the package QSutils. > This would supplement the iNext package I used and solve the problem. > Thank you. > > On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu > <marongiu.luigi at gmail.com> wrote: >> >> Thank you very much for the code, that was very helpful. >> I got the article by Hutcheson -- I don't know if I can distribute it >> , given the possible copyrights, or if I can attach it here -- but it >> does not report numbers directly: it refers to a previous article >> counting bird death on a telegraph each year. The numbers >> are: >> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) >> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) >> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) >> bird_1959 <- c(0,0,14,59,26,68,0) >> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) >> >> This for sake of the argument. >> As for my problem, I implemented the Shannon index with the package >> iNext, which only gives me the index itself and the 95% CI. Even when >> I implemented it with vegan, I only got the index. Essentially I don't >> have a count of species I could feed into the Hutcheson's. Is there a >> way to extract these data? Or to run a Hutcheson's on the final index? >> Thank you >> >> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling >> <karl.schilling at uni-bonn.de> wrote: >>> >>> Dear Luigi, >>> >>> below some code I cobbled together based on the Hutcheson paper you >>> mentioned. I was lucky to find code to calculate h and, importantly, its >>> variance in the R-package QSutils - you may find it on the Bioconductor >>> website. >>> >>> here is the code, along with an example. I also attach the code as an >>> R-file. >>> >>> Hope that helps. >>> All my best >>> >>> Karl >>> PS don't forget to adjust for multiple testing if you compare more than >>> two groups. >>> K >>> >>> >>> # load package needed >>> # QSutils is on Bioconductor >>> library(QSutils) >>> >>> # here some exemplary data - these are the data from Pilou 1966 that are >>> used >>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970) >>> >>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) >>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) >>> # numbers of the first example used by Hutcheson were unfortunately not >>> # available to me >>> >>> # here starts the code ; you may replace the variables "earlier" and "later" >>> # by your own numbers. >>> >>> # calculate h, var(h) etc >>> h1 <- Shannon(earlier) >>> varh1 <- ShannonVar(earlier) >>> n1 <- sum (earlier) >>> h2 <- Shannon(later) >>> varh2 <- ShannonVar(later) >>> n2 <- sum (later) >>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2) >>> >>> # compare numbers with those in the paper >>> h1 >>> h2 >>> varh1 >>> varh2 >>> # I assume that minor numerical differences are due to differences in the >>> # numerical precision of computers in the early seventies and today / KS >>> >>> # this is the actual t-test >>> t <- (h1-h2) /sqrt(varh1 + varh2) >>> p <- 2*pt(-abs(t),df= degfree) >>> p >>> >>> # that's it >>> # Best >>> # Karl >>> -- >>> Karl Schilling, MD >>> Professor of Anatomy and Cell Biology >>> Anatomisches Institut >>> Rheinische Friedrich-Wilhelms-Universit?t >>> Nussallee 10 >>> >>> D-53115 Bonn >>> Germany >>> >>> phone ++49-228-73-2602 >>> >> >> >> -- >> Best regards, >> Luigi > > >
Hello, Sorry, there's an instruction missing. See inline. ?s 11:44 de 10/09/20, Rui Barradas escreveu:> If you want a function automating Karl's code, here it is. It returns an > object of S3 class "htest", R's standard for hypothesis tests functions. > The returned object can then be subset in the usual ways, ht$statistic, > ht$parameter, ht$p.value, etc. > > > library(QSutils) > > hutcheson.test <- function(x1, x2){ > ? dataname1 <- deparse(substitute(x1)) > ? dataname2 <- deparse(substitute(x2)) > ? method <- "Hutcheson's t-test for Shannon diversity equality" > ? alternative <- "the diversities of the two samples are not equal" > ? h1 <- Shannon(x1) > ? varh1 <- ShannonVar(x1) > ? n1 <- sum(x1) > ? h2 <- Shannon(x2) > ? varh2 <- ShannonVar(x2) > ? n2 <- sum(x2) > ? degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2) > ? tstat <- (h1 - h2)/sqrt(varh1 + varh2) > ? p.value <- 2*pt(-abs(tstat), df = degfree) > ? ht <- list( > ??? statistic = c(t = tstat), > ??? parameter = c(df = degfree), > ??? p.value = p.value, > ??? alternative = alternative, > ??? method = method, > ??? data.name = paste(dataname1, dataname2, sep = ", ") > ? ) > ? class(ht) <- "htest" > ? ht > } > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) > > hutcheson.test(earlier, later) > > > > With the data you provided: > > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) > bird_1959 <- c(0,0,14,59,26,68,0) > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) > > hutcheson.test(bird_1956, bird_1957) > > > > > Note that like David said earlier, there might be better ways to > interpret Shannon's diversity index. If h is the sample's diversity, > exp(h) gives the number of equally-common species with equivalent > diversity. > > > s1 <- Shannon(earlier) > s2 <- Shannon(later) > c(earlier = s1, later = s2) > exp(c(earlier = s1, later = s2))?? # Both round to 3 > eq_common <- rep(1, 3)???????????? # Can be 1 specimen or any other number > Shannon(eq_common)???????????????? # Slightly greater than the samples' > diversity > ># Create a list with all the data birds <- mget(ls(pattern = "^bird"))> round(exp(sapply(birds, Shannon))) # Your dataHope this helps, Rui Barradas> > > #------------------------------------- > > > Earlier Karl wrote [1] that > > > Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess > that explains the minor numerical differences obtained with the code > above and the published variances. > > > I don't believe the published variances were computed with the published > variance estimator. The code below computes the variances like QSutils > and with formula (4) in Hutcheson's paper. The latter does not give the > same results. > > var_est <- function(n){ > ? s <- length(n) > ? N <- sum(n) > ? p <- n/N > ? i <- p != 0 > ? inv.p <- numeric(s) > ? inv.p[i] <- 1/p[i] > ? log.p <- numeric(s) > ? log.p[i] <- log(p[i]) > ? # > ? term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N > ? term2 <- (s - 1)/(2*N^2) > ? # > ? numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * > log.p) > ? denom3 <- 6*N^3 > ? term3 <- numer3/denom3 > ? list( > ??? Bioc = term1 + term2, > ??? Hutch = term1 + term2 + term3 > ? ) > } > > Vh1 <- var_est(earlier) > Vh1 > all.equal(ShannonVar(earlier), Vh1$Bioc) > ShannonVar(earlier) - Vh1$Bioc??????????? # FAQ 7.31 > > Vh2 <- var_est(later) > Vh2 > identical(ShannonVar(later), Vh2$Bioc)??? # TRUE > > > > [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html > > > Hope this helps, > > Rui Barradas > > > ?s 09:38 de 10/09/20, Luigi Marongiu escreveu: >> Update: >> I can see that you used the function Shannon from the package QSutils. >> This would supplement the iNext package I used and solve the problem. >> Thank you. >> >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu >> <marongiu.luigi at gmail.com> wrote: >>> >>> Thank you very much for the code, that was very helpful. >>> I got the article by Hutcheson -- I don't know if I can distribute it >>> , given the possible copyrights, or if I can attach it here -- but it >>> does not report numbers directly: it refers to a previous article >>> counting bird death on a telegraph each year. The numbers >>> are: >>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) >>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) >>> bird_1959 <- c(0,0,14,59,26,68,0) >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) >>> >>> This for sake of the argument. >>> As for my problem, I implemented the Shannon index with the package >>> iNext, which only gives me the index itself and the 95% CI. Even when >>> I implemented it with vegan, I only got the index. Essentially I don't >>> have a count of species I could feed into the Hutcheson's. Is there a >>> way to extract these data? Or to run a Hutcheson's on the final index? >>> Thank you >>> >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling >>> <karl.schilling at uni-bonn.de> wrote: >>>> >>>> Dear Luigi, >>>> >>>> below some code I cobbled together based on the Hutcheson paper you >>>> mentioned. I was lucky to find code to calculate h and, importantly, >>>> its >>>> variance in the R-package QSutils - you may find it on the Bioconductor >>>> website. >>>> >>>> here is the code, along with an example. I also attach the code as an >>>> R-file. >>>> >>>> Hope that helps. >>>> All my best >>>> >>>> Karl >>>> PS don't forget to adjust for multiple testing if you compare more than >>>> two groups. >>>> K >>>> >>>> >>>> # load package needed >>>> # QSutils is on Bioconductor >>>> library(QSutils) >>>> >>>> # here some exemplary data - these are the data from Pilou 1966 that >>>> are >>>> used >>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970) >>>> >>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) >>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) >>>> # numbers of the first example used by Hutcheson were unfortunately not >>>> # available to me >>>> >>>> # here starts the code ; you may replace the variables "earlier" and >>>> "later" >>>> # by your own numbers. >>>> >>>> # calculate h, var(h) etc >>>> h1 <- Shannon(earlier) >>>> varh1 <- ShannonVar(earlier) >>>> n1 <- sum (earlier) >>>> h2 <- Shannon(later) >>>> varh2 <- ShannonVar(later) >>>> n2 <- sum (later) >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2) >>>> >>>> # compare numbers with those in the paper >>>> h1 >>>> h2 >>>> varh1 >>>> varh2 >>>> # I assume that minor numerical differences are due to differences >>>> in the >>>> # numerical precision of computers in the early seventies and today >>>> / KS >>>> >>>> # this is the actual t-test >>>> t <- (h1-h2) /sqrt(varh1 + varh2) >>>> p <- 2*pt(-abs(t),df= degfree) >>>> p >>>> >>>> # that's it >>>> # Best >>>> # Karl >>>> -- >>>> Karl Schilling, MD >>>> Professor of Anatomy and Cell Biology >>>> Anatomisches Institut >>>> Rheinische Friedrich-Wilhelms-Universit?t >>>> Nussallee 10 >>>> >>>> D-53115 Bonn >>>> Germany >>>> >>>> phone ++49-228-73-2602 >>>> >>> >>> >>> -- >>> Best regards, >>> Luigi >> >> >> > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Hello, thank you for the code. To explain better, when I used vegan, I did not count the species directly but simply prepared a dataframe where, for each species, I counted the number of samples bearing such species: ```> str(new_df)'data.frame': 3 obs. of 46 variables: $ NC_001416 Enterobacteria phage lambda : int 5 4 5 $ NC_001623 Autographa californica nucl...: int 7 7 7 $ NC_001895 Enterobacteria phage P2 : int 1 0 0 $ NC_004745 Yersinia phage L-413C : int 1 0 0 ``` here the triplettes refer to healthy, tumor and metastasis. The outcome is: ``` # Shannon index diversity(new_df) #> Normal Tumour Metastasis #> 2.520139 3.109512 1.890404 ``` Using iNext, I provided a list of all the species counted in a samples ```> new_list$Healthy [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 $Tumour [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 0 0 0 0 $Metastasis [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 ```>From this I get:``` mod = iNEXT(new_list, q=0, datatype="abundance") mod$AsyEst #Site Diversity Observed Estimator s.e. LCL UCL #1 Normal Species richness 18.000 41.368 19.683 23.563 116.155 #2 Normal Shannon diversity 12.430 21.343 5.183 12.430 31.501 #4 Tumour Species richness 30.000 94.776 42.936 49.848 241.396 #5 Tumour Shannon diversity 22.410 53.135 14.486 24.743 81.526 #7 Metastasis Species richness 10.000 27.379 22.821 12.443 133.640 #8 Metastasis Shannon diversity 6.622 9.980 3.102 6.622 16.059 ``` So here the Shannon index is 12 instead of 2.5... Using Karl's function, I get: ``` # compute Shannon norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1])))) canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2])))) meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3])))) norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1])))) canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2])))) meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3])))) norm_sum <- sum(array(as.numeric(unlist(new_list[1])))) canc_sum <- sum(array(as.numeric(unlist(new_list[2])))) meta_sum <- sum(array(as.numeric(unlist(new_list[3])))) # compute Hutcheson degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum + canc_var**2 /canc_sum) test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var) (p <- 2*pt(-abs(test),df= degree))> [1] 0.01825784``` remarkably, the indices are the same as obtained by vegan: ```> norm_sIdx[1] 2.520139> canc_sIdx[1] 3.109512> meta_sIdx[1] 1.890404 ``` I tried Rui's function but I got an error, so I wrote it as ``` hutcheson = function(A, B){ # compute Shannon index, variance and sum of elements A_index <- Shannon(A) B_index <- Shannon(B) A_var <- ShannonVar(A) B_var <- ShannonVar(B) A_sum <- sum(A) B_sum <- sum(B) # compute Hutcheson DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum) test <- (A_index-B_index) /sqrt(A_var + B_var) p <- 2*pt(-abs(test),df= DF) # closure cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon index first group: ", round(A_index, 4), "\n\tShannon index second group: ", round(B_index, 4), "\n\tp-value : ", round(p, 4), "\n", sep = "") return(p) } ``` and I got: ```> n_t = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[2]))))Hutcheson's t-test for Shannon diversity equality Shannon index first group: 2.5201 Shannon index second group: 3.1095 p-value : 0.0183> n_m = hutcheson(array(as.numeric(unlist(new_list[1]))), array(as.numeric(unlist(new_list[3]))))Hutcheson's t-test for Shannon diversity equality Shannon index first group: 2.5201 Shannon index second group: 1.8904 p-value : 0.0371> t_m = hutcheson(array(as.numeric(unlist(new_list[2]))), array(as.numeric(unlist(new_list[3]))))Hutcheson's t-test for Shannon diversity equality Shannon index first group: 3.1095 Shannon index second group: 1.8904 p-value : 0 ``` new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to the original Hutcheson data: ```> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) > hutcheson(bird_1956, bird_1957)Hutcheson's t-test for Shannon diversity equality Shannon index first group: 1.8429 Shannon index second group: 1.0689 p-value : 0 ``` This is to compare two groups at the time. I'll probably have to compensate for multiple testing... But if this all OK, then the case is closed. Thank you On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:> > Hello, > > Sorry, there's an instruction missing. See inline. > > ?s 11:44 de 10/09/20, Rui Barradas escreveu: > > If you want a function automating Karl's code, here it is. It returns an > > object of S3 class "htest", R's standard for hypothesis tests functions. > > The returned object can then be subset in the usual ways, ht$statistic, > > ht$parameter, ht$p.value, etc. > > > > > > library(QSutils) > > > > hutcheson.test <- function(x1, x2){ > > dataname1 <- deparse(substitute(x1)) > > dataname2 <- deparse(substitute(x2)) > > method <- "Hutcheson's t-test for Shannon diversity equality" > > alternative <- "the diversities of the two samples are not equal" > > h1 <- Shannon(x1) > > varh1 <- ShannonVar(x1) > > n1 <- sum(x1) > > h2 <- Shannon(x2) > > varh2 <- ShannonVar(x2) > > n2 <- sum(x2) > > degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2) > > tstat <- (h1 - h2)/sqrt(varh1 + varh2) > > p.value <- 2*pt(-abs(tstat), df = degfree) > > ht <- list( > > statistic = c(t = tstat), > > parameter = c(df = degfree), > > p.value = p.value, > > alternative = alternative, > > method = method, > > data.name = paste(dataname1, dataname2, sep = ", ") > > ) > > class(ht) <- "htest" > > ht > > } > > > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) > > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) > > > > hutcheson.test(earlier, later) > > > > > > > > With the data you provided: > > > > > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) > > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) > > bird_1959 <- c(0,0,14,59,26,68,0) > > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) > > > > hutcheson.test(bird_1956, bird_1957) > > > > > > > > > > Note that like David said earlier, there might be better ways to > > interpret Shannon's diversity index. If h is the sample's diversity, > > exp(h) gives the number of equally-common species with equivalent > > diversity. > > > > > > s1 <- Shannon(earlier) > > s2 <- Shannon(later) > > c(earlier = s1, later = s2) > > exp(c(earlier = s1, later = s2)) # Both round to 3 > > eq_common <- rep(1, 3) # Can be 1 specimen or any other number > > Shannon(eq_common) # Slightly greater than the samples' > > diversity > > > > > > # Create a list with all the data > birds <- mget(ls(pattern = "^bird")) > > > round(exp(sapply(birds, Shannon))) # Your data > > > Hope this helps, > > Rui Barradas > > > > > > > #------------------------------------- > > > > > > Earlier Karl wrote [1] that > > > > > > Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess > > that explains the minor numerical differences obtained with the code > > above and the published variances. > > > > > > I don't believe the published variances were computed with the published > > variance estimator. The code below computes the variances like QSutils > > and with formula (4) in Hutcheson's paper. The latter does not give the > > same results. > > > > var_est <- function(n){ > > s <- length(n) > > N <- sum(n) > > p <- n/N > > i <- p != 0 > > inv.p <- numeric(s) > > inv.p[i] <- 1/p[i] > > log.p <- numeric(s) > > log.p[i] <- log(p[i]) > > # > > term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N > > term2 <- (s - 1)/(2*N^2) > > # > > numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * > > log.p) > > denom3 <- 6*N^3 > > term3 <- numer3/denom3 > > list( > > Bioc = term1 + term2, > > Hutch = term1 + term2 + term3 > > ) > > } > > > > Vh1 <- var_est(earlier) > > Vh1 > > all.equal(ShannonVar(earlier), Vh1$Bioc) > > ShannonVar(earlier) - Vh1$Bioc # FAQ 7.31 > > > > Vh2 <- var_est(later) > > Vh2 > > identical(ShannonVar(later), Vh2$Bioc) # TRUE > > > > > > > > [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html > > > > > > Hope this helps, > > > > Rui Barradas > > > > > > ?s 09:38 de 10/09/20, Luigi Marongiu escreveu: > >> Update: > >> I can see that you used the function Shannon from the package QSutils. > >> This would supplement the iNext package I used and solve the problem. > >> Thank you. > >> > >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu > >> <marongiu.luigi at gmail.com> wrote: > >>> > >>> Thank you very much for the code, that was very helpful. > >>> I got the article by Hutcheson -- I don't know if I can distribute it > >>> , given the possible copyrights, or if I can attach it here -- but it > >>> does not report numbers directly: it refers to a previous article > >>> counting bird death on a telegraph each year. The numbers > >>> are: > >>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) > >>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) > >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) > >>> bird_1959 <- c(0,0,14,59,26,68,0) > >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) > >>> > >>> This for sake of the argument. > >>> As for my problem, I implemented the Shannon index with the package > >>> iNext, which only gives me the index itself and the 95% CI. Even when > >>> I implemented it with vegan, I only got the index. Essentially I don't > >>> have a count of species I could feed into the Hutcheson's. Is there a > >>> way to extract these data? Or to run a Hutcheson's on the final index? > >>> Thank you > >>> > >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling > >>> <karl.schilling at uni-bonn.de> wrote: > >>>> > >>>> Dear Luigi, > >>>> > >>>> below some code I cobbled together based on the Hutcheson paper you > >>>> mentioned. I was lucky to find code to calculate h and, importantly, > >>>> its > >>>> variance in the R-package QSutils - you may find it on the Bioconductor > >>>> website. > >>>> > >>>> here is the code, along with an example. I also attach the code as an > >>>> R-file. > >>>> > >>>> Hope that helps. > >>>> All my best > >>>> > >>>> Karl > >>>> PS don't forget to adjust for multiple testing if you compare more than > >>>> two groups. > >>>> K > >>>> > >>>> > >>>> # load package needed > >>>> # QSutils is on Bioconductor > >>>> library(QSutils) > >>>> > >>>> # here some exemplary data - these are the data from Pilou 1966 that > >>>> are > >>>> used > >>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970) > >>>> > >>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) > >>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) > >>>> # numbers of the first example used by Hutcheson were unfortunately not > >>>> # available to me > >>>> > >>>> # here starts the code ; you may replace the variables "earlier" and > >>>> "later" > >>>> # by your own numbers. > >>>> > >>>> # calculate h, var(h) etc > >>>> h1 <- Shannon(earlier) > >>>> varh1 <- ShannonVar(earlier) > >>>> n1 <- sum (earlier) > >>>> h2 <- Shannon(later) > >>>> varh2 <- ShannonVar(later) > >>>> n2 <- sum (later) > >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2) > >>>> > >>>> # compare numbers with those in the paper > >>>> h1 > >>>> h2 > >>>> varh1 > >>>> varh2 > >>>> # I assume that minor numerical differences are due to differences > >>>> in the > >>>> # numerical precision of computers in the early seventies and today > >>>> / KS > >>>> > >>>> # this is the actual t-test > >>>> t <- (h1-h2) /sqrt(varh1 + varh2) > >>>> p <- 2*pt(-abs(t),df= degfree) > >>>> p > >>>> > >>>> # that's it > >>>> # Best > >>>> # Karl > >>>> -- > >>>> Karl Schilling, MD > >>>> Professor of Anatomy and Cell Biology > >>>> Anatomisches Institut > >>>> Rheinische Friedrich-Wilhelms-Universit?t > >>>> Nussallee 10 > >>>> > >>>> D-53115 Bonn > >>>> Germany > >>>> > >>>> phone ++49-228-73-2602 > >>>> > >>> > >>> > >>> -- > >>> Best regards, > >>> Luigi > >> > >> > >> > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code.-- Best regards, Luigi