thr3ads.net - R help - [R] Some code to run Hutcheson t-test using R [Sep 2020]

If this information is useful, please help other people find it:
Share via:

Luigi Marongiu

2020-Sep-11 09:11 UTC

[R] Some code to run Hutcheson t-test using R

Hello,
I have just realized in the original paper, the t test is defined as:
`t = h1-h2 -(?1?2)/(var1-var2)^1/2`. is the term -(?1?2) missing in
your formula? How to calculate ?1?2?
Thank you

On Thu, Sep 10, 2020 at 2:41 PM Luigi Marongiu <marongiu.luigi at
gmail.com> wrote:>
> Update:
> I also added the confidence interval for the Shannon index:
> ```
> #! Hutcheson's t-test for Shannon diversity equality
> # thanks to Karl Schilling and Rui Barradas
> hutcheson = function(A, B){
>   # compute Shannon index, variance and sum of elements
>   A_index <- Shannon(A)
>   B_index <- Shannon(B)
>   A_var <- ShannonVar(A)
>   B_var <- ShannonVar(B)
>   A_sum <- sum(A)
>   B_sum <- sum(B)
>   # compute Hutcheson
>   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
>   test <- (A_index-B_index) /sqrt(A_var + B_var)
>   p <- 2*pt(-abs(test),df= DF)
>   if (p < 0.001) {
>     P = "<0.001"
>   } else {
>     P = round(p, 3)
>   }
>   if (p < 0.001) {
>     S = "***"
>   } else if (p < 0.01) {
>     S = "**"
>   } else if (p < 0.05) {
>     S = "*"
>   } else {
>     S = ""
>   }
>   # closure
>   cat("Hutcheson's t-test for Shannon diversity
equality\n\tShannon
> index first group: \t",
>       round(A_index, 3), " (", round((A_index-2*A_var),3),
"-",
> round((A_index+2*A_var),3),
>       ")\n\tShannon index second group: \t",
>       round(B_index, 3), " (", round((B_index-2*B_var),3),
"-",
> round((B_index+2*B_var),3),
>       ")\n\tp-value: ", P, " ", S, "\n", sep
= "")
>   return(p)
> }
> ```
>
> On Thu, Sep 10, 2020 at 2:10 PM Luigi Marongiu <marongiu.luigi at
gmail.com> wrote:
> >
> > Hello,
> > thank you for the code. To explain better, when I used vegan, I did
> > not count the species directly but simply prepared a dataframe where,
> > for each species, I counted the number of samples bearing such
> > species:
> > ```
> >
> > > str(new_df)
> > 'data.frame': 3 obs. of  46 variables:
> >  $ NC_001416 Enterobacteria phage lambda   : int  5 4 5
> >  $ NC_001623 Autographa californica nucl...: int  7 7 7
> >  $ NC_001895 Enterobacteria phage P2       : int  1 0 0
> >  $ NC_004745 Yersinia phage L-413C         : int  1 0 0
> > ```
> > here the triplettes refer to healthy, tumor and metastasis. The
outcome is:
> > ```
> > # Shannon index
> > diversity(new_df)
> > #> Normal     Tumour     Metastasis
> > #> 2.520139   3.109512   1.890404
> > ```
> > Using iNext, I provided a list of all the species counted in a samples
> > ```
> > > new_list
> > $Healthy
> >  [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0 0 0 0
> >
> > $Tumour
> >  [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1
> > 1 2 1 1 1 1 1 1 1 0 0 0 0
> >
> > $Metastasis
> >  [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 1 0 0 1 1 1 1
> > ```
> > From this I get:
> > ```
> > mod = iNEXT(new_list, q=0, datatype="abundance")
> > mod$AsyEst
> > #Site         Diversity Observed Estimator   s.e.    LCL     UCL
> > #1     Normal  Species richness   18.000    41.368 19.683 23.563
116.155
> > #2     Normal Shannon diversity   12.430    21.343  5.183 12.430 
31.501
> > #4     Tumour  Species richness   30.000    94.776 42.936 49.848
241.396
> > #5     Tumour Shannon diversity   22.410    53.135 14.486 24.743 
81.526
> > #7 Metastasis  Species richness   10.000    27.379 22.821 12.443
133.640
> > #8 Metastasis Shannon diversity    6.622     9.980  3.102  6.622 
16.059
> > ```
> > So here the Shannon index is 12 instead of 2.5...
> > Using Karl's function, I get:
> > ```
> > # compute Shannon
> > norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1]))))
> > canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2]))))
> > meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3]))))
> > norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1]))))
> > canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2]))))
> > meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3]))))
> > norm_sum <- sum(array(as.numeric(unlist(new_list[1]))))
> > canc_sum <- sum(array(as.numeric(unlist(new_list[2]))))
> > meta_sum <- sum(array(as.numeric(unlist(new_list[3]))))
> > # compute Hutcheson
> > degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum +
> > canc_var**2 /canc_sum)
> > test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var)
> > (p <- 2*pt(-abs(test),df= degree))
> > > [1] 0.01825784
> > ```
> > remarkably, the indices are the same as obtained by vegan:
> > ```
> > > norm_sIdx
> > [1] 2.520139
> > > canc_sIdx
> > [1] 3.109512
> > > meta_sIdx
> > [1] 1.890404
> > ```
> >
> > I tried Rui's function but I got an error, so I wrote it as
> > ```
> > hutcheson = function(A, B){
> >   # compute Shannon index, variance and sum of elements
> >   A_index <- Shannon(A)
> >   B_index <- Shannon(B)
> >   A_var <- ShannonVar(A)
> >   B_var <- ShannonVar(B)
> >   A_sum <- sum(A)
> >   B_sum <- sum(B)
> >   # compute Hutcheson
> >   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
> >   test <- (A_index-B_index) /sqrt(A_var + B_var)
> >   p <- 2*pt(-abs(test),df= DF)
> >   # closure
> >   cat("Hutcheson's t-test for Shannon diversity
equality\n\tShannon
> > index first group: ",
> >       round(A_index, 4), "\n\tShannon index second group: ",
round(B_index, 4),
> >       "\n\tp-value : ", round(p, 4), "\n", sep =
"")
> >   return(p)
> > }
> > ```
> > and I got:
> > ```
> >
> > > n_t = hutcheson(array(as.numeric(unlist(new_list[1]))),
array(as.numeric(unlist(new_list[2]))))
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 2.5201
> > Shannon index second group: 3.1095
> > p-value : 0.0183
> > > n_m = hutcheson(array(as.numeric(unlist(new_list[1]))),
array(as.numeric(unlist(new_list[3]))))
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 2.5201
> > Shannon index second group: 1.8904
> > p-value : 0.0371
> > > t_m = hutcheson(array(as.numeric(unlist(new_list[2]))),
array(as.numeric(unlist(new_list[3]))))
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 3.1095
> > Shannon index second group: 1.8904
> > p-value : 0
> > ```
> > new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to
> > the original Hutcheson data:
> > ```
> > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > hutcheson(bird_1956, bird_1957)
> > Hutcheson's t-test for Shannon diversity equality
> > Shannon index first group: 1.8429
> > Shannon index second group: 1.0689
> > p-value : 0
> >
> > ```
> > This is to compare two groups at the time. I'll probably have to
> > compensate for multiple testing...
> > But if this all OK, then the case is closed.
> > Thank you
> >
> > On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <ruipbarradas at
sapo.pt> wrote:
> > >
> > > Hello,
> > >
> > > Sorry, there's an instruction missing. See inline.
> > >
> > > ?s 11:44 de 10/09/20, Rui Barradas escreveu:
> > > > If you want a function automating Karl's code, here it
is. It returns an
> > > > object of S3 class "htest", R's standard for
hypothesis tests functions.
> > > > The returned object can then be subset in the usual ways,
ht$statistic,
> > > > ht$parameter, ht$p.value, etc.
> > > >
> > > >
> > > > library(QSutils)
> > > >
> > > > hutcheson.test <- function(x1, x2){
> > > >    dataname1 <- deparse(substitute(x1))
> > > >    dataname2 <- deparse(substitute(x2))
> > > >    method <- "Hutcheson's t-test for Shannon
diversity equality"
> > > >    alternative <- "the diversities of the two
samples are not equal"
> > > >    h1 <- Shannon(x1)
> > > >    varh1 <- ShannonVar(x1)
> > > >    n1 <- sum(x1)
> > > >    h2 <- Shannon(x2)
> > > >    varh2 <- ShannonVar(x2)
> > > >    n2 <- sum(x2)
> > > >    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 +
varh2**2/n2)
> > > >    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
> > > >    p.value <- 2*pt(-abs(tstat), df = degfree)
> > > >    ht <- list(
> > > >      statistic = c(t = tstat),
> > > >      parameter = c(df = degfree),
> > > >      p.value = p.value,
> > > >      alternative = alternative,
> > > >      method = method,
> > > >      data.name = paste(dataname1, dataname2, sep = ",
")
> > > >    )
> > > >    class(ht) <- "htest"
> > > >    ht
> > > > }
> > > >
> > > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > >
> > > > hutcheson.test(earlier, later)
> > > >
> > > >
> > > >
> > > > With the data you provided:
> > > >
> > > >
> > > > bird_1956 <-
c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > > bird_1959 <- c(0,0,14,59,26,68,0)
> > > > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > >
> > > > hutcheson.test(bird_1956, bird_1957)
> > > >
> > > >
> > > >
> > > >
> > > > Note that like David said earlier, there might be better
ways to
> > > > interpret Shannon's diversity index. If h is the
sample's diversity,
> > > > exp(h) gives the number of equally-common species with
equivalent
> > > > diversity.
> > > >
> > > >
> > > > s1 <- Shannon(earlier)
> > > > s2 <- Shannon(later)
> > > > c(earlier = s1, later = s2)
> > > > exp(c(earlier = s1, later = s2))   # Both round to 3
> > > > eq_common <- rep(1, 3)             # Can be 1 specimen or
any other number
> > > > Shannon(eq_common)                 # Slightly greater than
the samples'
> > > > diversity
> > > >
> > > >
> > >
> > > # Create a list with all the data
> > > birds <- mget(ls(pattern = "^bird"))
> > >
> > > > round(exp(sapply(birds, Shannon))) # Your data
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > >
> > > >
> > > > #-------------------------------------
> > > >
> > > >
> > > > Earlier Karl wrote [1] that
> > > >
> > > >
> > > > Here var(h) is calculated as in ref 1 cited by Rui Barradas
- I guess
> > > > that explains the minor numerical differences obtained with
the code
> > > > above and the published variances.
> > > >
> > > >
> > > > I don't believe the published variances were computed
with the published
> > > > variance estimator. The code below computes the variances
like QSutils
> > > > and with formula (4) in Hutcheson's paper. The latter
does not give the
> > > > same results.
> > > >
> > > > var_est <- function(n){
> > > >    s <- length(n)
> > > >    N <- sum(n)
> > > >    p <- n/N
> > > >    i <- p != 0
> > > >    inv.p <- numeric(s)
> > > >    inv.p[i] <- 1/p[i]
> > > >    log.p <- numeric(s)
> > > >    log.p[i] <- log(p[i])
> > > >    #
> > > >    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
> > > >    term2 <- (s - 1)/(2*N^2)
> > > >    #
> > > >    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) +
sum(inv.p)*sum(p *
> > > > log.p)
> > > >    denom3 <- 6*N^3
> > > >    term3 <- numer3/denom3
> > > >    list(
> > > >      Bioc = term1 + term2,
> > > >      Hutch = term1 + term2 + term3
> > > >    )
> > > > }
> > > >
> > > > Vh1 <- var_est(earlier)
> > > > Vh1
> > > > all.equal(ShannonVar(earlier), Vh1$Bioc)
> > > > ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
> > > >
> > > > Vh2 <- var_est(later)
> > > > Vh2
> > > > identical(ShannonVar(later), Vh2$Bioc)    # TRUE
> > > >
> > > >
> > > >
> > > > [1]
https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
> > > >
> > > >
> > > > Hope this helps,
> > > >
> > > > Rui Barradas
> > > >
> > > >
> > > > ?s 09:38 de 10/09/20, Luigi Marongiu escreveu:
> > > >> Update:
> > > >> I can see that you used the function Shannon from the
package QSutils.
> > > >> This would supplement the iNext package I used and solve
the problem.
> > > >> Thank you.
> > > >>
> > > >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
> > > >> <marongiu.luigi at gmail.com> wrote:
> > > >>>
> > > >>> Thank you very much for the code, that was very
helpful.
> > > >>> I got the article by Hutcheson -- I don't know
if I can distribute it
> > > >>> , given the possible copyrights, or if I can attach
it here -- but it
> > > >>> does not report numbers directly: it refers to a
previous article
> > > >>> counting bird death on a telegraph each year. The
numbers
> > > >>> are:
> > > >>> bird_1956 <-
c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > >>> bird_1957 <-
c(4,111,53,66,146,222,126,61,0,2323,21)
> > > >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > >>> bird_1959 <- c(0,0,14,59,26,68,0)
> > > >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > >>>
> > > >>> This for sake of the argument.
> > > >>> As for my problem, I implemented the Shannon index
with the package
> > > >>> iNext, which only gives me the index itself and the
95% CI. Even when
> > > >>> I implemented it with vegan, I only got the index.
Essentially I don't
> > > >>> have a count of species I could feed into the
Hutcheson's. Is there a
> > > >>> way to extract these data? Or to run a
Hutcheson's on the final index?
> > > >>> Thank you
> > > >>>
> > > >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
> > > >>> <karl.schilling at uni-bonn.de> wrote:
> > > >>>>
> > > >>>> Dear Luigi,
> > > >>>>
> > > >>>> below some code I cobbled together based on the
Hutcheson paper you
> > > >>>> mentioned. I was lucky to find code to calculate
h and, importantly,
> > > >>>> its
> > > >>>> variance in the R-package QSutils - you may find
it on the Bioconductor
> > > >>>> website.
> > > >>>>
> > > >>>> here is the code, along with an example. I also
attach the code as an
> > > >>>> R-file.
> > > >>>>
> > > >>>> Hope that helps.
> > > >>>> All my best
> > > >>>>
> > > >>>> Karl
> > > >>>> PS don't forget to adjust for multiple
testing if you compare more than
> > > >>>> two groups.
> > > >>>> K
> > > >>>>
> > > >>>>
> > > >>>> # load package needed
> > > >>>> # QSutils is on Bioconductor
> > > >>>> library(QSutils)
> > > >>>>
> > > >>>> # here some exemplary data - these are the data
from Pilou 1966 that
> > > >>>> are
> > > >>>> used
> > > >>>> # in the second example of Hutcheson, J theor
Biol 129:151-154 (1970)
> > > >>>>
> > > >>>> earlier <-
c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > >>>> later <-
c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > >>>> # numbers of the first example used by Hutcheson
were unfortunately not
> > > >>>> # available to me
> > > >>>>
> > > >>>> # here starts the code ; you may replace the
variables "earlier" and
> > > >>>> "later"
> > > >>>> # by your own numbers.
> > > >>>>
> > > >>>> # calculate h, var(h) etc
> > > >>>> h1 <- Shannon(earlier)
> > > >>>> varh1 <- ShannonVar(earlier)
> > > >>>> n1 <- sum (earlier)
> > > >>>> h2 <- Shannon(later)
> > > >>>> varh2 <- ShannonVar(later)
> > > >>>> n2 <- sum (later)
> > > >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 +
varh2**2 /n2)
> > > >>>>
> > > >>>> # compare numbers with those in the paper
> > > >>>> h1
> > > >>>> h2
> > > >>>> varh1
> > > >>>> varh2
> > > >>>> # I assume that minor numerical differences are
due to differences
> > > >>>> in the
> > > >>>> # numerical precision of computers in the early
seventies and today
> > > >>>> / KS
> > > >>>>
> > > >>>> # this is the actual t-test
> > > >>>> t <- (h1-h2) /sqrt(varh1 + varh2)
> > > >>>> p <- 2*pt(-abs(t),df= degfree)
> > > >>>> p
> > > >>>>
> > > >>>> # that's it
> > > >>>> # Best
> > > >>>> # Karl
> > > >>>> --
> > > >>>> Karl Schilling, MD
> > > >>>> Professor of Anatomy and Cell Biology
> > > >>>> Anatomisches Institut
> > > >>>> Rheinische Friedrich-Wilhelms-Universit?t
> > > >>>> Nussallee 10
> > > >>>>
> > > >>>> D-53115 Bonn
> > > >>>> Germany
> > > >>>>
> > > >>>> phone ++49-228-73-2602
> > > >>>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Best regards,
> > > >>> Luigi
> > > >>
> > > >>
> > > >>
> > > >
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible
code.
> >
> >
> >
> > --
> > Best regards,
> > Luigi
>
>
>
> --
> Best regards,
> Luigi


-- 
Best regards,
Luigi

Luigi Marongiu

2020-Sep-11 09:39 UTC

head link

[R] Some code to run Hutcheson t-test using R

Actually,
in the working example, Hutcheson himself did not report the term
?1?2: `t0 = h1-h2/(var1-var2)^1/2`. so I think we can live without it.
Case closed.
Thank you

On Fri, Sep 11, 2020 at 11:11 AM Luigi Marongiu
<marongiu.luigi at gmail.com> wrote:>
> Hello,
> I have just realized in the original paper, the t test is defined as:
> `t = h1-h2 -(?1?2)/(var1-var2)^1/2`. is the term -(?1?2) missing in
> your formula? How to calculate ?1?2?
> Thank you
>
> On Thu, Sep 10, 2020 at 2:41 PM Luigi Marongiu <marongiu.luigi at
gmail.com> wrote:
> >
> > Update:
> > I also added the confidence interval for the Shannon index:
> > ```
> > #! Hutcheson's t-test for Shannon diversity equality
> > # thanks to Karl Schilling and Rui Barradas
> > hutcheson = function(A, B){
> >   # compute Shannon index, variance and sum of elements
> >   A_index <- Shannon(A)
> >   B_index <- Shannon(B)
> >   A_var <- ShannonVar(A)
> >   B_var <- ShannonVar(B)
> >   A_sum <- sum(A)
> >   B_sum <- sum(B)
> >   # compute Hutcheson
> >   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
> >   test <- (A_index-B_index) /sqrt(A_var + B_var)
> >   p <- 2*pt(-abs(test),df= DF)
> >   if (p < 0.001) {
> >     P = "<0.001"
> >   } else {
> >     P = round(p, 3)
> >   }
> >   if (p < 0.001) {
> >     S = "***"
> >   } else if (p < 0.01) {
> >     S = "**"
> >   } else if (p < 0.05) {
> >     S = "*"
> >   } else {
> >     S = ""
> >   }
> >   # closure
> >   cat("Hutcheson's t-test for Shannon diversity
equality\n\tShannon
> > index first group: \t",
> >       round(A_index, 3), " (", round((A_index-2*A_var),3),
"-",
> > round((A_index+2*A_var),3),
> >       ")\n\tShannon index second group: \t",
> >       round(B_index, 3), " (", round((B_index-2*B_var),3),
"-",
> > round((B_index+2*B_var),3),
> >       ")\n\tp-value: ", P, " ", S, "\n",
sep = "")
> >   return(p)
> > }
> > ```
> >
> > On Thu, Sep 10, 2020 at 2:10 PM Luigi Marongiu <marongiu.luigi at
gmail.com> wrote:
> > >
> > > Hello,
> > > thank you for the code. To explain better, when I used vegan, I
did
> > > not count the species directly but simply prepared a dataframe
where,
> > > for each species, I counted the number of samples bearing such
> > > species:
> > > ```
> > >
> > > > str(new_df)
> > > 'data.frame': 3 obs. of  46 variables:
> > >  $ NC_001416 Enterobacteria phage lambda   : int  5 4 5
> > >  $ NC_001623 Autographa californica nucl...: int  7 7 7
> > >  $ NC_001895 Enterobacteria phage P2       : int  1 0 0
> > >  $ NC_004745 Yersinia phage L-413C         : int  1 0 0
> > > ```
> > > here the triplettes refer to healthy, tumor and metastasis. The
outcome is:
> > > ```
> > > # Shannon index
> > > diversity(new_df)
> > > #> Normal     Tumour     Metastasis
> > > #> 2.520139   3.109512   1.890404
> > > ```
> > > Using iNext, I provided a list of all the species counted in a
samples
> > > ```
> > > > new_list
> > > $Healthy
> > >  [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0
> > > 0 0 0 0 0 0 0 0 0 0 0 0 0
> > >
> > > $Tumour
> > >  [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1
1 1 1
> > > 1 2 1 1 1 1 1 1 1 0 0 0 0
> > >
> > > $Metastasis
> > >  [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0
> > > 0 0 0 0 0 0 1 0 0 1 1 1 1
> > > ```
> > > From this I get:
> > > ```
> > > mod = iNEXT(new_list, q=0, datatype="abundance")
> > > mod$AsyEst
> > > #Site         Diversity Observed Estimator   s.e.    LCL     UCL
> > > #1     Normal  Species richness   18.000    41.368 19.683 23.563
116.155
> > > #2     Normal Shannon diversity   12.430    21.343  5.183 12.430 
31.501
> > > #4     Tumour  Species richness   30.000    94.776 42.936 49.848
241.396
> > > #5     Tumour Shannon diversity   22.410    53.135 14.486 24.743 
81.526
> > > #7 Metastasis  Species richness   10.000    27.379 22.821 12.443
133.640
> > > #8 Metastasis Shannon diversity    6.622     9.980  3.102  6.622 
16.059
> > > ```
> > > So here the Shannon index is 12 instead of 2.5...
> > > Using Karl's function, I get:
> > > ```
> > > # compute Shannon
> > > norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1]))))
> > > canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2]))))
> > > meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3]))))
> > > norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1]))))
> > > canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2]))))
> > > meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3]))))
> > > norm_sum <- sum(array(as.numeric(unlist(new_list[1]))))
> > > canc_sum <- sum(array(as.numeric(unlist(new_list[2]))))
> > > meta_sum <- sum(array(as.numeric(unlist(new_list[3]))))
> > > # compute Hutcheson
> > > degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum +
> > > canc_var**2 /canc_sum)
> > > test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var)
> > > (p <- 2*pt(-abs(test),df= degree))
> > > > [1] 0.01825784
> > > ```
> > > remarkably, the indices are the same as obtained by vegan:
> > > ```
> > > > norm_sIdx
> > > [1] 2.520139
> > > > canc_sIdx
> > > [1] 3.109512
> > > > meta_sIdx
> > > [1] 1.890404
> > > ```
> > >
> > > I tried Rui's function but I got an error, so I wrote it as
> > > ```
> > > hutcheson = function(A, B){
> > >   # compute Shannon index, variance and sum of elements
> > >   A_index <- Shannon(A)
> > >   B_index <- Shannon(B)
> > >   A_var <- ShannonVar(A)
> > >   B_var <- ShannonVar(B)
> > >   A_sum <- sum(A)
> > >   B_sum <- sum(B)
> > >   # compute Hutcheson
> > >   DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum)
> > >   test <- (A_index-B_index) /sqrt(A_var + B_var)
> > >   p <- 2*pt(-abs(test),df= DF)
> > >   # closure
> > >   cat("Hutcheson's t-test for Shannon diversity
equality\n\tShannon
> > > index first group: ",
> > >       round(A_index, 4), "\n\tShannon index second group:
", round(B_index, 4),
> > >       "\n\tp-value : ", round(p, 4), "\n",
sep = "")
> > >   return(p)
> > > }
> > > ```
> > > and I got:
> > > ```
> > >
> > > > n_t = hutcheson(array(as.numeric(unlist(new_list[1]))),
array(as.numeric(unlist(new_list[2]))))
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 2.5201
> > > Shannon index second group: 3.1095
> > > p-value : 0.0183
> > > > n_m = hutcheson(array(as.numeric(unlist(new_list[1]))),
array(as.numeric(unlist(new_list[3]))))
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 2.5201
> > > Shannon index second group: 1.8904
> > > p-value : 0.0371
> > > > t_m = hutcheson(array(as.numeric(unlist(new_list[2]))),
array(as.numeric(unlist(new_list[3]))))
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 3.1095
> > > Shannon index second group: 1.8904
> > > p-value : 0
> > > ```
> > > new_list[1]|[2]|[3] refer to healthy, tumor and metastasis.
applied to
> > > the original Hutcheson data:
> > > ```
> > > > bird_1956 <-
c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > hutcheson(bird_1956, bird_1957)
> > > Hutcheson's t-test for Shannon diversity equality
> > > Shannon index first group: 1.8429
> > > Shannon index second group: 1.0689
> > > p-value : 0
> > >
> > > ```
> > > This is to compare two groups at the time. I'll probably have
to
> > > compensate for multiple testing...
> > > But if this all OK, then the case is closed.
> > > Thank you
> > >
> > > On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <ruipbarradas at
sapo.pt> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Sorry, there's an instruction missing. See inline.
> > > >
> > > > ?s 11:44 de 10/09/20, Rui Barradas escreveu:
> > > > > If you want a function automating Karl's code, here
it is. It returns an
> > > > > object of S3 class "htest", R's standard
for hypothesis tests functions.
> > > > > The returned object can then be subset in the usual
ways, ht$statistic,
> > > > > ht$parameter, ht$p.value, etc.
> > > > >
> > > > >
> > > > > library(QSutils)
> > > > >
> > > > > hutcheson.test <- function(x1, x2){
> > > > >    dataname1 <- deparse(substitute(x1))
> > > > >    dataname2 <- deparse(substitute(x2))
> > > > >    method <- "Hutcheson's t-test for
Shannon diversity equality"
> > > > >    alternative <- "the diversities of the two
samples are not equal"
> > > > >    h1 <- Shannon(x1)
> > > > >    varh1 <- ShannonVar(x1)
> > > > >    n1 <- sum(x1)
> > > > >    h2 <- Shannon(x2)
> > > > >    varh2 <- ShannonVar(x2)
> > > > >    n2 <- sum(x2)
> > > > >    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 +
varh2**2/n2)
> > > > >    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
> > > > >    p.value <- 2*pt(-abs(tstat), df = degfree)
> > > > >    ht <- list(
> > > > >      statistic = c(t = tstat),
> > > > >      parameter = c(df = degfree),
> > > > >      p.value = p.value,
> > > > >      alternative = alternative,
> > > > >      method = method,
> > > > >      data.name = paste(dataname1, dataname2, sep =
", ")
> > > > >    )
> > > > >    class(ht) <- "htest"
> > > > >    ht
> > > > > }
> > > > >
> > > > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > > > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > > >
> > > > > hutcheson.test(earlier, later)
> > > > >
> > > > >
> > > > >
> > > > > With the data you provided:
> > > > >
> > > > >
> > > > > bird_1956 <-
c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> > > > > bird_1959 <- c(0,0,14,59,26,68,0)
> > > > > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > > >
> > > > > hutcheson.test(bird_1956, bird_1957)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Note that like David said earlier, there might be
better ways to
> > > > > interpret Shannon's diversity index. If h is the
sample's diversity,
> > > > > exp(h) gives the number of equally-common species with
equivalent
> > > > > diversity.
> > > > >
> > > > >
> > > > > s1 <- Shannon(earlier)
> > > > > s2 <- Shannon(later)
> > > > > c(earlier = s1, later = s2)
> > > > > exp(c(earlier = s1, later = s2))   # Both round to 3
> > > > > eq_common <- rep(1, 3)             # Can be 1
specimen or any other number
> > > > > Shannon(eq_common)                 # Slightly greater
than the samples'
> > > > > diversity
> > > > >
> > > > >
> > > >
> > > > # Create a list with all the data
> > > > birds <- mget(ls(pattern = "^bird"))
> > > >
> > > > > round(exp(sapply(birds, Shannon))) # Your data
> > > >
> > > >
> > > > Hope this helps,
> > > >
> > > > Rui Barradas
> > > >
> > > > >
> > > > >
> > > > > #-------------------------------------
> > > > >
> > > > >
> > > > > Earlier Karl wrote [1] that
> > > > >
> > > > >
> > > > > Here var(h) is calculated as in ref 1 cited by Rui
Barradas - I guess
> > > > > that explains the minor numerical differences obtained
with the code
> > > > > above and the published variances.
> > > > >
> > > > >
> > > > > I don't believe the published variances were
computed with the published
> > > > > variance estimator. The code below computes the
variances like QSutils
> > > > > and with formula (4) in Hutcheson's paper. The
latter does not give the
> > > > > same results.
> > > > >
> > > > > var_est <- function(n){
> > > > >    s <- length(n)
> > > > >    N <- sum(n)
> > > > >    p <- n/N
> > > > >    i <- p != 0
> > > > >    inv.p <- numeric(s)
> > > > >    inv.p[i] <- 1/p[i]
> > > > >    log.p <- numeric(s)
> > > > >    log.p[i] <- log(p[i])
> > > > >    #
> > > > >    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
> > > > >    term2 <- (s - 1)/(2*N^2)
> > > > >    #
> > > > >    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) +
sum(inv.p)*sum(p *
> > > > > log.p)
> > > > >    denom3 <- 6*N^3
> > > > >    term3 <- numer3/denom3
> > > > >    list(
> > > > >      Bioc = term1 + term2,
> > > > >      Hutch = term1 + term2 + term3
> > > > >    )
> > > > > }
> > > > >
> > > > > Vh1 <- var_est(earlier)
> > > > > Vh1
> > > > > all.equal(ShannonVar(earlier), Vh1$Bioc)
> > > > > ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
> > > > >
> > > > > Vh2 <- var_est(later)
> > > > > Vh2
> > > > > identical(ShannonVar(later), Vh2$Bioc)    # TRUE
> > > > >
> > > > >
> > > > >
> > > > > [1]
https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
> > > > >
> > > > >
> > > > > Hope this helps,
> > > > >
> > > > > Rui Barradas
> > > > >
> > > > >
> > > > > ?s 09:38 de 10/09/20, Luigi Marongiu escreveu:
> > > > >> Update:
> > > > >> I can see that you used the function Shannon from
the package QSutils.
> > > > >> This would supplement the iNext package I used and
solve the problem.
> > > > >> Thank you.
> > > > >>
> > > > >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
> > > > >> <marongiu.luigi at gmail.com> wrote:
> > > > >>>
> > > > >>> Thank you very much for the code, that was very
helpful.
> > > > >>> I got the article by Hutcheson -- I don't
know if I can distribute it
> > > > >>> , given the possible copyrights, or if I can
attach it here -- but it
> > > > >>> does not report numbers directly: it refers to
a previous article
> > > > >>> counting bird death on a telegraph each year.
The numbers
> > > > >>> are:
> > > > >>> bird_1956 <-
c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> > > > >>> bird_1957 <-
c(4,111,53,66,146,222,126,61,0,2323,21)
> > > > >>> bird_1958 <-
c(0,3,32,228,56,102,0,11,2,220,0)
> > > > >>> bird_1959 <- c(0,0,14,59,26,68,0)
> > > > >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> > > > >>>
> > > > >>> This for sake of the argument.
> > > > >>> As for my problem, I implemented the Shannon
index with the package
> > > > >>> iNext, which only gives me the index itself and
the 95% CI. Even when
> > > > >>> I implemented it with vegan, I only got the
index. Essentially I don't
> > > > >>> have a count of species I could feed into the
Hutcheson's. Is there a
> > > > >>> way to extract these data? Or to run a
Hutcheson's on the final index?
> > > > >>> Thank you
> > > > >>>
> > > > >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
> > > > >>> <karl.schilling at uni-bonn.de> wrote:
> > > > >>>>
> > > > >>>> Dear Luigi,
> > > > >>>>
> > > > >>>> below some code I cobbled together based on
the Hutcheson paper you
> > > > >>>> mentioned. I was lucky to find code to
calculate h and, importantly,
> > > > >>>> its
> > > > >>>> variance in the R-package QSutils - you may
find it on the Bioconductor
> > > > >>>> website.
> > > > >>>>
> > > > >>>> here is the code, along with an example. I
also attach the code as an
> > > > >>>> R-file.
> > > > >>>>
> > > > >>>> Hope that helps.
> > > > >>>> All my best
> > > > >>>>
> > > > >>>> Karl
> > > > >>>> PS don't forget to adjust for multiple
testing if you compare more than
> > > > >>>> two groups.
> > > > >>>> K
> > > > >>>>
> > > > >>>>
> > > > >>>> # load package needed
> > > > >>>> # QSutils is on Bioconductor
> > > > >>>> library(QSutils)
> > > > >>>>
> > > > >>>> # here some exemplary data - these are the
data from Pilou 1966 that
> > > > >>>> are
> > > > >>>> used
> > > > >>>> # in the second example of Hutcheson, J
theor Biol 129:151-154 (1970)
> > > > >>>>
> > > > >>>> earlier <-
c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> > > > >>>> later <-
c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> > > > >>>> # numbers of the first example used by
Hutcheson were unfortunately not
> > > > >>>> # available to me
> > > > >>>>
> > > > >>>> # here starts the code ; you may replace
the variables "earlier" and
> > > > >>>> "later"
> > > > >>>> # by your own numbers.
> > > > >>>>
> > > > >>>> # calculate h, var(h) etc
> > > > >>>> h1 <- Shannon(earlier)
> > > > >>>> varh1 <- ShannonVar(earlier)
> > > > >>>> n1 <- sum (earlier)
> > > > >>>> h2 <- Shannon(later)
> > > > >>>> varh2 <- ShannonVar(later)
> > > > >>>> n2 <- sum (later)
> > > > >>>> degfree <- (varh1 + varh2)**2
/(varh1**2/n1 + varh2**2 /n2)
> > > > >>>>
> > > > >>>> # compare numbers with those in the paper
> > > > >>>> h1
> > > > >>>> h2
> > > > >>>> varh1
> > > > >>>> varh2
> > > > >>>> # I assume that minor numerical differences
are due to differences
> > > > >>>> in the
> > > > >>>> # numerical precision of computers in the
early seventies and today
> > > > >>>> / KS
> > > > >>>>
> > > > >>>> # this is the actual t-test
> > > > >>>> t <- (h1-h2) /sqrt(varh1 + varh2)
> > > > >>>> p <- 2*pt(-abs(t),df= degfree)
> > > > >>>> p
> > > > >>>>
> > > > >>>> # that's it
> > > > >>>> # Best
> > > > >>>> # Karl
> > > > >>>> --
> > > > >>>> Karl Schilling, MD
> > > > >>>> Professor of Anatomy and Cell Biology
> > > > >>>> Anatomisches Institut
> > > > >>>> Rheinische Friedrich-Wilhelms-Universit?t
> > > > >>>> Nussallee 10
> > > > >>>>
> > > > >>>> D-53115 Bonn
> > > > >>>> Germany
> > > > >>>>
> > > > >>>> phone ++49-228-73-2602
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Best regards,
> > > > >>> Luigi
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > > ______________________________________________
> > > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide
> > > > > http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained,
reproducible code.
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Luigi
> >
> >
> >
> > --
> > Best regards,
> > Luigi
>
>
>
> --
> Best regards,
> Luigi


-- 
Best regards,
Luigi

Karl Schilling

2020-Sep-11 12:36 UTC

head link

[R] Some code to run Hutcheson t-test using R

Dear Luigi:

no, ?1?2 is not "missing"

First, it should actually be "?1 - ?2".

And as your usual null-hypothesis when comparing h1 and h2 is that they 
are not different (i.e. ?1 = ?2), the latter term adds up to 0 and may 
be omitted.

Karl Schilling

R help - Sep 2020 - Some code to run Hutcheson t-test using R

[R] Some code to run Hutcheson t-test using R

[R] Some code to run Hutcheson t-test using R

[R] Some code to run Hutcheson t-test using R