thr3ads.net - R help - [R] table problems [Jun 2002]

If this information is useful, please help other people find it:
Share via:

Robin Hankin

2002-Jun-12 02:08 UTC

[R] table problems

dear helplist,

my student has fifty trees, numbered one to fifty, and a vector
recording which tree a certain possum slept in on 12 nights.

R> c
 [1]  3 14 17 22 26 26 17 40 43 25 46 46
R> 

Thus it slept in tree #3 on Monday, then tree #14 on Tues, and so on.
I wish to test the null hypothesis that the animal chooses trees
randomly; try

R> table(c)
c
 3 14 17 22 25 26 40 43 46 
 1  1  2  1  1  2  1  1  2 >
Thus it slept in tree #3 once, tree #14 once, tree #17 twice, etc.

Now on the null hypothesis the expected number of sleeps per tree is
12/50=0.24; so how do I carry out a chisquare test on the data,
including the trees that it never slept in?

chisq.test() doesn't "know" that there are actually fifty distinct
trees (most of which were never slept in) and not nine.

 > chisq.test(table(c))

	Chi-squared test for given probabilities

data:  table(c) 
X-squared = 1.5, df = 8, p-value = 0.9927

of course this isn't right because chisquared is > 25.8 due to the
animal sleeping in tree #17 and tree #46 twice (and of course, df
should be 49 because I have 50 trees).


help anyone?


-- 

Robin Hankin, Lecturer,
School of Geographical and Environmental Science
Private Bag 92019 Auckland
New Zealand

r.hankin at auckland.ac.nz
tel 0064-9-373-7599 x6820; FAX 0064-9-373-7042

as of: Wed Jun 12 13:51:00 NZST 2002
This (linux) system up continuously for:  286 days, 20 hours, 33 minutes
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

ripley@stats.ox.ac.uk

2002-Jun-12 07:13 UTC

head link

[R] table problems

On Wed, 12 Jun 2002, Robin Hankin wrote:
>
> dear helplist,
>
> my student has fifty trees, numbered one to fifty, and a vector
> recording which tree a certain possum slept in on 12 nights.
>
> R> c
>  [1]  3 14 17 22 26 26 17 40 43 25 46 46
> R>
>
> Thus it slept in tree #3 on Monday, then tree #14 on Tues, and so on.
> I wish to test the null hypothesis that the animal chooses trees
> randomly; try
>
> R> table(c)
> c
>  3 14 17 22 25 26 40 43 46
>  1  1  2  1  1  2  1  1  2
> >
>
> Thus it slept in tree #3 once, tree #14 once, tree #17 twice, etc.
Try tabulate(c), which goes to 46.  Or, better,

tab <- rep(0,50)
names(tab) <- 1:50
tab[names(table(c))] <- table(c)

> Now on the null hypothesis the expected number of sleeps per tree is
> 12/50=0.24; so how do I carry out a chisquare test on the data,
> including the trees that it never slept in?
>
> chisq.test() doesn't "know" that there are actually fifty
distinct
> trees (most of which were never slept in) and not nine.
>
>  > chisq.test(table(c))
>
> 	Chi-squared test for given probabilities
>
> data:  table(c)
> X-squared = 1.5, df = 8, p-value = 0.9927
>
> of course this isn't right because chisquared is > 25.8 due to the
> animal sleeping in tree #17 and tree #46 twice (and of course, df
> should be 49 because I have 50 trees).
> chisq.test(tab)
        Chi-squared test for given probabilities

data:  tab
X-squared = 63, df = 49, p-value = 0.08625

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(tab)


The warning is serious: the approximation is probably dreadful for data
this sparse.  In any case, is the null hypothesis plausible: the animal
independently and uniform chooses a tree each night to sleep in, from
exactly the 50 trees your student labelled?

You could easily get a more accurate significance by simulation:

doone <- function(...)
{
   c <- sample(1:50, 12, replace = T)
   tab <- rep(0,50)
   names(tab) <- 1:50
   tab[names(table(c))] <- table(c)
   chisq.test(tab)$statistic
}
> table(round(sapply(1:1000, doone),3))
    38 46.333 54.667     63 71.333 79.667     88 96.333
   232    394    229     97     34      6      7      1

(Note how discrete the distribution is.)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Jim Lemon

2002-Jun-12 07:42 UTC

head link

[R] table problems

Robin Hankin wrote:> 
>... I've been advised that the correct distribution for the number of
> sleeps in the 50 trees would be multinomial with n=50 and the
> p_i=12/50 for i=1 to 50.  I'll check out Feller when I get home
> tonight, but I suspect that M-B is the limiting case?
> I grabbed my thesis and refreshed my memory, and I think it will be what
you want. Here is a table for the discrete case with 50 cells and 12
trials.

N choices     probability     cumulative p
      1        2.048e-19       2.048e-19
      2        2.05421e-14     2.05423e-14
      3        4.16787e-11     4.16992e-11
      4        1.3844e-08      1.38857e-08
      5        1.43652e-06     1.45041e-06
      6        6.20311e-05     6.34815e-05
      7        0.00129369      0.00135717
      8        0.0141003       0.0154574
      9        0.0829514       0.0984088
     10        0.260324        0.358733
     11        0.403082        0.761815
     12        0.238185        1
 
Expected value = 10.76

What this is telling you is that if your possum is going back to the
same tree (which the ringtail at my place does) he will have to restrict
his choices to 8 or less to avoid a Type I error. Of course, your design
probably includes more than 1 possum, and you would have to sort out the
alpha for your number of subjects (and repeats?). I've appended the
source code for generating tables such as this.

Jim
-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>

double Factorial(double n) {
 if(n > 1) return(n * Factorial(n-1));
 return(1);
}

double NChooseR(double n,double r) {
 double nfact;
 double rfact;
 double nminusrfact;

 nfact = Factorial(n);
 rfact = Factorial(r);
 nminusrfact = Factorial(n-r);
 return(nfact/(rfact * nminusrfact));
}

double PFilled(double n,double r,int f) {
 double psum=0;
 double combfact;
 int v;

 combfact = NChooseR(n,n-f);
 for(v=0;v<=f;v++)
  psum += pow(-1,v) * NChooseR(f,v) * pow((f-v)/n,r);
 return(combfact*psum);
}

double MBdist(double ncells,double nballs) {
 double psum;
 double ptotal = 0;
 double ev = 0;
 int filled = 1;

 while(filled <= nballs) {
  psum = PFilled(ncells,nballs,filled);
  ptotal += psum;
  fprintf(stdout,"     %2d        %-16g%g\n",filled,psum,ptotal);
  ev += psum * filled++;
 }
 if(ptotal < 0.99 || ptotal > 1.01)
  fprintf(stdout,"Warning: total p is %f\n",ptotal);
 return(ev);
}
 
int main(int argc,char *argv[]) {
 if(argc == 3) {
  fprintf(stdout,"N choices     probability     cumulative p\n");
  printf("\nExpected value =
%.2f\n",MBdist(atof(argv[1]),atof(argv[2])));
  return(0);
 }
 return(1);
}

David Brahm

2002-Jun-13 13:49 UTC

head link

[R] table problems

Robin Hankin <r.hankin at auckland.ac.nz> has a vector of trees visited:
  R> c <- c(3,14,17,22,26,26,17,40,43,25,46,46)
and wants a count of visits for trees 1:50.  table(c) is inadequate because it
omits the zero cases.

Brian D. Ripley <ripley at stats.ox.ac.uk>
suggested:> Try tabulate(c), which goes to 46.  Or, better,
> tab <- rep(0,50)
> names(tab) <- 1:50
> tab[names(table(c))] <- table(c)
But tabulate has a second argument (nbins) which does what you want in 1 line:
  R> tabulate(c, 50)
-- 
                              -- David Brahm (brahm at alum.mit.edu)
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

ripley@stats.ox.ac.uk

2002-Jun-13 14:51 UTC

head link

[R] table problems

On Thu, 13 Jun 2002, David Brahm wrote:
> Robin Hankin <r.hankin at auckland.ac.nz> has a vector of trees
visited:
>   R> c <- c(3,14,17,22,26,26,17,40,43,25,46,46)
> and wants a count of visits for trees 1:50.  table(c) is inadequate because
it
> omits the zero cases.
>
> Brian D. Ripley <ripley at stats.ox.ac.uk> suggested:
> > Try tabulate(c), which goes to 46.  Or, better,
> > tab <- rep(0,50)
> > names(tab) <- 1:50
> > tab[names(table(c))] <- table(c)
>
> But tabulate has a second argument (nbins) which does what you want in 1
line:
>   R> tabulate(c, 50)
Yes if you want 1:50.  Not for other label sets, which is why I did it
the more general way.  I did mention tabulate: should have givne more
details, I guess.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Maybe Matching Threads

Search for more apparently analagous threads

R help - Jun 2002 - table problems

[R] table problems

[R] table problems

[R] table problems

[R] table problems

[R] table problems

Maybe Matching Threads