R help - [R] Testing significance in a design with unequal but proportional sample sizes [Mar 2004]

Michael Rennie

2004-Mar-04 18:30 UTC

[R] Testing significance in a design with unequal but proportional sample sizes

Hi, all

I have a rather un-ideal dataset that I am trying to work with, and would 
appreciate any advice you have on the matter.

I have 4 years' worth of data taken at 3 depth zones from which samples have 
been taken at random. I am looking at the abundance of organism A between depth 
zones and across years, and am interested in the possible interaction of 
organism A distributions shifting between depth zones over time. Unfortunately, 
the sample sizes (n) differ between depth zones, as follows:

             Year
             1   2   3   4
Depth Zone 1 15  15  15  15
           2 10  10  10  10
           3 5   5   5   5

As such, I have a 2-way ANOVA with unequal but proportional subclass numbers.
Sokal and Rohlf (3rd Ed., 1995) have a nifty method of working out sums of 
squares in this type of scenario (pp. 357-358, Box 11.6). However, they 
don't tell you how to calculate the probabilities, but refer the reader on to
Snedecor and Cochran (1967), which I am on my way to consult shortly.
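With proportional subclass numbers the depth and year main effects are orthogonal, so R's ordinary sequential sums of squares (as reported by aov() or anova()) should not depend on the order in which the terms are entered. A minimal sketch that checks this, assuming hypothetical simulated data (the names dat, depth, year and y are invented; only the cell sizes come from the table above):

```r
# Hypothetical simulated data: the response 'y' is invented; only the
# cell sizes (15, 10 and 5 samples per year in depth zones 1-3) come from
# the table above.
set.seed(1)
design <- expand.grid(depth = factor(1:3), year = factor(1:4))
# expand.grid varies 'depth' fastest, so the sizes repeat once per year
reps <- rep(c(15, 10, 5), times = 4)
dat <- design[rep(seq_len(nrow(design)), reps), ]
dat$y <- rnorm(nrow(dat), mean = 10)

# Sequential (type I) sums of squares with the terms in both orders
ss_dy <- anova(lm(y ~ depth * year, data = dat))
ss_yd <- anova(lm(y ~ year * depth, data = dat))

# With proportional cell sizes the main-effect rows should agree
rbind(depth_first = ss_dy[c("depth", "year"), "Sum Sq"],
      year_first  = ss_yd[c("depth", "year"), "Sum Sq"])
```

If the two rows agree, the usual aov(y ~ depth * year) F tests are unambiguous for this design, and no special type of sums of squares has to be chosen.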

I'm curious whether there is a more straightforward way of coding this
into R, rather than having to more or less customize my own statistical test.
I found some discussions from 2001 in the archives about type III sums of 
squares, but the lack of consensus in those discussions did little 
to assure me that I should try this approach.

I'd love to hear any advice, code or suggestions anyone has.

Cheers,

Mike

-- 
Michael Rennie
Ph.D. Candidate
University of Toronto at Mississauga
3359 Mississauga Rd. N.
Mississauga ON  L5L 1C6
Ph: 905-828-5452  Fax: 905-828-3792

Tom Blackwell

2004-Mar-04 20:06 UTC

Michael  -

Since your email says that the data are "the abundance of organism A",
I am moved to ask whether the abundances are integer counts, sometimes
zero, and whether the "samples" are perhaps dips of a net, or the
contents of a filter after pumping a certain amount of water through it,
or something akin to 'quadrats' in forest sampling.

If the abundances are integer counts, then it would be natural to
analyze the data with a log-linear model using R's  glm()  rather
than with anova.  Snedecor and Cochran is an excellent book, but for
this purpose Venables and Ripley's MASS (Modern Applied Statistics
with S and S-plus) might be better.
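A minimal sketch of the log-linear approach, assuming the abundances are integer counts per sample. The data are simulated and hypothetical (count, depth, year and dat are invented names); only the cell sizes come from the original table. Overdispersion is not handled here; a quasipoisson family or MASS::glm.nb may be worth checking with real counts.

```r
# Hypothetical count data: 'count' stands in for the number of organism A
# per sample; cell sizes are 15, 10 and 5 per year in depth zones 1-3.
set.seed(2)
design <- expand.grid(depth = factor(1:3), year = factor(1:4))
dat <- design[rep(seq_len(nrow(design)), rep(c(15, 10, 5), times = 4)), ]
dat$count <- rpois(nrow(dat), lambda = 4)

# Log-linear (Poisson) model with depth, year and their interaction
fit <- glm(count ~ depth * year, family = poisson, data = dat)

# Sequential likelihood-ratio (deviance) tests; the depth:year row tests
# whether the distribution across depth zones shifts between years
tab <- anova(fit, test = "Chisq")
print(tab)
```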

-  tom blackwell  -  u michigan medical school  -  ann arbor  -

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

pallier

2004-Mar-04 23:08 UTC

Hello,

This is a follow-up on the question about the analysis of unbalanced
data, based on my (limited) understanding of what goes on in such cases.

When the data are unbalanced in a factorial design,
the main effect of a given factor can be defined in several ways.
Which type of main effect is relevant depends on the scientific question.

Some textbooks distinguish between weighted and unweighted mean effects.

If you use the 'aov' function with an unbalanced design, it will report
(for the first factor in the formula) the F ratio associated with the "weighted
means" solution. That is, the computation of the main effect ignores the
imbalance: the effect size of a factor 'a' is computed regardless of how the
units are distributed among the levels of the other factors.

Consider:

 > x<-scan()
1: 1 2 3
4: 4 5 6 7 8
9: 1 2 3 4 5
14: 6 7 8
17:
Read 16 items
 > a<-factor(rep(c(1,2),c(8,8)))
 > b<-factor(rep(c(1,2,1,2),c(3,5,5,3)))
 >
 > tapply(x,list(a=a,b=b),mean)
   b
a   1 2
  1 2 6
  2 3 7
 > tapply(x,a,mean)
  1   2
4.5 4.5


If all units are given the same weight (that is, if we ignore the factor 'b'),
then the main effect of a is 0.
This is confirmed by:

 > summary(aov(x~a*b))
            Df    Sum Sq   Mean Sq   F value    Pr(>F)
a            1 2.417e-32 2.417e-32 1.209e-32 1.0000000
b            1        60        60        30 0.0001413 ***
a:b          1 5.621e-31 5.621e-31 2.810e-31 1.0000000
Residuals   12        24         2

This is called the weighted means approach because the subgroups defined by the
crossing of a and b are given weights proportional to their size.
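Because aov() uses sequential sums of squares, the weighted-means test applies only to the factor entered first. A minimal sketch using the same example data (entered directly with c() instead of scan(), so the block runs on its own) compares the sum of squares for a when it is entered first and when it is entered after b:

```r
# The example data, entered directly instead of via scan()
x <- c(1:8, 1:8)
a <- factor(rep(c(1, 2), c(8, 8)))
b <- factor(rep(c(1, 2, 1, 2), c(3, 5, 5, 3)))

# Sequential sums of squares: the first term ignores the other factor
ss_a_first <- anova(lm(x ~ a * b))["a", "Sum Sq"]   # weighted-means effect of a
ss_a_after <- anova(lm(x ~ b * a))["a", "Sum Sq"]   # a adjusted for b
c(a_first = ss_a_first, a_after_b = ss_a_after)
```

With these data the first value should be essentially 0 and the second 3.75, so in an unbalanced, non-proportional design the test for a reported by aov() depends on the order of the terms in the formula.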


Now, another approach is to forget about the individual units
and just consider the table of means:

 > tapply(x,list(a=a,b=b),mean)
   b
a   1 2
  1 2 6
  2 3 7


Ignoring the sample sizes, one way to define the main effect of 'a'
is as the mean of 2 and 6 versus the mean of 3 and 7:

 > t=tapply(x,list(a=a,b=b),mean)
 > diff(apply(t,1,mean))
2
1


That is, a difference of 1.

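One can compute a Mean Square associated with this effect from the contrast
coefficients c = (1/2, 1/2, -1/2, -1/2) on the cell means (2, 6, 3, 7) and the
cell sizes n = (3, 5, 5, 3): SS = effect^2 / sum(c^2/n) = 1 / (4/15) = 3.75 on
1 d.f. This can be compared with the MSE from the previous ANOVA (2 with 12 d.f.).

The F ratio = 3.75/2 = 1.875 does not reach significance. Note that pf() returns
the lower tail by default, so the p-value is pf(1.875, 1, 12, lower.tail=FALSE),
which is about 0.2.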

If I am correct, this is what textbooks call the "unweighted means" approach.
In many cases, it is this type of main effect that is relevant
(especially when the imbalance is due to observations missing at random).

I do not know if there is a solution in R
for easily computing the unweighted main effects and assessing
their significance. (Anyone?)

Actually, the different types of main effects defined above just correspond
to different contrasts on the cell means. So if there is an easy solution to
compute arbitrary contrasts on the cell means in a factorial design, this
could be an approach to this question. (Anyone?)
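One way to test an arbitrary contrast on the cell means is to fit a cell-means model with lm() and combine its coefficients by hand. A minimal sketch, re-entering the example data so it runs alone; w is a hypothetical name for the contrast weights, and the test uses the pooled residual variance from all four cells:

```r
x <- c(1:8, 1:8)
a <- factor(rep(c(1, 2), c(8, 8)))
b <- factor(rep(c(1, 2, 1, 2), c(3, 5, 5, 3)))

# Cell-means model: one coefficient per a:b cell, no intercept
cells <- interaction(a, b)          # levels 1.1, 2.1, 1.2, 2.2
fit <- lm(x ~ 0 + cells)

# Unweighted main effect of a: mean of the a1 cells minus mean of the a2
# cells. Coefficients follow levels(cells), so match the weights by level.
w <- c("1.1" = 0.5, "1.2" = 0.5, "2.1" = -0.5, "2.2" = -0.5)
w <- w[levels(cells)]

est <- sum(w * coef(fit))                        # contrast estimate
se  <- sqrt(drop(t(w) %*% vcov(fit) %*% w))      # its standard error
f   <- (est / se)^2                              # F on 1 and 12 d.f.
p   <- pf(f, 1, df.residual(fit), lower.tail = FALSE)
c(estimate = est, F = f, p = p)
```

Any other weighting of the cells (for example, weights proportional to the cell sizes) can be tested the same way by changing w.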


Christophe

Prof Brian Ripley

2004-Mar-05 07:59 UTC

On Fri, 5 Mar 2004, pallier wrote:

...
> Actually, the different types of main effects defined above just correspond
> to different contrasts on the cell means. So if there is an easy solution to
> compute arbitrary contrasts on the cell means in a factorial design, this
> could be an approach to this question. (Anyone?)
There are at least three such ways.  ?contrasts (for the assignment
function contrasts<-)  and ?C, as well as the contrasts= argument to aov 
(the function you were discussing ...).
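A minimal sketch of the contrasts= route, re-entering the example data so the block runs alone. Setting contrasts(a) <- contr.sum(2) (and likewise for b) before fitting, or writing C(a, contr.sum) in the formula, should have the same effect. Using drop1() with scope = ~ . to get "type III" sums of squares is an assumption added here, not something stated in the thread:

```r
x <- c(1:8, 1:8)
a <- factor(rep(c(1, 2), c(8, 8)))
b <- factor(rep(c(1, 2, 1, 2), c(3, 5, 5, 3)))

# Sum-to-zero contrasts via the contrasts= argument (lm and aov both take it)
fit <- lm(x ~ a * b, contrasts = list(a = "contr.sum", b = "contr.sum"))

# Drop each term in turn from the full model; with sum-to-zero
# contrasts this gives "type III" sums of squares and F tests
tab <- drop1(fit, scope = ~ ., test = "F")
tab
```

With these data the a row should show a sum of squares of 3.75, the test of equal unweighted (cell-mean) marginal means of a. With the default treatment contrasts the same drop1() call would test different hypotheses for the main effects, so the choice of contrasts matters.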

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
