thr3ads.net - R help - [R] How to do poisson distribution test like this? [Jul 2009]

If this information is useful, please help other people find it:
Share via:

Mao Jianfeng

2009-Jul-28 10:03 UTC

[R] How to do poisson distribution test like this?

Dear R-listers,

I want to reperfrom a poisson distribution test that presented in a
recent-published biological research paper (Plant Physiology 2008, vol 148,
pp. 1189-1200). That test is about the occurrence number of a kind of gene
in separate chromosomes.

For instance:

The observed gene number in chromosome A is 36.
The expected gene number in chromosome A is 30.

Then, the authors got a probability 0.137 by distribution test on this
trial. In this test, a Poisson distribution was used to determine the
significance of the gene distribution.

Questions:

How can I reperform this test in R?

Thank you in advance.

Mao Jian-Feng
Institue of Botany,
CAS, China

	[[alternative HTML version deleted]]

(Ted Harding)

2009-Jul-28 10:42 UTC

head link

[R] How to do poisson distribution test like this?

On 28-Jul-09 10:03:41, Mao Jianfeng wrote:> Dear R-listers,
> I want to reperfrom a poisson distribution test that presented
> in a recent-published biological research paper (Plant Physiology
> 2008, vol 148, pp. 1189-1200). That test is about the occurrence
> number of a kind of gene in separate chromosomes.
> 
> For instance:
> 
> The observed gene number in chromosome A is 36.
> The expected gene number in chromosome A is 30.
> 
> Then, the authors got a probability 0.137 by distribution test on this
> trial. In this test, a Poisson distribution was used to determine the
> significance of the gene distribution.
> 
> Questions:
> How can I reperform this test in R?
> 
> Thank you in advance.
> Mao Jian-Feng
> Institue of Botany,
> CAS, China
Since it is not clear what test procedure they used, I have done
a couple of numerical experiments in R:

1. Compare the upper-tail probability of the POisson distribution
   with mean mu = 30 of the event that at least 36 are observed:

   1-ppois(36,30)
   # [1] 0.1196266

   Not quite the 0.137 that they got.

2. A similar comparison, using a Normal approximation to the Poisson
   (mean mu = 30, SD = sqrt(mu)):

   1 - pnorm(6/sqrt(30))
   # [1] 0.1366608

   which, after rounding, is exactly the 0.137 that they got.

So it seems they have used an upper-tail test based on the Normal
approximation to the Poisson distribution.

Method 1 (using the exact Poisson distribution) is preferable, since
it is accurate (given the assumption of Poisson distribution).
So that would, in principle, be the best way to do it in R (as
illustrated).

Possibly their adoption of Method 2 is based on a naive acceptance
of the "rule-of-thumb" from some textbook; or maybe their available
software does not offer ready access to the exact Poisson distribution
(which wouldn't happen if they used R -- see Method 1). As stated,
it is inaacurate compared with Method 1, so is not to be preferred.

However, if you need to reproduce their method (regardless of merit),
then use Method 2.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jul-09                                       Time: 11:42:08
------------------------------ XFMail ------------------------------

glen_b

2009-Jul-29 03:21 UTC

head link

[R] How to do poisson distribution test like this?

I have gone to have a look at the paper. (http://
www.plantphysiol.org/cgi/content/full/148/3/1189 ) to try to work out what
they're actually doing, in the hope that I might be able to figure out their
procedure so we can give a more complete answer to the question.
Unfortunately, I'm more confused now than before.

The numbers the OP refers to are in table I:
http:// www.plantphysiol.org/cgi/content/full/148/3/1189/TBL1

The first four rows are (abbreviating the column titles):
Chromosome   length(Mb)   Obs_No._Genes   Exp_No._Genes  Distribution_Test
   LG I                32.16           36                      30              
0.137
   LG II               23.44           25                      22              
0.235
   LG III              17.45           13                      17              
0.235
   LG IV              15.08           10                      14              
0.159

They are (somehow) working out the probability above or below the observed
count, depending on wther or not the observed count happened to be above or
below the expected! This means their p-values are on average about half what
they should be. 

However, they don't seem to be using Poisson probabilities (as they claim) -
I can't reproduce the 
results with a Poisson distribution. Nor can I reproduce them with a normal
approximation, nor with a normal approximation with continuity correction.
It's not at all clear how they've got their numbers, but they definitely
don't seem to have done what they claim to have done (see below).

The note under Table 1 says (using some sort of "pseudo" LaTeX to
indicate
the greek letters)
"for distribution test $P(m(i,j) < \lambda(i,j)) <= \alpha$ or 
$P(m(i,j) >
\lambda(i,j)) < \alpha$, a Poisson distribution was used to determine the
significance of the F-box gene distribution in the Populus genome"

-- that is, if they did what this says, they're working out probabilities of
MORE extreme, not "at least as extreme", a second error.

[Note that (it's stated later) the expected numbers are estimated. The
don't
seem to take this into account (though the effect may be small). They're
also doing a whole bunch of tests in this paper - 19 "p-values" in
table 1,
another 273 in table 3, 18 more in table 4, 28 more in table 5, another 24
in table 7 ... and so on.]

Under the "Materials and Methods" section, subsection
"Localization of F-Box
Genes in the Genome", they claim that the "probabilities $P(m(i) <
\lambda(i))$ and $P(m(i) > \lambda(i))$ were evaluated under the cumulative
Posson distribution at \alpha <= 0.05 and  \alpha <= 0.01 significance
levels."

Which is weird, because the actual p-values they seem to regard as worth
mentioning in the paper are the ones at or below 0.0001

[Is this sort of thing pretty typical for papers in biology? Don't referees
even do a basic check of one or two numbers? Apparently I put too much
effort into refereeing.]

Can anyone see what they're actually doing?
-- 
View this message in context:
http://www.nabble.com/How-to-do-poisson-distribution-test-like-this--tp24696413p24711764.html
Sent from the R help mailing list archive at Nabble.com.

glen_b

2009-Jul-29 03:28 UTC

head link

[R] How to do poisson distribution test like this?

Corrected links (the originals somehow aquired an extra space, sorry)

Paper: http://www.plantphysiol.org/cgi/content/full/148/3/1189
Table I: http://www.plantphysiol.org/cgi/content/full/148/3/1189/TBL1

-- 
View this message in context:
http://www.nabble.com/How-to-do-poisson-distribution-test-like-this--tp24696413p24711818.html
Sent from the R help mailing list archive at Nabble.com.

(Ted Harding)

2009-Jul-29 11:43 UTC

head link

[R] How to do poisson distribution test like this?

Follow-up:

On 29-Jul-09 03:28:24, glen_b wrote:> Corrected links (the originals somehow aquired an extra space, sorry)
> 
> Paper: http://www.plantphysiol.org/cgi/content/full/148/3/1189
> Table I: http://www.plantphysiol.org/cgi/content/full/148/3/1189/TBL1
> --
I have now scanned though the Paper, and find the explanation of
their calculation of Expected Length (which agrees with the method
I described in my previous email) in a somewhat garbled description
on page 1198 (page 10/12 of the PDF file):

  "The expected gene number lambda_i on chromosome i would be
   a sample from a Poisson distribution lambda_i = m*L_i/Sum L_i
   where m is the total number of genes detected within the
   assembled sequences and L_i is the length of chromosome i."

I agree with your earlier comment, Glen, that the paper would have
benefited from more careful scrutiny by referees/editor!

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Jul-09                                       Time: 12:43:26
------------------------------ XFMail ------------------------------

Maybe Matching Threads

Search for more seemingly similar threads

R help - Jul 2009 - How to do poisson distribution test like this?

[R] How to do poisson distribution test like this?

[R] How to do poisson distribution test like this?

[R] How to do poisson distribution test like this?

[R] How to do poisson distribution test like this?

[R] How to do poisson distribution test like this?

Maybe Matching Threads