Dear all,

I have a simple question about gene set enrichment analysis. Say we have a simple numerator and denominator, so the hypergeometric test for enrichment looks like: p = phyper(white - 1, total white, total black, drawn, lower.tail = FALSE).

My question concerns the size of the database. Say my denominator (the total number of genes on the array) is 10000, but the database (say, the GO database) covers only 8000 of these 10000 genes. Should I remove the genes that are not in the database from every value passed to phyper?

In other words:

Original call: phyper(50, 200, 9800, 500, lower.tail = FALSE)
After removing the genes that are not in the database, for example: phyper(50, 180, 7700, 400, lower.tail = FALSE)

Should I filter my gene lists against the database records? Which way is correct?

Thank you in advance for the replies.
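
P.S. To make the two alternatives concrete, here is a minimal R sketch using the example counts from my question. It assumes that the upper tail is what is wanted for an enrichment (over-representation) test, hence lower.tail = FALSE. The variable names p_array and p_db are only illustrative.

```r
# phyper(q, m, n, k): q = white balls drawn, m = white balls in the urn,
# n = black balls in the urn, k = number of balls drawn.
# With lower.tail = FALSE it returns P(X > q).

# Option 1: the universe is every gene on the array (200 + 9800 = 10000).
p_array <- phyper(50, 200, 9800, 500, lower.tail = FALSE)

# Option 2: the universe is restricted to genes annotated in the database,
# so the gene set, the background and the drawn list all shrink.
p_db <- phyper(50, 180, 7700, 400, lower.tail = FALSE)

print(c(array = p_array, database = p_db))
```

The two calls differ only in which genes count as the universe, which is exactly the choice I am asking about.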