Hi,

I'm hoping someone has some insight into sample size and logit estimation that could help me.

I inherited a logit model from a client in the direct marketing area. The previous consultant used approximately 143,800 observations in the training data set, of which only 50 (0.03%) had the target value (= 1) for the dependent variable. The literature I could find gives very little guidance on sample sizes (Hosmer & Lemeshow have some material, but they basically say that little has been done).

My questions:

1. Does anyone know of literature, or even rules of thumb, on sample sizes and/or the ratio of target to non-target values of the dependent variable?

2. Using all 143,800 observations seems excessive. Does a sample this large affect the significance of the estimates (e.g., am I always guaranteed very small p-values)?

3. Is oversampling the target value the key and, if so, how do I calculate weights for the estimation?

Any guidance or suggestions in this area are definitely welcome.

Walt Paczkowski
_________________________________
Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
(V) 609-936-8999
(F) 609-936-3733