Hi,

I need to analyze the influence of several factors on a variable that is a measure of fecundity, consisting of 73 observations ranging from 0 to 5. The variable is continuous and highly positively skewed, and none of the typical transformations was able to normalize the data. Thus, I was thinking of analyzing these data using a generalized linear model, where I can specify a distribution other than normal. I'm thinking it may fit a gamma or exponential distribution, but I'm not sure whether the data meet the assumptions of those distributions, because their definitions are too complex for my understanding!

I tried to use R to assess the fit to a particular distribution. I used the fitdistr function from the MASS package and was able to obtain an estimate of the rate for the exponential distribution, but I couldn't get the gamma to work. If I don't provide initial estimates it says "Error in optim (... initial value in 'vmmin' is not finite)"; if I provide some initial values it says "Error in optim (... non-finite finite-difference value [1])".

I then tried to test the fit of the exponential distribution using the Kolmogorov-Smirnov goodness-of-fit test (ks.test), but I got the warning message "cannot compute correct p-values with ties". This is strange, given that the details for ks.test say that continuous variables do not generate ties.

I'd greatly appreciate any ideas on how to proceed with this.

Thanks,
Andrea
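A note on the errors described above: one likely cause (an assumption, since the actual data are not shown) is that the observations include exact zeros and repeated values. The gamma log-density at zero is infinite unless the shape is exactly 1, so the optimizer inside fitdistr has no finite value to start from, and repeated values are what trigger the ks.test warning about ties. The sketch below uses simulated data in place of the real fecundity measurements; all variable names are hypothetical.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
# Hypothetical stand-in for the fecundity data (NOT the poster's data):
# 5 exact zeros plus 68 skewed positive values rounded to one decimal,
# capped at 5, so that both zeros and ties are present.
y <- c(rep(0, 5), pmin(round(rexp(68, rate = 1.5), 1) + 0.1, 5))

# Exponential fit: the log-density is finite at zero, so this succeeds.
fit_exp <- fitdistr(y, "exponential")

# Gamma log-density evaluated at zero for shape < 1, shape == 1, shape > 1.
# Only shape == 1 gives a finite value.
loglik_at_zero <- dgamma(0, shape = c(0.5, 1, 2), rate = 1, log = TRUE)

# Gamma fit on the full data: expected to fail because of the zeros.
# The error message (if any) is captured instead of stopping the script.
gamma_all <- tryCatch(fitdistr(y, "gamma"),
                      error = function(e) conditionMessage(e))

# Diagnostic only: the gamma fit on the strictly positive values.
fit_gamma <- fitdistr(y[y > 0], "gamma")

# Count repeated values; any count above zero means ks.test will see ties.
n_ties <- sum(duplicated(y))

# Kolmogorov-Smirnov test against the fitted exponential, capturing the
# warning text rather than printing it.
ks_warning <- NULL
ks_exp <- withCallingHandlers(
  ks.test(y, "pexp", rate = fit_exp$estimate[["rate"]]),
  warning = function(w) {
    ks_warning <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
```

Dropping the zeros here is only a way to locate the problem, not a recommended analysis. Note also that ks.test assumes the parameters of the reference distribution were specified in advance rather than estimated from the same data, so its p-value is only a rough guide in this setting.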
andrea previtali wrote:
> Hi,
> I need to analyze the influence of several factors on a variable that is
> a measure of fecundity, consisting of 73 observations ranging from 0 to 5.
> The variable is continuous and highly positively skewed, and none of the
> typical transformations was able to normalize the data. Thus, I was
> thinking of analyzing these data using a generalized linear model, where I
> can specify a distribution other than normal. I'm thinking it may fit a
> gamma or exponential distribution, but I'm not sure whether the data meet
> the assumptions of those distributions, because their definitions are
> too complex for my understanding!

Roughly, the exponential distribution is the model of a random variable describing the time/distance between two independent events that occur at the same constant rate. The gamma distribution is the model of a random variable that can be thought of as the sum of exponential random variables. I don't think fecundity data, the count of reproductive cells, qualifies as a random variable to be modeled by either of these distributions.

If the count of reproductive cells is very large, and you are modeling this count as a function of animal size, such as length, you should consider the lognormal distribution, since the count of cells grows multiplicatively (volumetrically) with the increase in length. In that case you can model your response variable using glm with family=gaussian(link="log").

Rubén
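To make the last suggestion concrete, here is a minimal sketch on simulated data. The variable names, the sample values, and the cubic relationship are assumptions for illustration, not the poster's data. It fits the suggested Gaussian GLM with a log link and, for comparison, a linear model on the log-transformed response. The two are not identical: gaussian(link="log") models the log of the mean with additive normal errors, whereas a strictly lognormal response corresponds to lm on log(y). Both need a strictly positive response (the GLM for its default starting values, the log transform by definition), so zeros would need separate handling.

```r
set.seed(42)

# Hypothetical data: body length and a cell count that scales roughly
# with length cubed, with multiplicative (lognormal) noise.
len   <- runif(73, min = 10, max = 30)
count <- 0.5 * len^3 * exp(rnorm(73, sd = 0.3))

# Gaussian GLM with a log link, as suggested in the reply:
# log(E[count]) = b0 + b1 * log(len), with additive normal errors.
fit_glm <- glm(count ~ log(len), family = gaussian(link = "log"))

# Lognormal alternative: ordinary least squares on the log scale,
# E[log(count)] = b0 + b1 * log(len).
fit_lm <- lm(log(count) ~ log(len))

# Compare the estimated exponents; both should be near the simulated
# value of 3 if the multiplicative model is reasonable.
slopes <- c(glm_log_link = coef(fit_glm)[["log(len)"]],
            lm_log_resp  = coef(fit_lm)[["log(len)"]])
print(slopes)
```

Which of the two is more appropriate depends on whether the residual spread grows with the mean (favouring the log-transformed response) or stays roughly constant on the original scale (favouring the log-link GLM); a plot of residuals against fitted values is a quick way to check.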