Leif Kirschenbaum
2006-Mar-23  08:46 UTC
[R] Estimation of skewness from quantiles of near-normal distribution
I have summary statistics from many sets (10,000's) of near-normal continuous data. From previously generated QQplots of these data I can see that most of them are normal, with a few that are not. I have the raw data for a few (700) of these sets. I have applied several tests of normality, skew, and kurtosis to these sets to see which test might yield a parameter that identifies the sets which are visibly non-normal on the QQplot. My conclusion thus far is that the skew is the best determinant of non-normality for these particular data.

Given that I do not have ready access to the sets (10,000's) of data, only to summary statistics which have been calculated on these sets, is there a method by which I may estimate the skew given the following summary statistics:

  0.1% 1% 5% 10% 25% 75% 90% 95% 99% 99.9%
  mean
  median
  N
  sigma

N is usually about 900, and so I would discount the 0.1%, 1%, 99%, and 99.9% quantiles as unreliable due to noisiness in the distributions.

I know that, for instance, there are general rules for calculating sigma of a normal distribution given quantiles, and so am wondering if there are any general rules for calculating skew given a set of quantiles, mean, and sigma. I am currently thinking of trying polynomial fits on the QQplot using the raw data I have and then empirically trying to derive a relationship between the quantiles and the skew.

Thank you for any ideas.

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
leif at reflectivity.com
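One possible starting point for the question above, offered as a sketch and not as anything proposed in the thread: the classical quantile-based asymmetry coefficients (Bowley's quartile skewness, Kelly's 10%/90% analogue) and Pearson's second coefficient, 3 * (mean - median) / sigma, can all be computed from the listed summary statistics alone. They are zero for a symmetric distribution, but they are not numerically equal to the moment skewness, so any threshold would still have to be calibrated against the sets for which raw data exist. The function name, the dictionary layout, and the 5%/95% variant are illustrative assumptions.

```python
from statistics import NormalDist


def quantile_skew_measures(q, mean, median, sigma):
    """Asymmetry measures computed from summary statistics only.

    q maps a percent level (5, 10, 25, 75, 90, 95) to the quantile value.
    Every measure is 0 for a symmetric distribution and is typically
    positive when the right tail is longer.
    """
    def paired(p):
        # Compare the distances from the median to the p% and (100-p)%
        # quantiles, scaled by the spread between those two quantiles.
        lo, hi = q[p], q[100 - p]
        return (hi + lo - 2.0 * median) / (hi - lo)

    return {
        "bowley_25_75": paired(25),   # Bowley's quartile skewness
        "kelly_10_90": paired(10),    # Kelly's coefficient
        "paired_5_95": paired(5),     # same form on 5%/95% (assumption)
        "pearson_second": 3.0 * (mean - median) / sigma,
    }


# Illustrative input only: exact quantiles of a normal distribution.
normal = NormalDist(mu=10.0, sigma=2.0)
levels = (5, 10, 25, 75, 90, 95)
q_normal = {p: normal.inv_cdf(p / 100) for p in levels}

for name, value in quantile_skew_measures(q_normal, 10.0, 10.0, 2.0).items():
    print(f"{name}: {value:.4f}")
```

With N around 900, the 5%/95% pair rests on roughly 45 observations per tail, so the 10%/90% and 25%/75% versions are likely to be the more stable ones. Comparing each measure with the moment skewness on the 700 sets that have raw data would show which, if any, tracks it well enough to be used as a screen.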
kumar zaman
2006-Mar-23  17:51 UTC
[R] Estimation of skewness from quantiles of near-normal distribution
This pertains to your first paragraph: you can use the D'Agostino test, an omnibus test that combines skewness and kurtosis and has high power, instead of relying on the skewness of the data alone. Try ?dagoTest

Ahmed