Roger Leigh
2006-Sep-08 09:41 UTC
[R] Computing skewness and kurtosis with the moments package
Hi, I'm a newcomer to R, having previously used SPSS. One problem I have run into is computing kurtosis. A test dataset is here: http://www.whinlatter.ukfsn.org/2401.dat> library(moments) > data <- read.table("2401.dat", header=T) > attach(data) > loglen <- log10(Length)With SPSS, I get Skewness -0.320 Kurtosis -1.138 With R:> skewness(loglen)[1] -0.317923> kurtosis(loglen)[1] 1.860847 Using the example skew and kurtosis functions from M. J. Crawley's "Statistics: An introduction using R": pp 69 and 72:> mskew(loglen)[1] -0.3158337> mkurtosis(loglen)[1] -1.155441 The kurtosis value here matches the SPSS calculation somewhat more closely, but is still not exactly the same. Looking at the functions, there is some difference between them:> skewnessfunction (x, na.rm = FALSE) { if (is.matrix(x)) apply(x, 2, skewness, na.rm = na.rm) else if (is.vector(x)) { if (na.rm) x <- x[!is.na(x)] n <- length(x) (sum((x - mean(x))^3)/n)/(sum((x - mean(x))^2)/n)^(3/2) } else if (is.data.frame(x)) sapply(x, skewness, na.rm = na.rm) else skewness(as.vector(x), na.rm = na.rm) }> mskewfunction(x) { m3 <- sum((x - mean(x))^3)/length(x) s3 <- sqrt(var(x))^3 m3/s3 }> kurtosisfunction (x, na.rm = FALSE) { if (is.matrix(x)) apply(x, 2, kurtosis, na.rm = na.rm) else if (is.vector(x)) { if (na.rm) x <- x[!is.na(x)] n <- length(x) n * sum((x - mean(x))^4)/(sum((x - mean(x))^2)^2) } else if (is.data.frame(x)) sapply(x, kurtosis, na.rm = na.rm) else kurtosis(as.vector(x), na.rm = na.rm) }> mkurtosisfunction(x) { m4 <- sum((x - mean(x))^4)/length(x) s4 <- var(x)^2 m4/s4 - 3 } Are any of these functions incorrect, or are there several different methods of computing the skew and kurtosis values? Are there any more appropriate R packages I should consider using? Many thanks, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `- GPG Public Key: 0x25BFB848 Please GPG sign your mail. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 188 bytes Desc: not available Url : https://stat.ethz.ch/pipermail/r-help/attachments/20060908/a99f6758/attachment.bin
Ben Bolker
2006-Sep-08 20:24 UTC
[R] Computing skewness and kurtosis with the moments package
Roger Leigh <rleigh <at> whinlatter.ukfsn.org> writes:> > With SPSS, I get > Skewness -0.320 > Kurtosis -1.138 > > With R: > > skewness(loglen) > [1] -0.317923 > > kurtosis(loglen) > [1] 1.860847 > > Using the example skew and kurtosis functions from M. J. Crawley's > "Statistics: An introduction using R": pp 69 and 72: > > > mskew(loglen) > [1] -0.3158337 > > mkurtosis(loglen) > [1] -1.155441 >There are two differences between the R functions; (1) Crawley subtracts 3 from E[x^4]/E[x^2]^2, the kurtosis function in the moments package doesn't. (2) Crawley uses var(x), which is sum((x-mean(x))^2)/(n-1), rather than sum((x-mean(x))^2)/n [I'm not sure I got all the parentheses in the right place.] With these differences corrected the two sets of functions give the same answers. They still don't give exactly the same answers as SPSS, but ... I wouldn't worry about it too much since the uncertainties in the skew and kurtosis are likely to be much larger. from _Numerical Recipes_: if the difference between n and n?1 ever matters to you, then you are probably up to no good anyway - e.g., trying to substantiate a questionable hypothesis with marginal data. cheers Ben Bolker
Possibly Parallel Threads
- moments, skewness, kurtosis
- about comparison of KURTOSIS in package: moments and fBasics
- Generating series of distributions with the same skewness and different kurtosis or with same kurtosis and different skewness?
- Summary : Generate random data from dist. with 0 skewness and some kurtosis
- alternative option in skewness and kurtosis tests?