Why do you think the mean should not change when you change more of the data
values? At any rate, your function agrees with winsor.mean() in package
psych:
> library(psych)
> winsor.mean(a, trim=.05)
[1] 95.8525> winsor.mean(a, trim=.06)
[1] 95.9064> winsor.mean(a, trim=.07)
[1] 95.91> winsor.mean(a, trim=.08)
[1] 95.7628
You do have a typo in this line:
x <- x[!i.na] # Should be x <- x[!is.na]
and you have a bug when k=0. It should give the same result as mean(a), but
it does not because your function adds an additional value to the end of the
data equal to the maximum.
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of dms at riseup.net
> Sent: Wednesday, November 14, 2012 3:05 PM
> To: r-help at r-project.org
> Subject: [R] Winsorisation function
>
>
> Dear all,
> someone can find what I doing wrong with the following function. It is
> for winsorisation mean. At my eyes it is ok, but for reason I sometimes
> it
> is changing the results when I change the k value.
>
> wmean <-
> function (x, na.rm = FALSE, k = 1) {
> if (any(i.na <- is.na(x))) {
> if (na.rm)
> x <- x[!i.na]
> else return(NA)
> }
> n <- length(x)
> if (!(k %in% (0:n)))
> stop("Invalid argument for 'k'.")
> x <- sort(x)
> x[1:k] <- x[k+1] # Here I solve the lower values
> x[(n-k+1):n] <- x[n-k] #Then I go over the higher ones
> return(mean(x))
> }
>
>
> set.seed(51)
> a<- sample(200,100)
>
> > wmean(a, k=5)
> [1] 95.85
> > wmean(a, k=6)
> [1] 95.91
> > wmean(a, k=7)
> [1] 95.91
> > wmean(a, k=8)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.