Hi:
Is this what you're after?
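fout <- function(x) {
    # lower and upper limits: median -/+ 2 * MAD
    lim <- median(x) + c(-2, 2) * mad(x)
    # keep only the values that fall outside those limits
    x[x < lim[1] | x > lim[2]]
}

> apply(datafr1, 2, fout)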
$var1
[1] 17.5462078 18.4548214 0.7083442 1.9207578 -1.2296787 17.4948240
[7] 19.5702558 1.6181150 20.9791652 -1.3542099 1.8215087 -1.0296303
[13] 20.5237930 17.5366497 18.5657566 0.9335419 19.7519983 17.8607968
[19] 19.1307524 19.6145711 21.8037136 19.1532175 -2.6688409 19.6949309
[25] 1.9712347
$var2
[1] 37.3822087 35.6490641 35.6000785 38.5981086 -1.6504275 37.1419290
[7] 37.7605230 40.3508689 0.6639900 2.4695841 38.8209491 39.9087921
[13] 38.9907585 35.8279437 2.7870799 37.0941113 0.6308583 36.4556638
[19] -10.2384849 2.8480199 -7.7680457 35.7076539 -0.5467739 3.4702765
[25] 40.4818580 3.2864273 1.4917174
$var3
[1] 74.252563 68.396391 68.845461 -5.006545 66.083402 76.036577
[7] 75.112586 -6.374241 63.883549 64.041216 -19.764360 -15.051017
[13] -9.782767 64.696013 70.970648 -4.562031 -22.135003 70.549310
[19] 69.495915 -4.095587 86.612375 87.029526 70.072126 -6.421695
[25] 65.737536
$var4
[1] 81.476483 87.098767 -10.451616 91.927329 86.588952 85.080950
[7] 84.958645 -9.456368 86.270876 -22.936779 83.314032
Double checks:

> apply(datafr1, 2, function(x) median(x) + c(-2, 2) * mad(x))
         var1      var2      var3      var4
[1,]  2.12167  3.779415 -3.736066 -3.471752
[2,] 17.37176 34.929800 62.969733 80.224799

> apply(datafr1, 2, range)
          var1      var2      var3      var4
[1,] -2.668841 -10.23848 -22.13500 -22.93678
[2,] 21.803714  40.48186  87.02953  91.92733
Assuming you wanted to do this columnwise (by variable), it appears to be
doing the right thing.
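
If you also need the row positions (your loop used which()) and a single
file at the end, here is a minimal sketch along the same lines. I have not
run it against your real data. The names find_outliers, out_list and
out_long are ones I made up for illustration, the seed is arbitrary, and the
simulated datafr1 simply mirrors the one in your post. It assumes every
column is numeric.

```r
# Hypothetical helper (not from the original post): returns the row
# positions and values lying outside median -/+ k * MAD for one vector.
find_outliers <- function(x, k = 2) {
    lim <- median(x, na.rm = TRUE) + c(-k, k) * mad(x, na.rm = TRUE)
    idx <- which(x < lim[1] | x > lim[2])  # which() skips NA comparisons
    data.frame(row = idx, value = x[idx])
}

set.seed(1)  # arbitrary seed, only to make the example reproducible
datafr1 <- data.frame(var1 = rnorm(500, 10, 4),
                      var2 = rnorm(500, 20, 8),
                      var3 = rnorm(500, 30, 18),
                      var4 = rnorm(500, 40, 20))

# One data frame of flagged rows per variable, collected in a list.
out_list <- lapply(datafr1, find_outliers)

# Stack the pieces into one long data frame, tagging each row with the
# variable it came from, so the result is rectangular.
out_long <- do.call(rbind, Map(function(d, nm) {
    data.frame(variable = rep(nm, nrow(d)), d)
}, out_list, names(out_list)))
rownames(out_long) <- NULL

# Uncomment to write the stacked result to disk:
# write.csv(out_long, "out.csv", row.names = FALSE)
```

Because lapply() returns a list, it does not matter that each variable has a
different number of flagged values. Stacking them in long format sidesteps
the "differing number of rows" error you got from data.frame(out1, out2).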
HTH,
Dennis
On Thu, Mar 17, 2011 at 7:04 PM, Ram H. Sharma
<sharma.ram.h@gmail.com> wrote:
> Dear R community members
>
> I have been struggling on this simple question, but never get appropriate
> solution. So please help.
>
> # my data, though I have a large number of variables
> var1 <- rnorm(500, 10,4)
> var2 <- rnorm(500, 20, 8)
> var3 <- rnorm(500, 30, 18)
> var4 <- rnorm(500, 40, 20)
> datafr1 <- data.frame(var1, var2, var3, var4)
>
> # my unsuccessful codes
> nvar <- ncol(datafr1)
> for (i in 1:nvar) {
> out1 <- NULL
> out2 <- NULL
> medianx <- median(getdata[,i], na.rm = TRUE)
> show(madx <- mad(getdata[,i], na.rm = TRUE))
> MD1 <- c(medianx + 2*madx)
> MD2 <- c(medianx - 2*madx)
> out1[i] <- which(getdata[,i] > MD1) # store data that are greater than median + 2 mad
> out2[i] <- which (getdata[,1] < MD2) # store data that are greater than median - 2 mad
> resultdf <- data.frame(out1, out2)
> write.table (resultdf, "out.csv", sep=",")
> }
>
>
> My idea here is to store those value which are either greater than median +
> 2 *MAD or less than median - 2*MAD. Each variable have different length of
> output.
>
> The following last error message:
> Error in data.frame(out1, out2) :
> arguments imply differing number of rows: 2, 0
> In addition: Warning messages:
> 1: In out1[i] <- which(getdata[, i] > MD1) :
> number of items to replace is not a multiple of replacement length
> 2: In out2[i] <- which(getdata[, 1] < MD2) :
> number of items to replace is not a multiple of replacement length
> 3: In out1[i] <- which(getdata[, i] > MD1) :
> number of items to replace is not a multiple of replacement length
>
> Thank you in advance for helping me.
>
> Best regards;
> RHS