thr3ads.net - R help - [R] help please: put output into dataframe [Mar 2011]

Ram H. Sharma

2011-Mar-18 02:04 UTC

[R] help please: put output into dataframe

Dear R community members,

I have been struggling with this simple question but have not found an
appropriate solution, so please help.

 # my data, though I have a large number of variables
var1 <- rnorm(500, 10,4)
var2 <- rnorm(500, 20, 8)
var3 <- rnorm(500, 30, 18)
var4 <- rnorm(500, 40, 20)
datafr1 <- data.frame(var1, var2, var3, var4)

# my unsuccessful codes
 nvar <- ncol(datafr1)
for (i in 1:nvar) {
              out1 <- NULL
              out2 <- NULL
              medianx <- median(getdata[,i], na.rm = TRUE)
              show(madx <- mad(getdata[,i], na.rm = TRUE))
              MD1 <- c(medianx + 2*madx)
              MD2 <- c(medianx - 2*madx)
              # store data that are greater than median + 2 mad
              out1[i] <- which(getdata[,i] > MD1)
              # store data that are less than median - 2 mad
              out2[i] <- which (getdata[,1] < MD2)
             resultdf <- data.frame(out1, out2)
             write.table (resultdf, "out.csv", sep=",")
              }


My idea here is to store those values which are either greater than
median + 2*MAD or less than median - 2*MAD. Each variable has a different
length of output.

The last error message was as follows:
Error in data.frame(out1, out2) :
  arguments imply differing number of rows: 2, 0
In addition: Warning messages:
1: In out1[i] <- which(getdata[, i] > MD1) :
  number of items to replace is not a multiple of replacement length
2: In out2[i] <- which(getdata[, 1] < MD2) :
  number of items to replace is not a multiple of replacement length
3: In out1[i] <- which(getdata[, i] > MD1) :
  number of items to replace is not a multiple of replacement length

Thank you in advance for helping me.

Best regards;
RHS

Dennis Murphy

2011-Mar-18 07:16 UTC

[R] help please: put output into dataframe

Hi:

Is this what you're after?

fout <- function(x) {
     lim <- median(x) + c(-2, 2) * mad(x)
     x[x < lim[1] | x > lim[2]]
   }
> apply(datafr1, 2, fout)
$var1
 [1] 17.5462078 18.4548214  0.7083442  1.9207578 -1.2296787 17.4948240
 [7] 19.5702558  1.6181150 20.9791652 -1.3542099  1.8215087 -1.0296303
[13] 20.5237930 17.5366497 18.5657566  0.9335419 19.7519983 17.8607968
[19] 19.1307524 19.6145711 21.8037136 19.1532175 -2.6688409 19.6949309
[25]  1.9712347

$var2
 [1]  37.3822087  35.6490641  35.6000785  38.5981086  -1.6504275  37.1419290
 [7]  37.7605230  40.3508689   0.6639900   2.4695841  38.8209491  39.9087921
[13]  38.9907585  35.8279437   2.7870799  37.0941113   0.6308583  36.4556638
[19] -10.2384849   2.8480199  -7.7680457  35.7076539  -0.5467739   3.4702765
[25]  40.4818580   3.2864273   1.4917174

$var3
 [1]  74.252563  68.396391  68.845461  -5.006545  66.083402  76.036577
 [7]  75.112586  -6.374241  63.883549  64.041216 -19.764360 -15.051017
[13]  -9.782767  64.696013  70.970648  -4.562031 -22.135003  70.549310
[19]  69.495915  -4.095587  86.612375  87.029526  70.072126  -6.421695
[25]  65.737536

$var4
 [1]  81.476483  87.098767 -10.451616  91.927329  86.588952  85.080950
 [7]  84.958645  -9.456368  86.270876 -22.936779  83.314032

Double checks:
> apply(datafr1, 2, function(x) median(x) + c(-2, 2) * mad(x))
         var1      var2      var3      var4
[1,]  2.12167  3.779415 -3.736066 -3.471752
[2,] 17.37176 34.929800 62.969733 80.224799
> apply(datafr1, 2, range)
          var1      var2      var3      var4
[1,] -2.668841 -10.23848 -22.13500 -22.93678
[2,] 21.803714  40.48186  87.02953  91.92733

Assuming you wanted to do this columnwise (by variable), it appears to be
doing the right thing.
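
Since each variable yields a different number of values, apply() returns a list here rather than a matrix. If the goal is a single data frame that can be written to a file, one option is a "long" layout with one row per flagged value. The following is only a minimal sketch under that assumption: the helper name outlier_rows(), the output column names, and the set.seed() call are illustrative and not from the thread.

```r
# Sketch only: outlier_rows() and the output column names are hypothetical.
set.seed(1)  # assumed seed, only so the example is repeatable
datafr1 <- data.frame(var1 = rnorm(500, 10, 4),
                      var2 = rnorm(500, 20, 8),
                      var3 = rnorm(500, 30, 18),
                      var4 = rnorm(500, 40, 20))

# For one column, return a data frame of the values outside median +/- 2*MAD.
outlier_rows <- function(name, df) {
  x   <- df[[name]]
  lim <- median(x, na.rm = TRUE) + c(-2, 2) * mad(x, na.rm = TRUE)
  idx <- which(x < lim[1] | x > lim[2])   # row positions of flagged values
  data.frame(variable = rep(name, length(idx)),
             row      = idx,
             value    = x[idx],
             side     = ifelse(x[idx] > lim[2], "high", "low"),
             stringsAsFactors = FALSE)
}

# Stack the per-variable pieces; unequal lengths are fine in long format.
resultdf <- do.call(rbind, lapply(names(datafr1), outlier_rows, df = datafr1))

# Write once, after all variables have been processed.
write.csv(resultdf, "out.csv", row.names = FALSE)
```

The long layout sidesteps the "differing number of rows" error because no two columns have to line up; the variable column records where each value came from.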

HTH,
Dennis
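
As a side note on the warnings in the original post: which() usually returns several positions, and out1[i] <- ... has room for only one element, hence "number of items to replace is not a multiple of replacement length". A list can hold one vector of any length per variable. A minimal sketch, where idx is a made-up stand-in for the result of which():

```r
idx <- c(3L, 7L, 9L)         # stand-in for which(x > limit); values are made up

# out1 <- NULL; out1[1] <- idx  # would warn and keep only idx[1]

out1 <- vector("list", 2)    # one slot per variable, created before the loop
out1[[1]] <- idx             # [[ ]] stores the whole vector in slot 1
out1[[2]] <- integer(0)      # an empty result is also allowed

n_flagged <- sapply(out1, length)
print(n_flagged)             # prints: [1] 3 0
```

Note also that the original loop resets out1 and out2 to NULL on every iteration, so results from earlier variables would be lost even without the warning.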


