thr3ads.net - R help - [R] Subset according to groups NA proportion within specific variables [Feb 2011]

D. Alain

2011-Feb-21 11:20 UTC

[R] Subset according to groups NA proportion within specific variables

Dear R-List, 

I have a dataframe with one grouping variable (x) and three response variables
(y,z,w).

df <- data.frame(x = c(rep(1,3), rep(2,4), rep(3,5)),
                 y = rnorm(12),
                 z = c(3,4,5,NA,NA,NA,NA,1,2,1,2,1),
                 w = c(1,2,3,3,4,3,5,NA,5,NA,7,8))

> df
 x           y  z  w
 1  0.29306106  3  1
 1  0.54797780  4  2
 1 -1.38365548  5  3
 2 -0.20407986 NA  3
 2 -0.87322574 NA  4
 2 -1.23356250 NA  3
 2  0.43929374 NA  5
 3  1.16405483  1 NA
 3  1.07083464  2  5
 3 -0.67463191  1 NA
 3 -0.66410552  2  7
 3 -0.02543358  1  8

Now I want to make a new dataframe df.sub containing only the cases from
groups in which the proportion of NAs does not exceed 50% in any of the
response variables y, z, w.

In the above example, e.g., this would be a dataframe with all cases of the
groups 1 and 3 (since there are 100% NAs in z for group 2)
> df.sub
 x           y  z  w
 1  0.29306106  3  1
 1  0.54797780  4  2
 1 -1.38365548  5  3
 3  1.16405483  1 NA
 3  1.07083464  2  5
 3 -0.67463191  1 NA
 3 -0.66410552  2  7
 3 -0.02543358  1  8

Please excuse me if the problem has already been treated somewhere, but so far I
was not able to find the right thread for my question in RSeek.

Can anyone help? 

Thanks in advance!

D. Alain

Karl Ove Hufthammer

2011-Feb-21 12:05 UTC

D. Alain wrote:
> Now I want to make a new dataframe df.sub comprising only cases pertaining
> to groups, where the overall proportion of NAs in either of the response
> variables y,z,w does not exceed 50%.
One simple example:

library(plyr)
# 'missing' = largest per-variable NA proportion within the group
na.prop = function(d) data.frame(d, missing=max(colMeans(is.na(d[c("y","z","w")]))) )
newdf = ddply(df, .(x), na.prop)

Now you can use 'subset' on 'newdf' to obtain the required rows.

(For very large data sets it may be better to not create an entire data 
frame in 'na.prop', duplicating the data in 'df', but instead just return 
the proportion.)
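
The lighter variant mentioned in the parenthesis could look roughly like the sketch below. It is not Karl's code: it uses base R (split and sapply) instead of plyr, and the helper name max.na.prop, the object names props and keep, and the set.seed() call are assumptions made only for this illustration.

```r
set.seed(1)  # assumption: fixed seed only so the example is reproducible
df <- data.frame(x = c(rep(1, 3), rep(2, 4), rep(3, 5)),
                 y = rnorm(12),
                 z = c(3, 4, 5, NA, NA, NA, NA, 1, 2, 1, 2, 1),
                 w = c(1, 2, 3, 3, 4, 3, 5, NA, 5, NA, 7, 8))

# Hypothetical helper: largest per-variable NA proportion within one group
max.na.prop <- function(d) max(colMeans(is.na(d[c("y", "z", "w")])))

# One number per group (named by the group value), no copy of the data
props <- sapply(split(df, df$x), max.na.prop)

# Groups where no response variable exceeds 50% NAs
keep <- names(props)[props <= 0.5]

# Keep all rows belonging to those groups
df.sub <- subset(df, as.character(x) %in% keep)
```

Only one number per group is stored here, so the original data frame is never duplicated; the final subset() call then picks the rows of the qualifying groups.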
 
-- 
Karl Ove Hufthammer

Dennis Murphy

2011-Feb-21 12:14 UTC

Hi:

Here's one way with package plyr:

df<-data.frame(x=c(rep(1,3),rep(2,4),rep(3,5)),
               y=rnorm(12),
               z=c(3,4,5,NA,NA,NA,NA,1,2,1,2,1),
               w=c(1,2,3,3,4,3,5,NA,5,NA,7,8))

library(plyr)
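fun <- function(d) {
   # proportion of NAs in each response column (all columns except x)
   u <- colMeans(is.na(d[, -1]))
   # keep the group only if no column exceeds 50% NAs; otherwise return NULL
   if(all(u <= 0.5)) return(d)
}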

> ddply(df, 'x', fun)
  x           y z  w
1 1 -1.22768415 3  1
2 1  0.03108696 4  2
3 1  0.90246871 5  3
4 3 -0.47387908 1 NA
5 3  1.59577665 2  5
6 3 -0.80792438 1 NA
7 3  0.20927614 2  7
8 3 -0.46172477 1  8
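
To see why group 2 is dropped and group 3 is kept, it can help to tabulate the NA proportion per group and variable first. The following is a small base R sketch using aggregate(); it is not part of the reply above, and the object name na.tab is an assumption for illustration.

```r
df <- data.frame(x = c(rep(1, 3), rep(2, 4), rep(3, 5)),
                 y = rnorm(12),
                 z = c(3, 4, 5, NA, NA, NA, NA, 1, 2, 1, 2, 1),
                 w = c(1, 2, 3, 3, 4, 3, 5, NA, 5, NA, 7, 8))

# is.na() on the response columns gives a logical matrix; the mean of a
# logical vector is the proportion of TRUE values, i.e. the NA proportion.
na.tab <- aggregate(is.na(df[-1]), by = list(x = df$x), FUN = mean)

# One row per group, one column per response variable:
# z is 1 for group 2 (all missing), w is 0.4 for group 3 (2 of 5 missing).
na.tab
```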



Dimitris Rizopoulos

2011-Feb-21 12:23 UTC

one way is the following:

DF <- data.frame(x = c(rep(1,3),rep(2,4),rep(3,5)),
     y = rnorm(12), z = c(3,4,5,NA,NA,NA,NA,1,2,1,2,1),
     w = c(1,2,3,3,4,3,5,NA,5,NA,7,8)
)

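# logical matrix: TRUE where a response value is NA
na.ind <- sapply(DF[-1], is.na)
# per-group, per-column NA proportion; "does not exceed 50%" means <= 0.5
na.ind <- ave(na.ind, rep(DF$x, ncol(na.ind)), col(na.ind)) <= 0.5
# keep rows whose group passes for every response variable
DF[apply(na.ind, 1, all), ]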
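
The role of ave() in this approach may be easier to see on a single column. The sketch below is an illustration only, restricted to z from the example data; the object names z.na.prop and keep are assumptions.

```r
# Grouping variable and z column from the example data
x <- c(rep(1, 3), rep(2, 4), rep(3, 5))
z <- c(3, 4, 5, NA, NA, NA, NA, 1, 2, 1, 2, 1)

# ave() replaces each value by the mean of its group (FUN = mean is the
# default), so every row receives the NA proportion of its own group.
z.na.prop <- ave(as.numeric(is.na(z)), x)

# Rows belonging to groups with at most 50% NAs in z
keep <- z.na.prop <= 0.5
```

Because the result has one entry per row rather than one per group, it can be used directly as a row index, which is what the matrix version above does for all three response variables at once.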


I hope it helps.

Best,
Dimitris


-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
