thr3ads.net - R help - [R] Drop observations in unbalanced panel data set according to missing values [May 2010]


Christian Schoder

2010-May-28 21:58 UTC

[R] Drop observations in unbalanced panel data set according to missing values

Dear R-users,

I work with firm-level panel data. I would like to drop all firms that
have fewer than x observations over the time span in any of the variables
considered. I would appreciate any help that (a) points to relevant literature
or websites or (b) provides code that could solve the problem.

Here is a detailed illustration of my problem. My data set is of the
form
> df
   id  y  z
1   a  1  1
2   b NA  2
3   b  3  3
4   c  2  2
5   c  4  4
6   c  5 NA
7   d  6 NA
8   d  5  5
9   d  6  6
10  d  7  7
11  e NA NA
12  e NA  4
13  e  3  3
where id is the index of the firm, and y and z are observations such as assets
and sales. Now I would like to apply a procedure that drops all firms which have
fewer than 2 observed realizations in y or z. Thus, it should give me a
data.frame which looks like
> df1
   id  y  z
1   c  2  2
2   c  4  4
3   c  5 NA
4   d  6 NA
5   d  5  5
6   d  6  6
7   d  7  7
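For readers who want to try this, a minimal self-contained sketch of the example follows. It assumes the data sit in a data.frame named `dfrm` (the reply explains why it avoids `df`) and uses `min_obs`, a name introduced here, for the threshold x. Counting non-missing values per firm with `tapply()` is only one base-R way to do this, not necessarily the best one.

```r
# The example panel from the post, rebuilt so it can be pasted into R.
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

min_obs <- 2  # hypothetical name for the threshold x

# Number of non-missing values of y and of z for each firm (one entry per id).
n_y <- tapply(!is.na(dfrm$y), dfrm$id, sum)
n_z <- tapply(!is.na(dfrm$z), dfrm$id, sum)

# A firm is kept only if both counts reach the threshold.
keep <- names(n_y)[pmin(n_y, n_z) >= min_obs]

# Keep every row (including rows with an NA) of the surviving firms.
df1 <- dfrm[dfrm$id %in% keep, ]
rownames(df1) <- NULL
print(df1)
```

Because the filter selects whole firms via `%in%`, rows such as `c 5 NA` survive, matching the desired `df1`.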

Thank you very much!
Christian Schoder

David Winsemius

2010-May-28 22:28 UTC


On May 28, 2010, at 5:58 PM, Christian Schoder wrote:
> Dear R-users,
>
> I work with firm-level panel data. I would like to drop all
> firms that have fewer than x observations over the time span in any
> of the variables considered. I would appreciate any help that (a)
> points to relevant literature or websites or (b) provides code
> that could solve the problem.
>
> Here is a detailed illustration of my problem. My data set is of the
> form
>> df
>   id  y  z
> 1   a  1  1
> 2   b NA  2
> 3   b  3  3
> 4   c  2  2
> 5   c  4  4
> 6   c  5 NA
> 7   d  6 NA
> 8   d  5  5
> 9   d  6  6
> 10  d  7  7
> 11  e NA NA
> 12  e NA  4
> 13  e  3  3
> where id is the index of the firm, and y and z are observations such  
> as assets and sales. Now I would like to apply a procedure that  
> drops all firms which have fewer than 2 observed realizations in y or
> z.

I try to avoid naming objects with common function names like df (which is
also the F density function in R), so I call the data frame dfrm:

 > dfrm$nrecy <- ave(dfrm$y , dfrm$id, FUN=function(x) sum(!is.na(x)) )
 > dfrm$nrecz <- ave(dfrm$z , dfrm$id, FUN=function(x) sum(!is.na(x)) )
 > dfrm
    id  y  z nrecy nrecz
1   a  1  1     1     1
2   b NA  2     1     2
3   b  3  3     1     2
4   c  2  2     3     2
5   c  4  4     3     2
6   c  5 NA     3     2
7   d  6 NA     4     3
8   d  5  5     4     3
9   d  6  6     4     3
10  d  7  7     4     3
11  e NA NA     1     2
12  e NA  4     1     2
13  e  3  3     1     2
 > dfrm[with(dfrm, pmin(nrecy, nrecz)>1), ]
    id y  z nrecy nrecz
4   c 2  2     3     2
5   c 4  4     3     2
6   c 5 NA     3     2
7   d 6 NA     4     3
8   d 5  5     4     3
9   d 6  6     4     3
10  d 7  7     4     3
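A possible generalization of the same `ave()` idea to an arbitrary threshold x and any set of columns, without adding helper columns to the data, is sketched below. The function `keep_firms()` and its arguments are hypothetical, not part of base R, and the sketch assumes the variables are numeric (for a character column, `ave()` would return the counts as strings).

```r
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep all rows of firms that have at least `min_obs`
# non-missing values in every (numeric) column named in `vars`.
keep_firms <- function(data, id, vars, min_obs) {
  # For each variable, a row-level logical: does this row's firm have
  # enough non-missing values in that variable?
  enough <- lapply(vars, function(v)
    ave(data[[v]], data[[id]], FUN = function(x) sum(!is.na(x))) >= min_obs)
  # A firm survives only if it passes the check for every variable.
  data[Reduce(`&`, enough), , drop = FALSE]
}

res <- keep_firms(dfrm, id = "id", vars = c("y", "z"), min_obs = 2)
print(res)
```

`Reduce()` combines the per-variable checks with `&`, so adding a column to `vars` can only tighten the filter, which mirrors the `pmin(nrecy, nrecz) > 1` test above.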

Note that this does not ensure that each remaining id has at least 2
complete observations (rows where both y and z are present). If you
wanted a solution to that problem, you would need a better test
data.frame.
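To see the difference, the example needs a firm whose variables are each observed at least twice but observed together fewer than twice. The sketch below adds such a firm, `f`, which is hypothetical and not part of the original data, and compares the per-variable rule with a stricter rule based on `complete.cases()`.

```r
# Example panel from the post plus a hypothetical firm "f" (added only for
# illustration): y and z are each observed twice, but together only once.
dfrm <- data.frame(
  id = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d", "e", "e", "e",
         "f", "f", "f"),
  y  = c(1, NA, 3, 2, 4, 5, 6, 5, 6, 7, NA, NA, 3, 1, 2, NA),
  z  = c(1, 2, 3, 2, 4, NA, NA, 5, 6, 7, NA, 4, 3, NA, 3, 4),
  stringsAsFactors = FALSE
)

# Rule used in the reply: at least 2 non-missing values in each variable.
nrecy <- ave(dfrm$y, dfrm$id, FUN = function(x) sum(!is.na(x)))
nrecz <- ave(dfrm$z, dfrm$id, FUN = function(x) sum(!is.na(x)))
by_variable <- dfrm[pmin(nrecy, nrecz) > 1, ]

# Stricter rule: at least 2 rows in which y and z are both observed.
ncomplete <- ave(as.numeric(complete.cases(dfrm[c("y", "z")])), dfrm$id,
                 FUN = sum)
by_complete_row <- dfrm[ncomplete > 1, ]

unique(by_variable$id)      # "c" "d" "f"
unique(by_complete_row$id)  # "c" "d"
```

With only the original 13 rows, both rules keep firms c and d, which is why the original test data cannot tell them apart.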

> Thus, it should give me a data.frame which looks like
>> df1
>   id  y  z
> 1   c  2  2
> 2   c  4  4
> 3   c  5 NA
> 4   d  6 NA
> 5   d  5  5
> 6   d  6  6
> 7   d  7  7
>
> Thank you very much!
> Christian Schoder
>
David Winsemius, MD
West Hartford, CT