thr3ads.net - R help - [R] How to count number of year per firm in panel data? [Feb 2009]

If this information is useful, please help other people find it:
Share via:

Johannes Habel

2009-Feb-11 16:43 UTC

[R] How to count number of year per firm in panel data?

Hello,

I have an unbalanced panel dataset and would like to exclude all objects
that don't appear at least x times.

Therefore, I would like to include a column indicating for every line how
many periods are available, e.g.

id, year, number
1, 2000, 3
1, 2001, 3
1, 2002, 3
2, 2001, 1
3, ..., ...

This would allow me to exclude companies by setting
"subset=number>=x".
However, I don't know how to create this column, i.e. how to count the years
for each object and include the numbers into the dataset.

Could anybody help me, please?

Alternatively, is there an easier way to achieve my goal?

Thank you very much.

Johannes Habel
-- 
View this message in context:
http://www.nabble.com/How-to-count-number-of-year-per-firm-in-panel-data--tp21958252p21958252.html
Sent from the R help mailing list archive at Nabble.com.

Marc Schwartz

2009-Feb-11 17:38 UTC

head link

[R] How to count number of year per firm in panel data?

on 02/11/2009 10:43 AM Johannes Habel wrote:> Hello,
> 
> I have an unbalanced panel dataset and would like to exclude all objects
> that don't appear at least x times.
> 
> Therefore, I would like to include a column indicating for every line how
> many periods are available, e.g.
> 
> id, year, number
> 1, 2000, 3
> 1, 2001, 3
> 1, 2002, 3
> 2, 2001, 1
> 3, ..., ...
> 
> This would allow me to exclude companies by setting
"subset=number>=x".
> However, I don't know how to create this column, i.e. how to count the
years
> for each object and include the numbers into the dataset.
> 
> Could anybody help me, please?
> 
> Alternatively, is there an easier way to achieve my goal?
> 
> Thank you very much.
> 
> Johannes Habel
You don't need to add the extra column. You can just create a frequency
table of the unique 'id' values, get the subset of values that meet your
count criteria and then use those values in subset().

Let's create a little larger dataset:

set.seed(1)
id <- sample(letters[1:4], 8, replace = TRUE)
> id[1] "b" "b" "c" "d" "a"
"d" "d" "c"


years <- unlist(lapply(split(id, id),
                       function(i) 2000:(2000 + length(i) - 1)))
> years   a   b1   b2   c1   c2   d1   d2   d3
2000 2000 2001 2000 2001 2000 2001 2002


DF <- data.frame(id = sort(id), year = years)
> DF   id year
a   a 2000
b1  b 2000
b2  b 2001
c1  c 2000
c2  c 2001
d1  d 2000
d2  d 2001
d3  d 2002
> subset(DF, id %in% names(which(table(DF$id) >= 3)))   id year
d1  d 2000
d2  d 2001
d3  d 2002


Step by step:
> table(DF$id)
a b c d
1 2 2 3

> table(DF$id) >= 3
    a     b     c     d
FALSE FALSE FALSE  TRUE

> which(table(DF$id) >= 3)d
4

> names(which(table(DF$id) >= 3))[1] "d"


Then use subset() as above, filtering only those id's that are in the
names from the table. If we change the requirement to >= 2:
> subset(DF, id %in% names(which(table(DF$id) >= 2)))   id year
b1  b 2000
b2  b 2001
c1  c 2000
c2  c 2001
d1  d 2000
d2  d 2001
d3  d 2002


See ?table, ?names, ?which and ?"%in%"


If you really need to add the column, you could use aggregate() to get a
count of years for each id as a data frame, then use merge() to add the
column to DF:
> aggregate(DF$year, list(id = DF$id), length)  id x
1  a 1
2  b 2
3  c 2
4  d 3

> merge(DF, aggregate(DF$year, list(id = DF$id), length), by =
"id")  id year x
1  a 2000 1
2  b 2000 2
3  b 2001 2
4  c 2000 2
5  c 2001 2
6  d 2000 3
7  d 2001 3
8  d 2002 3

Then use subset() as you initially considered.

See ?aggregate and ?merge

HTH,

Marc Schwartz

Possibly Parallel Threads

Search for more apparently analagous threads

R help - Feb 2009 - How to count number of year per firm in panel data?

[R] How to count number of year per firm in panel data?

[R] How to count number of year per firm in panel data?

Possibly Parallel Threads