thr3ads.net - R help - [R] Collapsing panel data [Feb 2009]

Karina Knaus

2009-Feb-03 08:43 UTC

[R] Collapsing panel data

Dear R-helpers,

I've been thinking about this for some time, maybe someone can help. I have
a fairly large dataset with thousands of firms (call them A, B, C, etc.),
such as

     [,1]   [,2]
[1,] "A"    0.5
[2,] ""     0.2
[3,] ""     0.3
[4,] "B"    0.1
[5,] ""     0.9
[6,] "C"    0.4

Or to put it differently two vectors such as

y <- c("A", "", "", "B", "", "C")
x <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)

The empty lines "" always belong to the firm above. Now I want to collapse
the dataset so that each firm (A, B, C, etc.) has one line only, using
summation.

So what I would like is

yNew <- c("A", "B", "C")
xNew <- c(1, 1, 0.4)

The problem I'm having is that each firm has a different number of entries
for x, so some like C have just one and others have ten or more, so I have
difficulty imagining how to use a loop in this case.
I'd be grateful for any suggestions.
Karina

Petr PIKAL

2009-Feb-03 09:10 UTC

[R] Odp: Collapsing panel data

Hi

r-help-bounces at r-project.org wrote on 03.02.2009 09:43:04:
>
> Dear R-helpers,
>
> I've been thinking about this for some time, maybe someone can help. I have
> a fairly large dataset with thousands of firms (call them A, B, C, etc.),
> such as
>
>      [,1]   [,2]
> [1,] "A"    0.5
> [2,] ""     0.2
> [3,] ""     0.3
> [4,] "B"    0.1
> [5,] ""     0.9
> [6,] "C"    0.4
>
> Or to put it differently two vectors such as
>
> y <- c("A", "", "", "B", "", "C")
> x <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)
>
> The empty lines "" always belong to the firm above. Now I want to collapse
> the dataset so that each firm (A, B, C, etc.) has one line only, using
> summation.
>
> So what I would like is
>
> yNew <- c("A", "B", "C")
> xNew <- c(1, 1, 0.4)
That is what NA values are for. There are some quite useful functions for
handling them.

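library(zoo)          # na.locf() comes from package zoo

y <- c("A", "", "", "B", "", "C")
x <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)
y[y == ""] <- NA      # mark the blank firm labels as missing
y.na <- na.locf(y)    # carry the last non-missing label forward

tapply(x, y.na, sum)  # sum x within each firm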
  A   B   C 
1.0 1.0 0.4 

or aggregate(...)
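
The aggregate(...) hint is left unexpanded above. As a rough sketch only, and not necessarily what was meant, here is one way it could look in base R without zoo. It assumes the first element of y is never blank; the names grp, firm and collapsed are made up for illustration.

```r
y <- c("A", "", "", "B", "", "C")
x <- c(0.5, 0.2, 0.3, 0.1, 0.9, 0.4)

# Every non-empty label starts a new firm, so the running count of
# non-empty labels numbers the firms 1, 2, 3, ... row by row.
# Assumption: y[1] is not "", otherwise the first rows get group 0.
grp <- cumsum(y != "")

# Look up each row's label among the non-empty labels by group number,
# which fills the blanks with the firm above.
firm <- y[y != ""][grp]

# One row per firm, with x summed within each firm.
collapsed <- aggregate(x, by = list(firm = firm), FUN = sum)
print(collapsed)
```

The result is a data frame with a firm column and an x column rather than the named vector that tapply() returns, which may be more convenient if further columns are to be merged in later.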

Regards
Petr


>
> The problem I'm having is that each firm has a different number of entries
> for x, so some like C have just one and others have ten or more, so I have
> difficulty imagining how to use a loop in this case.
> I'd be grateful for any suggestions.
> Karina
