Folks, I have a situation with many firms, observed for many years
(but it's not panel data, so every now and then data for a firm just
goes missing).
Let me show you an example. There are 4 firms: "a", "b", "c" and
"d". There are 3 years: 1981, 1982 and 1983 (coded as 81, 82 and 83
in the data). There's a factor f which takes values 1, 2 or 3.
set.seed(5)
D = data.frame(
  names=c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  year= c( 81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
  f= sample(1:3, 10, replace=TRUE)
)
print(D)
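Since R 3.6.0, sample() uses a different default algorithm (sample.kind = "Rejection"), so set.seed(5) no longer reproduces the f values discussed in this thread. A minimal sketch, assuming the f values shown in the print(D) output later in the thread, hard-codes them and tabulates which firm-years are present. This makes the gaps visible: firm "b" has no 1982 record and firm "d" has no 1981 record.

```r
# f is hard-coded to the values printed later in this thread
# (set.seed(5) followed by sample() gives different values in R >= 3.6.0)
D <- data.frame(
  names = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  year  = c(81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
  f     = c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Firm-by-year presence table: 1 = observed, 0 = missing
presence <- table(firm = D$names, year = D$year)
print(presence)
```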
What I'd like to do is locate situations where a firm is observed for
two consecutive years, and put together conditional probabilities of
state transition.
Expressed as counts, putting time $t$ as the rows and time $t+1$ as
the columns, I'd like to get the table:
1 2 3
1 1
2 1
3 1 1 1
For example, this says that we only observe one situation (in this
data) where f=1 and we also get to see the next year, and in that year
f was 3. So we have a 100% probability of going from 1 -> 3. In the
case of f=3, we have 3 situations where we observe two consecutive
years where the first has f=3, and it turns out that the outcomes are
1, 2 and 3, each happening once.
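One way to collect these consecutive-year pairs, sketched here assuming at most one record per firm per year and with f hard-coded to the values printed later in the thread: sort by firm and year, compare each row with the next one, and keep the pair only when it is the same firm and the years differ by exactly 1. Wrapping both states in factor(..., levels = 1:3) keeps all three states in the table even if one is never observed.

```r
D <- data.frame(
  names = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  year  = c(81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
  f     = c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1),
  stringsAsFactors = FALSE
)

D <- D[order(D$names, D$year), ]  # each firm's years adjacent and in order
n <- nrow(D)
now <- 1:(n - 1)                  # candidate row at time t
nxt <- 2:n                        # candidate row at time t+1

# Usable pair: same firm in both rows AND years exactly one apart
ok <- D$names[now] == D$names[nxt] & D$year[nxt] == D$year[now] + 1

# Counts: rows = f at t, columns = f at t+1
counts <- table(from = factor(D$f[now][ok], levels = 1:3),
                to   = factor(D$f[nxt][ok], levels = 1:3))
print(counts)
```

Firm "b" contributes nothing here, because its 1981 and 1983 records are not consecutive.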
Expressed as conditional probabilities it is:
1 2 3
1 1
2 1
3 .33 .33 .33
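The conditional probabilities are each count divided by its row total, $\hat p_{ij} = N_{ij} / \sum_k N_{ik}$, which is the usual maximum-likelihood estimate for a first-order Markov chain. A small sketch, assuming the count matrix obtained from the example data (the one Gabor reports later in the thread). prop.table() would give 0/0 = NaN for a state never observed at time $t$; this version leaves such rows as NA instead.

```r
# Transition counts: rows = f at year t, columns = f at year t+1
N <- matrix(c(0, 0, 1,
              0, 0, 1,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(from = 1:3, to = 1:3))

row_tot <- rowSums(N)
# Divide each row by its total (the vector recycles down columns, i.e. by row);
# rows with no observations become NA rather than NaN
P <- N / ifelse(row_tot > 0, row_tot, NA)
round(P, 2)
```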
How might one do this? And then, how might one go about getting
confidence intervals for the transition probabilities that are seen in
this transition matrix? I'd like to test the null that the state
variable does not change, i.e., if a firm is in f=2, under the null
it just stays at f=2 in the following year. That'd be an identity
matrix for the transition probabilities:
   1 2 3
1  1 0 0
2  0 1 0
3  0 0 1
How does one talk about the distribution of the transition
probabilities under this null, and thus get a test?
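Two hedged sketches for these questions, both assuming the observed transitions are independent draws from a first-order Markov chain with transition matrix $P$, so that row $i$ of the count matrix is multinomial with $n_i = \sum_j N_{ij}$ trials. First, pointwise exact binomial (Clopper-Pearson) 95% intervals for each $p_{ij}$ from binom.test(); these are per-cell rather than simultaneous, and with $n_i$ of 1 or 3 they are extremely wide. Second, the identity null: the likelihood is $\prod_{i,j} p_{ij}^{N_{ij}}$, which is 0 under $P = I$ as soon as any off-diagonal $N_{ij} > 0$. So that sharp null is rejected by a single observed change and needs no sampling distribution; here 4 of the 5 transitions are changes. A less degenerate alternative, swapped in purely for illustration and not proposed in the thread, is a one-sided binomial test of whether the overall "stay" probability is at least some $p_0$; the value $p_0 = 0.9$ is an arbitrary assumption.

```r
# Transition counts from the example (rows = f at t, columns = f at t+1)
N <- matrix(c(0, 0, 1,
              0, 0, 1,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(from = 1:3, to = 1:3))
n_i <- rowSums(N)

# 1. Pointwise exact 95% intervals: N[i, j] successes out of n_i[i] trials
lo <- hi <- matrix(NA_real_, 3, 3, dimnames = dimnames(N))
for (i in 1:3) for (j in 1:3) {
  if (n_i[i] > 0) {
    ci <- binom.test(N[i, j], n_i[i])$conf.int
    lo[i, j] <- ci[1]
    hi[i, j] <- ci[2]
  }
}
round(cbind(lo, hi), 3)

# 2. Identity null: log-likelihood sum of N_ij * log(P0_ij),
#    where empty cells (N_ij == 0) contribute 0
P0 <- diag(3)
loglik0 <- sum(ifelse(N > 0, N * log(P0), 0))  # -Inf if any change is observed
n_changes <- sum(N) - sum(diag(N))             # observed off-diagonal moves

# 3. Hypothetical softer null: firms keep their state with probability p0
p0 <- 0.9  # illustrative value, not taken from the thread
bt <- binom.test(sum(diag(N)), sum(N), p = p0, alternative = "less")
bt$p.value
```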
--
Ajay Shah, Consultant
Department of Economic Affairs, Ministry of Finance, New Delhi
ajayshah at mayin.org
http://www.mayin.org/ajayshah
Regarding how to compile the table, have you considered something
like the following:
> n <- nrow(D)
> # count f[i] -> f[i+1] whenever rows i and i+1 belong to the same firm
> Trans <- array(0, dim=c(3,3))
> for(i in 1:(n-1))
+ if(D$names[i]==D$names[i+1])
+ (Trans[D$f[i], D$f[i+1]] <- Trans[D$f[i], D$f[i+1]]+1)
> Trans
[,1] [,2] [,3]
[1,] 1 0 1
[2,] 0 0 1
[3,] 1 1 1
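This is not the answer you got; I got this in R 1.9.1 under
Windows 2000. Note that the loop only checks that adjacent rows belong
to the same firm, not that their years are consecutive, so firm "b"'s
1981 and 1983 records are counted as a 1 -> 1 transition (the 1 in
cell [1,1]). I'm sure there are ways to do this without a for loop,
but I couldn't think of one immediately.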
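One loop-free alternative, sketched under the assumption that each firm has at most one record per year (f hard-coded to the values printed later in the thread): build a firm-year key for every row, then use match() to find the row holding the same firm one year later. Unlike the loop above, this requires the years to be exactly one apart, so firm "b"'s 1981 and 1983 records produce no 1 -> 1 count, and the rows need not be sorted.

```r
D <- data.frame(
  names = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  year  = c(81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
  f     = c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1),
  stringsAsFactors = FALSE
)

key_now  <- paste(D$names, D$year)      # e.g. "a 81"
key_next <- paste(D$names, D$year + 1)  # the same firm one year later
nxt <- match(key_next, key_now)         # row of next year's record, or NA
has <- !is.na(nxt)

# rows = f at t, columns = f at t+1; all three states kept as levels
Trans <- table(from = factor(D$f[has], levels = 1:3),
               to   = factor(D$f[nxt[has]], levels = 1:3))
Trans
```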
Regarding how to analyze, have you searched for "discrete markov
estimation" and "hidden markov estimate" from www.r-project.org ->
search -> "R site search"? There are many possibilities.
hope this helps. spencer graves
--
Spencer Graves, PhD, Senior Development Engineer
O: (408)938-4420; mobile: (408)655-4567
Gabor Grothendieck
2004-Oct-07 13:39 UTC
[R] Computing and testing transition probabilities?
Ajay Shah <ajayshah <at> mayin.org> writes:
:
: Folks, I have a situation with many firms, observed for many years
: (but it's not panel data, so every now and then data for a firm just
: goes missing).
:
: Let me show you an example. There are 4 firms: "a", "b", "c" and
: "d". There are 3 years: 1981, 1982 and 1983 (coded as 81, 82 and 83
: in the data). There's a factor f which takes values 1, 2 or 3.
:
: set.seed(5)
: D = data.frame(
: names=c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
: year= c( 81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
: f= sample(1:3, 10, replace=T)
: )
: print(D)
:
: What I'd like to do is locate situations where a firm is observed for
: two consecutive years, and put together conditional probabilities of
: state transition.
:
: Expressed as counts, putting time $t$ as the rows and time $t+1$ as
: the columns, I'd like to get the table:
:
: 1 2 3
: 1 1
: 2 1
: 3 1 1 1
I do not get the same answer as you but my understanding from
your description is that this is what you are looking for:
R> set.seed(5)
R> D <- data.frame(
+ names =c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
+ year = c( 81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
+ f= sample(1:3, 10, replace=T)
+ )
R> print(D)
names year f
1 a 81 1
2 a 82 3
3 a 83 3
4 b 81 1
5 b 83 1
6 c 81 3
7 c 82 2
8 c 83 3
9 d 82 3
10 d 83 1
R> MM <- merge(D, cbind(year = D$year + 1, D), by.x = 2:1, by.y = 1:2)
R> tab <- with(MM, table(f.x, f.y))
R> tab
f.y
f.x 1 2 3
1 0 0 1
2 0 0 1
3 1 1 1
R> prop.table(tab, margin = 1)
f.y
f.x 1 2 3
1 0.0000000 0.0000000 1.0000000
2 0.0000000 0.0000000 1.0000000
3 0.3333333 0.3333333 0.3333333
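A note on orientation in the merge above: f.x comes from D itself, i.e. year $t+1$, while f.y comes from the copy whose year was shifted by +1, i.e. year $t$. So table(f.x, f.y) puts $t+1$ on the rows, and prop.table(tab, margin = 1) conditions on $t+1$ rather than $t$. With this data the count matrix happens to be symmetric, so the printed numbers are unaffected, but with other data they would differ. A sketch of the same merge idea with explicit column names (f.prev is a name introduced here for illustration; f is hard-coded to the values printed above):

```r
D <- data.frame(
  names = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  year  = c(81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
  f     = c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Each record relabelled as "the previous year's state" of year + 1
prev <- data.frame(names = D$names, year = D$year + 1, f.prev = D$f,
                   stringsAsFactors = FALSE)

# Join on firm and year: each result row holds f.prev (time t) and f (time t+1)
MM <- merge(prev, D, by = c("names", "year"))

# rows = f at t, columns = f at t+1; condition on the row (time t)
tab <- table(from = factor(MM$f.prev, levels = 1:3),
             to   = factor(MM$f, levels = 1:3))
P <- prop.table(tab, margin = 1)
P
```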