Folks, I have a situation with many firms, observed for many years (but it's not panel data, so every now and then data for a firm just goes missing).

Let me show you an example. There are 4 firms "a", "b", "c" and "d". There are 3 years: 1981, 1982 and 1983. There's a factor f which takes values 1, 2 or 3.

  set.seed(5)
  D = data.frame(
    names=c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
    year= c( 81,  82,  83,  81,  83,  81,  82,  83,  82,  83),
    f= sample(1:3, 10, replace=T)
  )
  print(D)

What I'd like to do is locate situations where a firm is observed for two consecutive years, and put together conditional probabilities of state transition.

Expressed as counts, putting time $t$ as the rows and time $t+1$ as the columns, I'd like to get the table:

       1    2    3
  1              1
  2              1
  3    1    1    1

For example, this says that we observe only one situation (in this data) where f=1 and we then got to see the next year, and in that year f was 3. So we have a 100% probability of going from 1 -> 3. In the case of f=3, we have 3 situations where we observe two consecutive years where the 1st is f=3, and it turns out that the outcomes are 1, 2, 3, each happening once.

Expressed as conditional probabilities it is:

       1    2    3
  1              1
  2              1
  3  .33  .33  .33

How might one do this? And then, how might one go about getting confidence intervals for the transition probabilities that are seen in this transition matrix?

I'd like to test the null that the state variable does not change. I.e., if a firm is in f=2, under the null, it just stays at f=2 in the following year. That'd be an identity matrix for the transition probabilities:

       1    2    3
  1    1
  2         1
  3              1

How does one talk about the distribution of the transition probabilities under this null, and thus get a test?

--
Ajay Shah                                   Consultant
ajayshah at mayin.org                       Department of Economic Affairs
http://www.mayin.org/ajayshah               Ministry of Finance, New Delhi
Regarding how to compile the table, have you considered something like the following:

  > n <- nrow(D)   # number of rows in D
  > Trans <- array(0, dim=c(3,3))
  > # Count a transition whenever two adjacent rows belong to the same firm.
  > # The years are not compared, so firm b's 81 -> 83 pair is counted too.
  > for(i in 1:(n-1))
  +   if(D$names[i]==D$names[i+1])
  +     Trans[D$f[i], D$f[i+1]] <- Trans[D$f[i], D$f[i+1]]+1
  > Trans
       [,1] [,2] [,3]
  [1,]    1    0    1
  [2,]    0    0    1
  [3,]    1    1    1

This is not the answer you got; I got this in R 1.9.1 under Windows 2000. I'm sure there are ways to do this without a for loop, but I couldn't think of one immediately.

Regarding how to analyze, have you searched for "discrete markov estimation" and "hidden markov estimate" from www.r-project.org -> search -> "R site search"? There are many possibilities.

Hope this helps.
Spencer Graves

--
Spencer Graves, PhD, Senior Development Engineer
O: (408)938-4420; mobile: (408)655-4567
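One possible loop-free version of the tabulation above, offered as a minimal sketch rather than a definitive implementation. It assumes the rows can be sorted by firm and year, and it types the values of f in by hand (the ones shown by print(D) elsewhere in this thread) so that the result does not depend on the random number generator. Unlike the loop, it also requires the two years to be consecutive. The names cur, nxt and ok are illustrative only.

```r
# Example data with f typed in explicitly (an assumption made for
# reproducibility; the original post draws f with sample()).
D <- data.frame(
  names = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
  year  = c(81, 82, 83, 81, 83, 81, 82, 83, 82, 83),
  f     = c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1)
)

# Sort by firm, then year, so that candidate pairs sit on adjacent rows.
D <- D[order(D$names, D$year), ]
n <- nrow(D)

cur <- D[-n, ]   # rows 1 .. n-1: the state at time t
nxt <- D[-1, ]   # rows 2 .. n:   the state on the following row

# Keep a pair only if it is the same firm AND the years are consecutive.
ok <- as.character(cur$names) == as.character(nxt$names) &
      nxt$year == cur$year + 1

# Fixing the factor levels keeps the table 3 x 3 even when a state is absent.
Trans <- table(from = factor(cur$f[ok], levels = 1:3),
               to   = factor(nxt$f[ok], levels = 1:3))
print(Trans)
print(prop.table(Trans, margin = 1))   # row-wise conditional probabilities
```

Because the year check drops firm b's 81 -> 83 pair, the [1,1] cell is 0 here rather than 1 as in the loop output.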
Gabor Grothendieck
2004-Oct-07 13:39 UTC
[R] Computing and testing transition probabilities?
Ajay Shah <ajayshah <at> mayin.org> writes:

: Folks, I have a situation with many firms, observed for many years
: (but it's not panel data, so every now and then data for a firm just
: goes missing).
:
: Let me show you an example. There are 4 firms "a", "b", "c" and "d".
: There are 3 years: 1981, 1982 and 1983. There's a factor f which takes
: values 1, 2 or 3.
:
: set.seed(5)
: D = data.frame(
:   names=c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
:   year= c( 81,  82,  83,  81,  83,  81,  82,  83,  82,  83),
:   f= sample(1:3, 10, replace=T)
: )
: print(D)
:
: What I'd like to do is locate situations where a firm is observed for
: two consecutive years, and put together conditional probabilities of
: state transition.
:
: Expressed as counts, putting time $t$ as the rows and time $t+1$ as
: the columns, I'd like to get the table:
:
:      1    2    3
: 1              1
: 2              1
: 3    1    1    1

I do not get the same answer as you but my understanding from your description is that this is what you are looking for:

R> set.seed(5)
R> D <- data.frame(
+   names = c("a", "a", "a", "b", "b", "c", "c", "c", "d", "d"),
+   year  = c( 81,  82,  83,  81,  83,  81,  82,  83,  82,  83),
+   f = sample(1:3, 10, replace=T)
+ )
R> print(D)
   names year f
1      a   81 1
2      a   82 3
3      a   83 3
4      b   81 1
5      b   83 1
6      c   81 3
7      c   82 2
8      c   83 3
9      d   82 3
10     d   83 1
R> MM <- merge(D, cbind(year = D$year + 1, D), by.x = 2:1, by.y = 1:2)
R> tab <- with(MM, table(f.x, f.y))
R> tab
   f.y
f.x 1 2 3
  1 0 0 1
  2 0 0 1
  3 1 1 1
R> prop.table(tab, margin = 1)
   f.y
f.x         1         2         3
  1 0.0000000 0.0000000 1.0000000
  2 0.0000000 0.0000000 1.0000000
  3 0.3333333 0.3333333 0.3333333
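On the question of confidence intervals, which the replies so far leave open: a minimal sketch, assuming that the transitions out of a given state are independent draws, so that each count in a row is binomial with the row total as the number of trials. Under that assumption binom.test() gives an exact (Clopper-Pearson) interval for each cell. The counts are typed in from the table above; the names ci, row_n and bounds are illustrative only.

```r
# Transition counts from the table above: rows = state at t, cols = state at t+1.
tab <- matrix(c(0, 0, 1,
                0, 0, 1,
                1, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(from = 1:3, to = 1:3))

row_n <- rowSums(tab)   # number of observed transitions out of each state

# One row per (from, to) cell.
ci <- expand.grid(from = 1:3, to = 1:3)
ci$count <- tab[cbind(ci$from, ci$to)]
ci$n     <- unname(row_n[ci$from])
ci$p_hat <- ci$count / ci$n

# Exact binomial 95% interval for each cell, treating each cell as
# "count successes out of n trials" (an assumption, see the text above).
bounds <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
                   ci$count, ci$n))
ci$lower <- bounds[, 1]
ci$upper <- bounds[, 2]

print(ci[order(ci$from, ci$to), ])
```

With row totals of 1, 1 and 3 the intervals are very wide, so this tiny example says little by itself. As for the identity-matrix null: it assigns probability zero to every off-diagonal transition, so a single observed change of state already contradicts it. A null that allows some positive probability of moving may be easier to test.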