Folks,

I am holding a dataset where firms are observed for a fixed (and small) set of years. The data is in "long" format: one record per firm per point in time. A state variable is observed (a factor).

I wish to build a Markov transition matrix for the time-series evolution of that state variable. The code below does this, but it's hardcoded to the specific years that I observe. How might one generalise this into a general function? :-)

-ans.

set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
                  year=c(83, 84, 85, 86, 83, 84, 85, 86),
                  state=sample(1:3, 8, replace=TRUE)
                  )
# Shift to wide format --
fixedup <- reshape(raw, timevar="year", idvar="name", v.names="state",
                   direction="wide")
# Now tediously build up records for an intermediate data structure
try <- rbind(
  data.frame(prev=fixedup$state.83, new=fixedup$state.84),
  data.frame(prev=fixedup$state.84, new=fixedup$state.85),
  data.frame(prev=fixedup$state.85, new=fixedup$state.86)
  )
# This is a bad method because it is hardcoded to the specific values
# of "year".
markov <- table(try$prev, try$new)

-- 
Ajay Shah                               http://www.mayin.org/ajayshah
ajayshah at mayin.org                   http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.
Is this what you want:

set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
                  year=c(83, 84, 85, 86, 83, 84, 85, 86),
                  state=sample(1:3, 8, replace=TRUE)
                  )
# Shift to wide format --
fixedup <- reshape(raw, timevar="year", idvar="name", v.names="state",
                   direction="wide")

trans <- as.matrix(fixedup)
result <- NULL
for (i in 2:(ncol(trans) - 1)){
  result <- rbind(result, cbind(name=trans[,1], PREV=trans[,i], NEXT=trans[,i+1]))
}

result

markov <- table(result[,"PREV"], result[,"NEXT"])

-- 
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
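The same wide-format idea can be written without the loop and without naming any year: after reshaping, pair every state column with the column to its right in one step. This is a minimal sketch, assuming that adjacent state.* columns really are adjacent years (true here, but not if some year is missing for every firm). The states are typed in rather than drawn with sample(), so the output does not depend on the random number generator.

```r
# Typed-in states (rather than sample()), so the output does not depend
# on the random number generator.
raw <- data.frame(name  = rep(c("f1", "f2"), each = 4),
                  year  = rep(83:86, times = 2),
                  state = c(3, 2, 2, 2, 2, 3, 1, 1))

# Sort by year first so reshape() creates the state.* columns in
# chronological order.
raw <- raw[order(raw$year, raw$name), ]
wide <- reshape(raw, timevar = "year", idvar = "name", v.names = "state",
                direction = "wide")

# Keep only the state columns: one row per firm, one column per year.
w <- as.matrix(wide[, grep("^state\\.", names(wide)), drop = FALSE])

# Pair every column with the column to its right in one step; as.vector()
# flattens both sub-matrices in the same (column-major) order.
from <- as.vector(w[, -ncol(w), drop = FALSE])
to   <- as.vector(w[, -1, drop = FALSE])
markov <- table(from, to)
markov
```

If a firm is missing a year that other firms have, reshape() fills that cell with NA and table() drops any pair involving it by default, so transitions across that gap are left out rather than miscounted.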
If you can be sure that there are no missing years within firms, I think I would do it this way:

> raw <- raw[do.call("order", raw), ]  # not needed here
> raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]), name == name.1)
> with(raw01, table(state, state.1))
     state.1
state 1 2 3
    1 1 0 0
    2 0 2 1
    3 1 1 0

So what would the general function look like? I suppose

transitionM <- function(name, year, state) {
  raw <- data.frame(name = name, year = year, state = state)
  raw <- raw[do.call("order", raw), ]  # needed in general
  raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]), name == name.1)
  with(raw01, table(state, state.1))
}

Give it a burl:

> with(raw, transitionM(name, year, state))
     state.1
state 1 2 3
    1 1 0 0
    2 0 2 1
    3 1 1 0

(NB no 'for' loops.) Ezy peezy.

W. Bill Venables, CMIS, CSIRO Laboratories,
PO Box 120, Cleveland, Qld. 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile (rarely used): +61 4 1963 4642
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of jim holtman
Sent: Sunday, 22 January 2006 11:20 AM
To: Ajay Narottam Shah
Cc: R-help
Subject: Re: [R] Making a markov transition matrix

Ignore last reply.
I sent the wrong script.

> set.seed(1001)
>
> # Raw data in long format --
> raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
+                   year=c(83, 84, 85, 86, 83, 84, 85, 86),
+                   state=sample(1:3, 8, replace=TRUE)
+                   )
> # Shift to wide format --
> fixedup <- reshape(raw, timevar="year", idvar="name", v.names="state",
+                    direction="wide")
>
> trans <- as.matrix(fixedup)
> result <- NULL
> # loop through all the columns and build up the 'result'
> for (i in 2:(ncol(trans) - 1)){
+     result <- rbind(result, cbind(name=trans[,1], PREV=trans[,i], NEXT=trans[,i+1]))
+ }
> result
  name PREV NEXT
1 "f1" "3"  "2"
5 "f2" "2"  "3"
1 "f1" "2"  "2"
5 "f2" "3"  "1"
1 "f1" "2"  "2"
5 "f2" "1"  "1"
>
> (markov <- table(result[,"PREV"], result[,"NEXT"]))
    1 2 3
  1 1 0 0
  2 0 2 1
  3 1 1 0
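Bill's lag-by-one transitionM() relies on there being no missing years within firms: a firm seen in 84 and then 86 would contribute an 84-to-86 pair counted as a single one-year transition. A quick check before calling it can be sketched with a hypothetical has_gaps() helper (not part of the thread); it flags any firm whose sorted years are not exactly one apart, so duplicated years are flagged too.

```r
# Hypothetical helper (not from the thread): TRUE if any firm has two
# consecutive observed years that are not exactly one year apart.
has_gaps <- function(name, year) {
  per_firm <- tapply(year, name, function(y) any(diff(sort(y)) != 1))
  any(per_firm)
}

raw <- data.frame(name  = rep(c("f1", "f2"), each = 4),
                  year  = rep(83:86, times = 2),
                  state = c(3, 2, 2, 2, 2, 3, 1, 1))

has_gaps(raw$name, raw$year)                    # FALSE: 83..86 for both firms
has_gaps(c("f1", "f1", "f1"), c(83, 84, 86))    # TRUE: f1 skips 85
```

A firm with a single observation contributes no pairs at all, so it is (correctly) not flagged.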
See: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/42934.html
On Sun, Jan 22, 2006 at 01:47:00PM +1100, Bill.Venables at csiro.au wrote:
> If this is a real problem, here is a slightly tidier version of the
> function I gave on R-help:
>
> transitionM <- function(name, year, state) {
>   raw <- data.frame(name = name, state = state)[order(name, year), ]
>   raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
>                   name == name.1)
>   with(raw01, table(state, state.1))
> }
>
> Notice that this does assume there are 'no gaps' in the time series
> within firms, but it does not require that each firm have responses for
> the same set of years.
>
> Estimating the transition probability matrix when there are gaps within
> firms is a more interesting problem, both statistically and, when you
> figure that out, computationally.

With help from Gabor, here's my best effort. It should work even if there are gaps in the time series within firms, and it allows different firms to have responses in different years. It is wrapped up as a function which eats a data frame. Somebody should put this function into Hmisc or gtools or something of the sort.

# Problem statement:
#
# You are holding a dataset where firms are observed for a fixed
# (and small) set of years. The data is in "long" format - one
# record per firm per point in time. A state variable is
# observed (a factor).
# You wish to make a Markov transition matrix for the time-series
# evolution of that state variable.
set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
                  year=c(83, 84, 85, 86, 83, 84, 85, 86),
                  state=sample(1:3, 8, replace=TRUE)
                  )

transition.probabilities <- function(D, timevar="year",
                                     idvar="name", statevar="state") {
  # Match each row (x) with the same firm's row from the year before (y),
  # by shifting y's year forward by one.
  merged <- merge(D, cbind(nextt=D[,timevar] + 1, D),
                  by.x = c(timevar, idvar), by.y = c("nextt", idvar))
  # merge() renames the two state columns state.x (later year) and
  # state.y (earlier year); t() puts the earlier state on the rows.
  t(table(merged[, grep(statevar, names(merged), value = TRUE)]))
}

transition.probabilities(raw, timevar="year", idvar="name", statevar="state")
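Despite its name, transition.probabilities() returns transition counts. For a time-homogeneous chain, the maximum-likelihood estimate of the one-step probabilities is each row of counts divided by its row total. A minimal sketch with a hypothetical transition_probs() helper (not from the thread), which also fixes the factor levels so the matrix stays square when some state never occurs as an origin or a destination; the pairing step reuses the merge-on-year-plus-one idea from the function above.

```r
# Hypothetical helper (not from the thread): turn from/to state pairs into a
# square matrix of estimated one-step transition probabilities.
transition_probs <- function(from, to) {
  lev <- sort(unique(c(from, to)))        # same state set on both margins
  counts <- table(from = factor(from, levels = lev),
                  to   = factor(to,   levels = lev))
  prop.table(counts, margin = 1)          # divide each row by its total
}

# Typed-in states, so the example does not depend on the RNG.
raw <- data.frame(name  = rep(c("f1", "f2"), each = 4),
                  year  = rep(83:86, times = 2),
                  state = c(3, 2, 2, 2, 2, 3, 1, 1))

# Pair each firm-year with the same firm's state one year later
# (the merge-on-year-plus-one idea used in transition.probabilities()).
later <- data.frame(name = raw$name, year = raw$year - 1, to = raw$state)
now   <- data.frame(name = raw$name, year = raw$year, from = raw$state)
pairs <- merge(now, later, by = c("name", "year"))

P <- transition_probs(pairs$from, pairs$to)
round(P, 3)
```

A state that never appears as an origin gets a row of NaN (0/0), which is a useful signal that the data say nothing about where that state goes.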
That solution for the case 'with gaps' merely omits transitions where the transition information is not for a single time step. (Mine can be modified for this as well; see below.) But if you know that a firm went from state i in year y to state j in year y+3, say, without knowing the intermediate states, that must tell you something about the 1-step transition matrix as well. How do you use this information? That's a much more difficult problem, but you can do it using maximum likelihood, for example: work out how to calculate the likelihood function, and then optimise it.

This is getting a bit away from the original 'programming trick' question, but it is an interesting problem that occurs more often than I had realised. I'd be interested in knowing if anyone had done anything slick in this area.

Bill Venables.

-----Original Message-----
From: Ajay Narottam Shah [mailto:ajayshah at mayin.org]
Sent: Sunday, 22 January 2006 5:15 PM
To: R-help
Cc: jholtman at gmail.com; Venables, Bill (CMIS, Cleveland)
Subject: Re: [R] Making a markov transition matrix

On Sun, Jan 22, 2006 at 01:47:00PM +1100, Bill.Venables at csiro.au wrote:
> If this is a real problem, here is a slightly tidier version of the
> function I gave on R-help:
>
> transitionM <- function(name, year, state) {
>   raw <- data.frame(name = name, state = state)[order(name, year), ]
>   raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
>                   name == name.1)
>   with(raw01, table(state, state.1))
> }

To modify this solution for the 'with gaps' case, omitting multiple-step transitions, you need to include the year in the 'raw' data frame and then just change the subset condition to

  name == name.1 & year == year.1 - 1
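One way to follow Bill's maximum-likelihood suggestion, sketched under stated assumptions: the chain is time-homogeneous, states are coded as integers 1..K, years are integers, and whether a year is missing does not depend on the state (so the gaps themselves can be ignored when writing the likelihood). A pair of consecutive observations k years apart, from state i to state j, then contributes (P^k)[i, j], and the log-likelihood can be maximised numerically. All helper names below are hypothetical, and the softmax parametrisation with optim() is just one convenient choice, not a method from the thread.

```r
# Hypothetical helpers (not from the thread). Assumes states are coded as
# integers 1..K, years are integers, the chain is time-homogeneous, and
# whether a year is missing does not depend on the state.

# Pair each observation with the same firm's next *observed* year and record
# how many years apart the two observations are.
pairs_with_gaps <- function(name, year, state) {
  d <- data.frame(name = name, year = year, state = state)[order(name, year), ]
  n <- nrow(d)
  same <- d$name[-n] == d$name[-1]          # drop pairs that straddle two firms
  data.frame(from = d$state[-n][same],
             to   = d$state[-1][same],
             gap  = (d$year[-1] - d$year[-n])[same])
}

# Unconstrained K*(K-1) parameters -> row-stochastic K x K matrix, via a
# row-wise softmax with the last column as the reference category.
theta_to_P <- function(theta, K) {
  eta <- cbind(matrix(theta, K, K - 1), 0)
  E <- exp(eta - apply(eta, 1, max))        # subtract row maxima for stability
  E / rowSums(E)
}

# k-step transition matrix P^k by repeated multiplication (k >= 1, small).
mat_pow <- function(P, k) {
  R <- diag(nrow(P))
  for (s in seq_len(k)) R <- R %*% P
  R
}

# Negative log-likelihood: a pair observed k years apart contributes
# log (P^k)[from, to], which sums over the unobserved intermediate states.
negloglik <- function(theta, from, to, gap, K) {
  P <- theta_to_P(theta, K)
  powers <- lapply(seq_len(max(gap)), function(k) mat_pow(P, k))
  -sum(log(mapply(function(i, j, k) powers[[k]][i, j], from, to, gap)))
}

fit_markov_mle <- function(from, to, gap, K) {
  opt <- optim(rep(0, K * (K - 1)), negloglik, from = from, to = to,
               gap = gap, K = K, method = "BFGS",
               control = list(reltol = 1e-12))
  theta_to_P(opt$par, K)
}

# Made-up panel: f3 is seen only in 83 and 86, so it contributes a single
# 3-step transition instead of being discarded.
panel <- data.frame(
  name  = c(rep("f1", 4), rep("f2", 4), "f3", "f3"),
  year  = c(83:86, 83:86, 83, 86),
  state = c(1, 1, 2, 1,  2, 2, 1, 1,  1, 2))
pr <- pairs_with_gaps(panel$name, panel$year, panel$state)
P_hat <- fit_markov_mle(pr$from, pr$to, pr$gap, K = 2)
round(P_hat, 3)
```

Fitting only the one-year pairs (subset(pr, gap == 1)) corresponds to the omit-multi-step-transitions approach, and in that case the maximiser is just the row-normalised count matrix. If some transition is never observed, its MLE is zero; the softmax parameters then drift towards -Inf and optim() only gets close to that boundary.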
Ajay, you seem to have gotten your question answered regarding putting your data frame in the correct format, etc. If you haven't already, you might want to check out the msm package for multi-state Markov and hidden Markov models in continuous time. It's been quite useful for some of my work estimating Markov chains and transition matrices, and it is actively maintained.

Thanks,
Charles