Valerio Leone Sciabolazza
2019-Jan-25 16:36 UTC
[R] how to run a multinomial logistic regression with fixed effects
Dear list users, I am looking for a R package implementing a multinomial logistic regression with fixed effects (Chamberlain 1980, Review of Economic Studies 47: 225?238). Over the years, a number of questions have been asked in the R help and in stack-related websites in order to find how to use this model in a fixed-effects framework. In some cases, it was suggested to use existing routines, mainly nnet::multinom and mlogit::mlogit. However, this doesn?t look like a viable solution, because these packages are not exactly designed for this task. Perhaps unsurprisingly, I was not able to find any working example that can serve my purpose. Others have suggested to use a Poisson transformation. This is discussed for example in a working paper on the arxiv https://arxiv.org/pdf/1707.08538.pdf. However, I found no useful guides to implement this approach in R. Finally, many have been addressed to packages using Bayesian estimation strategies. I was wondering if anyone in this list can provide an example or any resource that I can use to begin working on this model using R. Let me stress that I am not interested in working with Bayesian methods. My data looks like this: set.seed(123) # number of observations n <- 100 # number of possible choice possible_choice <- letters[1:4] # number of years years <- 3 # individual characteristics x1 <- runif(n * 3, 5.0, 70.5) x2 <- sample(1:n^2, n * 3, replace = F) # actual choice at time 1 actual_choice_year_1 <- possible_choice[sample(1:4, n, replace = T, prob = rep(1/4, 4))] actual_choice_year_2 <- possible_choice[sample(1:4, n, replace = T, prob = c(0.4, 0.3, 0.2, 0.1))] actual_choice_year_3 <- possible_choice[sample(1:4, n, replace = T, prob = c(0.2, 0.5, 0.2, 0.1))] # create long dataset df <- data.frame(choice = c(actual_choice_year_1, actual_choice_year_2, actual_choice_year_3), x1 = x1, x2 = x2, individual_fixed_effect = as.character(rep(1:n, years)), time_fixed_effect = as.character(rep(1:years, each = n)), stringsAsFactors = F) Ideally, what I would like to estimate is a formula of the kind formula("choice ~ x1 + x2 + individual_fixed_effect + time_fixed_effect") To this purpose, I have tried to use the package mlogit. Consequenly, following the vignette of this package I have rearranged my data as library(mlogit) # create wide dataset data_mlogit <- mlogit.data(df, id.var = "individual_fixed_effect", group.var = "time_fixed_effect", choice = "choice", shape = "wide") This allow me to run a multinomial logit regression without fixed effects by typing # formula formula_mlogit <- formula("choice ~ 1| x1 + x2") # run multinomial regression fit <- mlogit(formula_mlogit, data_mlogit) summary(fit) Apparently, in order to include fixed effects and use a panel estimation strategy, one should set the argument panel equal to TRUE in the function mlogit. However, according to the help of this function, this argument is evaluated only if another argument, rpar, is not NULL. The argument rpar is used to set the distribution of random variables in the model specification. Unfortunately, no random variables are included in my specification, hence I have no parameters to use in rpar. As a result, mlogit seems not the best choice in this context. On stackexchange, a possible solution was proposed few years ago https://stats.stackexchange.com/questions/51148/unable-to-provide-random-parameter-with-mlogit However, I don't understand how to actually implement it. Regards, Valerio Leone Sciabolazza, Ph.D. Department of Business and Economics University of Naples, Parthenope. valerio.leonesciabolazza at uniparthenope.it www.valerioleonesciabolazza.com P.s. Recently, Stata provided a package (femlogit) for the estimation of this model with fixed effects.