I want to analyse data with an unordered, multi-level outcome variable, y. I am
asking for the appropriate method (or R procedure) to use for this analysis.
> N <- 500
> set.seed(1234)
> data0 <- data.frame(y = as.factor(sample(LETTERS[1:3], N, repl = T, 
+     prob = c(10, 12, 14))), x1 = sample(1:7, N, repl = T, prob = c(8, 
+     8, 9, 15, 9, 9, 8)), x2 = sample(1:7, N, repl = T, prob = 7:1), 
+     x3 = round(rnorm(N, 0, 1), 0) + 4)
In [1], the use of "mlogit" is described (same package name).
"mlogit" produces a nice output, but for some reason it does not
accept the "." operator in my formula. In addition, it requires me to
transform my data frame before running the analysis:
> library(mlogit)
> data1 <- mlogit.data(data0, varying = NULL, choice = "y",
shape = "wide")
> summary(mod1 <- mlogit(y ~ 1 | ., data = data1, reflevel =
"A"))
Error in terms.formula(object) : '.' in formula and no 'data'
argument
> form2 <- paste("y", " ~ 1 | ",
paste(names(data0)[-1], sep = "",
+     collapse = " + "), sep = "", collapse =
"")> summary(mod2 <- mlogit(as.formula(form2), data = data1, reflevel =
"A"))
Call:
mlogit(formula = as.formula(form2), data = data1, reflevel = "A", 
    method = "nr", print.level = 0)
Frequencies of alternatives:
    A     B     C 
0.282 0.338 0.380 
nr method
3 iterations, 0h:0m:0s 
g'(-H)^-1g = 8.8E-07 
gradient close to zero 
Coefficients :
         Estimate Std. Error t-value Pr(>|t|)
altB     0.269645   0.569943  0.4731   0.6361
altC     0.642654   0.551250  1.1658   0.2437
altB:x1  0.018484   0.061061  0.3027   0.7621
altC:x1  0.075489   0.059837  1.2616   0.2071
altB:x2 -0.102349   0.068194 -1.5008   0.1334
altC:x2 -0.031934   0.065491 -0.4876   0.6258
altB:x3  0.034394   0.115189  0.2986   0.7653
altC:x3 -0.137739   0.112124 -1.2284   0.2193
Log-Likelihood: -542.05
McFadden R^2:  0.0065888 
Likelihood ratio test : chisq = 7.1903 (p.value=0.30361)
This is no big problem if the data frame holds only few variables, however, it
can easily become confusing if the number variables in the data frame increases.
So, I can use the function "multinom" (package "nnet"), as
described in [2] and [3]:
> library(nnet)
> summary(mod3 <- multinom(y ~ ., data = data0, na.action = na.omit, 
+     Hess = T, model = T))
# weights:  15 (8 variable)
initial  value 549.306144 
iter  10 value 542.075434
final  value 542.046320 
converged
Call:
multinom(formula = y ~ ., data = data0, na.action = na.omit, 
    Hess = T, model = T)
Coefficients:
  (Intercept)         x1          x2          x3
B   0.2696466 0.01848389 -0.10234898  0.03439345
C   0.6426552 0.07548893 -0.03193395 -0.13773888
Std. Errors:
  (Intercept)         x1         x2        x3
B   0.5699428 0.06106070 0.06819403 0.1151887
C   0.5512500 0.05983652 0.06549051 0.1121242
Residual Deviance: 1084.093 
AIC: 1100.093 
The estimates produced by this function are the same like in the analysis above
("mlogit"), and in principle, I should be able to calculate the z and
fit statistics by hand. However, this seem more long-winded than using the
"mlogit" function.
Alternatively, I may describe my results in the following way: Given that a
person knows B and C (which is the case for all my cases), what drives some to
use B, and what drives some to use C? Basically, I am wondering whether this
analysis can be done using a set of two, parallel binary logistic regressions,
one describing the step from A to B, and another one describing the step from A
to C, while keeping predictors the same in both regressions.
More generally, I am not sure whether each multinomial logistic model could not
just as well be analysed using consecutive binary logistic regressions. This has
two advantages in my opinion, one, the model statistics and fit measures are
well-described in binary models compared to multinomial ones (see [1] at the
bottom), and two, they are much easier to communicate. But I am wondering,
analogously to multiple mean comparisons, whether I run into any traps, and I
have no idea what these traps look like.
I would kindly appreciate any advice.
Thanks, S?ren
[1] http://www.ats.ucla.edu/stat/r/dae/mlogit.htm
[2] Venables, W. N. and Ripley, B. D. (2002). Modern applied statistics with S
(4th ed.). New York: Springer.
[3] http://www.stat.washington.edu/quinn/classes/536/S/multinomexample.html