I have a data set with ~250,000 observations summarized in ~1,000 rows that I'm trying to analyze with mlogit. Based on the discussion in https://stat.ethz.ch/pipermail/r-help/2010-June/241161.html, I understand that weights= does not (fully) do what I need. I tried expanding my data to one row per observation to sidestep this issue. After waiting several hours for mlogit to finish, I decided this was not a feasible strategy and that I needed to use weights= and make whatever adjustments are necessary for the inferences.

My solution is the following:

1. Define W = sum(weights) / length(weights), i.e. the mean weight.
2. Multiply the log-likelihood by W.
3. Divide the standard errors by sqrt(W) (and therefore multiply the t-values by sqrt(W)).

Can anyone confirm that this is correct (at least as a large-N approximation)?

The code below provides a test case in which I compare duplicating rows with using weights and adjusting the inferences (the original code was from Kenneth Train's exercises using the mlogit package for R). The last few lines printed (Ratios: ...) show that the coefficients in the two cases agree to high accuracy, and that the log-likelihoods, standard errors and t-values also have the expected ratios to decent accuracy. However, it would be good to know that this approach is conceptually sound.
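A sketch of why these ratios would be expected. It assumes two things that the Ratios output suggests but that are not confirmed here from the mlogit source. First, mlogit rescales the weights w_i so that they have mean 1. Second, the standard errors come from the inverse Hessian of the log-likelihood. Here n is the number of rows, and ell_i(beta) is the log-likelihood contribution of row i. "dup" denotes the fit on duplicated rows and "wt" the weighted fit.

```latex
\begin{aligned}
W &= \frac{1}{n}\sum_{i=1}^{n} w_i, \\
\ell_{\text{dup}}(\beta) &= \sum_{i=1}^{n} w_i\,\ell_i(\beta),
\qquad
\ell_{\text{wt}}(\beta) = \sum_{i=1}^{n} \frac{w_i}{W}\,\ell_i(\beta)
  = \frac{\ell_{\text{dup}}(\beta)}{W}, \\
\hat\beta_{\text{dup}} &= \hat\beta_{\text{wt}}
  \quad\text{(the two objectives differ only by the constant factor } W\text{)}, \\
\nabla^2\ell_{\text{dup}}(\hat\beta) &= W\,\nabla^2\ell_{\text{wt}}(\hat\beta)
\;\Rightarrow\;
\operatorname{SE}_{\text{dup}} = \frac{\operatorname{SE}_{\text{wt}}}{\sqrt{W}},
\qquad
t_{\text{dup}} = \sqrt{W}\,t_{\text{wt}}.
\end{aligned}
```

Under these assumptions the relations would hold exactly rather than only as a large-N approximation. They do rely on each row genuinely standing for w_i independent, identical observations (frequency weights).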
Thanks,
Ron

library("mlogit")
data("Heating", package = "mlogit")
H <- mlogit.data(Heating, shape = "wide", choice = "depvar", varying = c(3:12))
m <- mlogit(depvar ~ ic + oc | 0, H)
# print(summary(m))

set.seed(1)  # make the random weights reproducible
w <- sample(1:200, nrow(Heating), replace = TRUE)  # random weights
i <- rep(1:nrow(Heating), times = w)  # index vector for duplicating rows according to the weights

# Model 2: data expanded to one row per observation
H2 <- mlogit.data(Heating[i, ], shape = "wide", choice = "depvar", varying = c(3:12))
m2 <- mlogit(depvar ~ ic + oc | 0, H2)
# print(summary(m2))

# Model 3: original rows with weights (5 alternatives per chooser in long format)
m3 <- mlogit(depvar ~ ic + oc | 0, H, weights = rep(w, each = 5))
# print(summary(m3))

print(all.equal(coef(m2), coef(m3)))

f2 <- fitted(m2)[cumsum(w)]  # one fitted value per original row
f3 <- fitted(m3)
names(f2) <- names(f3)
print(all.equal(f2, f3))

cat("\nRatios:", m2$logLik / m3$logLik, sum(w) / length(w),
    sqrt(sum(w) / length(w)), sqrt(length(w) / sum(w)), "\n\n")

s2 <- summary(m2)
s3 <- summary(m3)
print(s2$CoefTable / s3$CoefTable)
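The three-step adjustment could be wrapped in a small base-R helper, sketched below. adjust_weighted_inference() is a hypothetical name and not part of mlogit. It assumes that the CoefTable columns are in the order estimate, standard error, t (or z) value, p-value. It also recomputes two-sided p-values from the normal distribution.

```r
# Hypothetical helper (not part of mlogit): rescale the inference from a
# weighted fit to match a fit on the fully expanded data, ASSUMING the
# weights were normalised to mean 1 and the SEs come from the inverse Hessian.
# Column order assumed: estimate, std. error, t/z value, p-value.
adjust_weighted_inference <- function(coef_table, loglik, w) {
  W <- sum(w) / length(w)                # mean frequency weight
  tab <- coef_table
  tab[, 2] <- tab[, 2] / sqrt(W)         # standard errors shrink by sqrt(W)
  tab[, 3] <- tab[, 3] * sqrt(W)         # t (z) values grow by sqrt(W)
  tab[, 4] <- 2 * pnorm(-abs(tab[, 3]))  # two-sided normal p-values
  list(coef_table = tab, loglik = loglik * W)
}
```

With the test objects above, adj <- adjust_weighted_inference(s3$CoefTable, m3$logLik, w) followed by print(adj$coef_table[, 1:3] / s2$CoefTable[, 1:3]) should give ratios close to 1 if the adjustment is right. Passing w or rep(w, each = 5) gives the same W, because both have the same mean.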