Hi All,

I am attempting to build a multinomial logit model with dummy variables of the following form:

Dependent variable: 0-8 discrete choices
Dummy variable 1: 965 dummy vars
Dummy variable 2: 805 dummy vars

The data set I am using has the dummy columns pre-created, so it's a table of 72,381 rows and 1770 columns. The first 965 columns represent the dummy columns for Variable 1. The next 805 columns represent the dummy columns for Variable 2.

My code to build the mlogit model looks like the following. I want to know: is there a better way of doing this without these huge equations? (I probably also need a more powerful PC to do all of this.)

I'll also want to perform a joint test of significance on the first 805 coefficients. Is this possible?

Thanks
GP

[code]
# load the mlogit package
library(mlogit)

# load the raw data
mydata <- read.csv(file = "G:\\data.csv", header = TRUE)

# build the dummy matrix: 965 + 805 dummy columns plus the dependent variable
num.rows <- nrow(mydata)
num.cols <- 965 + 805 + 1
my_data <- matrix(0, nrow = num.rows, ncol = num.cols)

for (i in 1:num.rows) {
  nb <- mydata[i, 2]                 # category index of variable 1
  np <- mydata[i, 3]                 # category index of variable 2
  my_data[i, nb] <- 1
  my_data[i, 965 + np] <- 1
  my_data[i, num.cols] <- mydata[i, 1]   # dependent variable in column 1771
}

# convert matrix to data frame
my_data_frame <- as.data.frame(my_data)

# check data frame headers
head(my_data_frame)

# load data frame into mldata with choice variable
mldata <- mlogit.data(my_data_frame, varying = NULL, choice = "V1771", shape = "wide")

# V1771      = dependent var
# V1-V965    = variable 1 dummies
# V966-V1770 = variable 2 dummies

# regress V1771 against all 1770 variables...
mlogit.model <- mlogit(V1771 ~ 0 | V1+V2+V3...+V1770, data = mldata, reflevel = "0")
[/code]

--
View this message in context: http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3439492.html
Sent from the R help mailing list archive at Nabble.com.
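On the question of avoiding the huge hand-typed formula: one common approach is to build the formula as a string and convert it with as.formula(). The sketch below is only an illustration and assumes the columns really are named V1 to V1771, as as.data.frame() would name them for the matrix in the post. It does not fit the model.

```r
# Assumption: predictors are named V1..V1770 and the choice column is V1771,
# matching the default names as.data.frame() gives to an unnamed matrix.
predictors <- paste0("V", 1:1770)

# Join the predictor names with " + " to form the individual-specific part.
rhs <- paste(predictors, collapse = " + ")

# "0 |" keeps the alternative-specific part empty, as in the original call.
model_formula <- as.formula(paste("V1771 ~ 0 |", rhs))

# The formula object can then be passed in place of the typed-out version,
# e.g. mlogit(model_formula, data = mldata, reflevel = "0")  (not run here).
length(all.vars(model_formula))
```

If the dummies are first collapsed into two factor columns, the formula shrinks to two terms and this step is no longer needed.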
Jeremy Hetzel
2011-Apr-10 13:31 UTC
[R] Multinomial Logit Model with lots of Dummy Variables
If you are just looking to collapse the dummy variables into two factor variables, the following will work.

## Generate some example data
set.seed(1234)
n <- 100

# Generate outcome
outcome <- rbinom(n, 3, 0.5)

# Generate dummy variables for A and B (exactly one 1 per row)
A <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x) {
  sample(c(1, 0, 0, 0, 0))
}))
B <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x) {
  sample(c(1, 0, 0, 0, 0))
}))

# Combine into data frame
dat <- data.frame(outcome, A, B)
names(dat) <- c("outcome", paste("A", 1:5, sep = ""), paste("B", 1:5, sep = ""))
head(dat)

## Collapse dummies to factor variable
# For each row, return the name of the dummy column that equals 1
A <- apply(dat, 1, function(x) {
  A <- x[2:6]
  A.names <- names(x[2:6])
  A.value <- A.names[A == 1]
  return(A.value)
})
B <- apply(dat, 1, function(x) {
  B <- x[7:11]
  B.names <- names(x[7:11])
  B.value <- B.names[B == 1]
  return(B.value)
})

# Combine into new data frame (stringsAsFactors = TRUE so A and B become factors)
dat.new <- data.frame(outcome = dat$outcome, A, B, stringsAsFactors = TRUE)
head(dat.new)

Jeremy
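A vectorized variant of the same collapse, for what it's worth: with one-hot rows, base R's max.col() returns the column index of the single 1 in each row, which avoids the row-wise apply(). This is a minimal sketch on made-up data; make_dummies and dummies_A are hypothetical names used only for illustration.

```r
set.seed(1234)
n <- 100

# Hypothetical helper: build an n x k one-hot matrix with a random 1 per row.
make_dummies <- function(n, k) {
  idx <- sample.int(k, n, replace = TRUE)
  m <- matrix(0, nrow = n, ncol = k)
  m[cbind(seq_len(n), idx)] <- 1
  m
}

dummies_A <- make_dummies(n, 5)
colnames(dummies_A) <- paste0("A", 1:5)

# max.col() gives, for each row, the index of the largest entry;
# for a one-hot row that is the column holding the 1.
hot_col <- max.col(dummies_A, ties.method = "first")

# Map the indices back to column names and store them as a factor.
A <- factor(colnames(dummies_A)[hot_col], levels = colnames(dummies_A))
head(A)
```

If the raw file already stores the category index for each variable, as the loop in the first post suggests (columns 2 and 3 of mydata), calling factor() on those columns directly may be all that is needed.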
Hi,

Thanks to Jeremy for his response. I have been able to generate the factors and generate mlogit data using his code:

mldata<-mlogit.data(mydata, varying=NULL, choice="pitch_type_1", shape="wide")

My mlogit data looks like:

"dependent_var","A variable","B Var","chid","alt"
FALSE,"110","19",1,"0"
FALSE,"110","19",1,"1"
FALSE,"110","19",1,"2"
FALSE,"110","19",1,"3"
FALSE,"110","19",1,"4"
TRUE,"110","19",1,"5"
FALSE,"110","19",1,"6"
FALSE,"110","19",1,"7"
FALSE,"110","19",1,"8"
FALSE,"110","19",2,"0"
FALSE,"110","19",2,"1"
FALSE,"110","19",2,"2"
FALSE,"110","19",2,"3"
FALSE,"110","19",2,"4"
FALSE,"110","19",2,"5"
TRUE,"110","19",2,"6"
FALSE,"110","19",2,"7"
FALSE,"110","19",2,"8"
TRUE,"110","561",3,"0"
FALSE,"110","561",3,"1"
FALSE,"110","561",3,"2"
FALSE,"110","561",3,"3"
FALSE,"110","561",3,"4"
FALSE,"110","561",3,"5"
FALSE,"110","561",3,"6"
FALSE,"110","561",3,"7"
FALSE,"110","561",3,"8"
FALSE,"110","149",4,"0"
FALSE,"110","149",4,"1"
TRUE,"110","149",4,"2"
...

The mldata contains 651431 rows. If I try to run this full data set I get the following error:

> mlogit.model<- mlogit(dependent_var~0|A+B, data = mldata, reflevel="0")
Error in model.matrix.default(formula, data) :
  allocMatrix: too many elements specified
Calls: mlogit ... model.matrix.mFormula -> model.matrix -> model.matrix.default
Execution halted

With smaller datasets (595 mldata rows), mlogit works fine and generates regression output.

Is there a problem with mlogit and huge datasets? I suppose this is perhaps not the best way to assess this kind of data, but I am trying to replicate a previous analysis that was completed on a similar amount of similar data.
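A rough size estimate may explain the allocMatrix error. The sketch below is back-of-the-envelope arithmetic, not a description of mlogit's internals. It assumes the design matrix has one row per mldata row and one column per coefficient, with an intercept plus one coefficient per non-reference factor level for each of the 8 non-reference alternatives. The level counts (965 and 805) are taken from the first post and are assumptions for the collapsed factors. R versions before 3.0.0 could not allocate a matrix with more than .Machine$integer.max elements.

```r
n_rows   <- 651431   # rows of mldata reported in the post
n_alt    <- 9        # choices 0-8
levels_A <- 965      # assumption: number of levels of factor A
levels_B <- 805      # assumption: number of levels of factor B

# Coefficients per non-reference alternative:
# intercept + (levels - 1) dummies for each factor.
coef_per_alt <- 1 + (levels_A - 1) + (levels_B - 1)

# Individual-specific variables get one coefficient set per
# non-reference alternative.
n_cols <- coef_per_alt * (n_alt - 1)

# Total cells of a dense design matrix (computed in double precision).
n_elements <- n_rows * n_cols

# Compare against the per-matrix element limit of older R versions.
exceeds_limit <- n_elements > .Machine$integer.max

# Approximate memory for a dense double matrix, in GiB.
size_gb <- n_elements * 8 / 1024^3

c(n_cols = n_cols, n_elements = n_elements, size_gb = size_gb)
exceeds_limit
```

Under these assumptions the dense matrix would be several times over the element limit and far larger than typical desktop memory, so the failure would be a matter of problem size rather than a bug specific to mlogit.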