Hi All,
I am attempting to build a Multinomial Logit model with dummy variables of
the following form:
Dependent Variable : 0-8 Discrete Choices
Dummy Variable 1: 965 dummy vars
Dummy Variable 2: 805 dummy vars
The data set I am using has the dummy columns pre-created, so it's a table
of 72,381 rows and 1770 columns.
The first 965 columns represent the dummy columns for Variable 1
The next 805 columns represent the dummy columns for Variable 2
My code to build the mlogit model looks like the following. I want to
know...is there a better way of doing this without these huge equations? (I
probably also need a more powerful PC to do all of this).
I'll also want to perform a joint test of significance on the first 805
coefficients...
Is this possible?
Thanks
GP
[code]
#install MLOGIT
library(mlogit)
#load mydata
mydata = 0
mydata<-read.csv(file="G:\\data.csv",head=TRUE)
my_data=0
num.rows=length(mydata[,1])
num.cols=965+805+1
my_data=matrix(0,nr=num.rows,nc=num.cols)
for(i in 1:num.rows) {
nb=mydata[i,2]
np=mydata[i,3]
my_data[i,nb]=1
my_data[i,965+np]=1
my_data[i,1+1770]=mydata[i,1]
}
#convert matrix to data frame
my_data_frame<-as.data.frame(my_data)
#check data frame headers
head(my_data_frame)
#load dataframe into mldata with choice variable
mldata<-mlogit.data(my_data_frame, varying=NULL, choice="V1771",
shape="wide")
#V1771 = dependent var
#V1-V965 = variable 1 dummies
#V966-V1770 = variable 2 dummies
#regress V1771 against all 1770 variables...
mlogit.model<-mlogit(V1771~0|V1+V2+V3...+V1770,data=mldata,
reflevel="0")
[/code]
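On the question of avoiding the huge equation, here is a minimal sketch using only base R. It assumes, as the loop above implies, that column 1 of mydata holds the choice and columns 2 and 3 hold the dummy indices for Variable 1 and Variable 2. The small simulated mydata and the names choice, nb, np, big_formula and small are stand-ins for illustration, not part of the original data.

```r
# Hypothetical stand-in for the real file: column 1 = choice (0-8),
# column 2 = Variable 1 dummy index, column 3 = Variable 2 dummy index
set.seed(1)
mydata <- data.frame(choice = sample(0:8, 20, replace = TRUE),
                     nb = sample(1:965, 20, replace = TRUE),
                     np = sample(1:805, 20, replace = TRUE))

n <- nrow(mydata)
my_data <- matrix(0, nrow = n, ncol = 965 + 805 + 1)

# Matrix indexing sets one cell per row, so no for loop is needed
my_data[cbind(seq_len(n), mydata$nb)] <- 1
my_data[cbind(seq_len(n), 965 + mydata$np)] <- 1
my_data[, 1771] <- mydata$choice

# Build the text "V1771 ~ 0 | V1 + V2 + ... + V1770" instead of typing it
rhs <- paste(paste0("V", 1:1770), collapse = " + ")
big_formula <- as.formula(paste("V1771 ~ 0 |", rhs))

# Alternative: skip the dummy columns and keep each variable as one factor;
# model functions then expand the dummies internally
small <- data.frame(choice = mydata$choice,
                    A = factor(mydata$nb),
                    B = factor(mydata$np))
```

If the factor route works for the real data, the model formula shrinks to two terms (A and B), which is likely easier to manage than 1770 named columns.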
--
View this message in context:
http://r.789695.n4.nabble.com/Multinomial-Logit-Model-with-lots-of-Dummy-Variables-tp3439492p3439492.html
Sent from the R help mailing list archive at Nabble.com.
Jeremy Hetzel
2011-Apr-10 13:31 UTC
[R] Multinomial Logit Model with lots of Dummy Variables
If you are just looking to collapse the dummy variables into two factor
variables, the following will work.
## Generate some example data
set.seed(1234)
n <- 100
# Generate outcome
outcome <- rbinom(n, 3, 0.5)
# Generate dummy variables for A and B
A <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x)
{
    sample(c(1, 0, 0, 0, 0))
}))
B <- t(apply(matrix(nrow = n, ncol = 5), 1, function(x)
{
    sample(c(1, 0, 0, 0, 0))
}))
# Combine into data frame
dat <- data.frame(outcome, A, B)
names(dat) <- c("outcome", paste("A", 1:5, sep = ""), paste("B", 1:5, sep = ""))
head(dat)
## Collapse dummies to factor variable
A <- apply(dat, 1, function(x)
{
    A <- x[2:6]
    A.names <- names(x[2:6])
    A.value <- A.names[A == 1]
    return(A.value)
})
B <- apply(dat, 1, function(x)
{
    B <- x[7:11]
    B.names <- names(x[7:11])
    B.value <- B.names[B == 1]
    return(B.value)
})
# Combine into new data frame, storing A and B as factors
dat.new <- data.frame(outcome = dat$outcome, A = factor(A), B = factor(B))
head(dat.new)
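With 72,381 rows, a row-by-row apply() may be slow. A possible vectorized variant of the same collapse step uses base R's max.col(), sketched here on freshly simulated dummies. The names A.dummies and B.dummies are introduced only for this illustration, and the sketch assumes every row has exactly one 1 within each block.

```r
set.seed(1234)
n <- 100

# Hypothetical example data: exactly one 1 per row in each block of five dummies
A.dummies <- t(replicate(n, sample(c(1, 0, 0, 0, 0))))
B.dummies <- t(replicate(n, sample(c(1, 0, 0, 0, 0))))
colnames(A.dummies) <- paste0("A", 1:5)
colnames(B.dummies) <- paste0("B", 1:5)

# max.col() returns, for each row, the column index of the largest entry,
# which here is the position of the single 1
A <- factor(colnames(A.dummies)[max.col(A.dummies, ties.method = "first")],
            levels = colnames(A.dummies))
B <- factor(colnames(B.dummies)[max.col(B.dummies, ties.method = "first")],
            levels = colnames(B.dummies))

head(data.frame(A, B))
```

Setting ties.method = "first" keeps the result deterministic. If a row could contain no 1 or several, it would be worth checking rowSums() of each block first.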
Jeremy
Hi,

Thanks to Jeremy for his response. I have been able to generate the factors and create the mlogit data using his code:

mldata<-mlogit.data(mydata, varying=NULL, choice="pitch_type_1", shape="wide")

My mlogit data looks like:

"dependent_var","A variable","B Var","chid","alt"
FALSE,"110","19",1,"0"
FALSE,"110","19",1,"1"
FALSE,"110","19",1,"2"
FALSE,"110","19",1,"3"
FALSE,"110","19",1,"4"
TRUE,"110","19",1,"5"
FALSE,"110","19",1,"6"
FALSE,"110","19",1,"7"
FALSE,"110","19",1,"8"
FALSE,"110","19",2,"0"
FALSE,"110","19",2,"1"
FALSE,"110","19",2,"2"
FALSE,"110","19",2,"3"
FALSE,"110","19",2,"4"
FALSE,"110","19",2,"5"
TRUE,"110","19",2,"6"
FALSE,"110","19",2,"7"
FALSE,"110","19",2,"8"
TRUE,"110","561",3,"0"
FALSE,"110","561",3,"1"
FALSE,"110","561",3,"2"
FALSE,"110","561",3,"3"
FALSE,"110","561",3,"4"
FALSE,"110","561",3,"5"
FALSE,"110","561",3,"6"
FALSE,"110","561",3,"7"
FALSE,"110","561",3,"8"
FALSE,"110","149",4,"0"
FALSE,"110","149",4,"1"
TRUE,"110","149",4,"2"
...

The mldata contains 651431 rows. If I try to run this full data set I get the following error:

> mlogit.model<- mlogit(dependent_var~0|A+B, data = mldata, reflevel="0")
Error in model.matrix.default(formula, data) :
  allocMatrix: too many elements specified
Calls: mlogit ... model.matrix.mFormula -> model.matrix -> model.matrix.default
Execution halted

With smaller datasets (595 mldata rows), mlogit works fine and generates regression output.

Is there a problem with mlogit and huge datasets? I suppose this is perhaps not the best way to assess this kind of data, but I am trying to replicate a previous analysis that was completed on a similar amount of similar data.
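A rough back-of-the-envelope check suggests why allocMatrix might complain here. The sketch below assumes the model matrix has one row per mldata row and, for each of the 8 non-reference alternatives, one intercept plus (levels - 1) dummy columns per factor. That column layout is an assumption about how the formula is expanded, not something confirmed from the mlogit source, and it further assumes all 965 and 805 levels actually occur in the data.

```r
n.rows <- 651431   # rows of mldata reported above
n.alt  <- 9        # alternatives 0-8
n.A    <- 965      # levels of the first factor
n.B    <- 805      # levels of the second factor

# Assumed layout: intercept + (levels - 1) dummies per factor,
# repeated for every non-reference alternative
n.cols <- (n.alt - 1) * (1 + (n.A - 1) + (n.B - 1))
n.elements <- n.rows * n.cols

# TRUE means the dense matrix has more cells than a 32-bit integer can count
n.elements > .Machine$integer.max

# Approximate size in GiB if stored as 8-byte doubles
n.elements * 8 / 2^30
```

Under these assumptions the dense matrix would need roughly 9 billion cells. R releases from that period capped a single vector or matrix at 2^31 - 1 elements, so the failure would be a size limit in R rather than a bug specific to mlogit, and even without the cap the memory needed would be far beyond a typical PC.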