Rob James
2012-May-09 21:50 UTC
[R] Failed Convergence when using mi to generate synthetic data
I was hoping to use mi to generate a synthetic version of a database. The
strategy (see code below) was simple: take the diamonds dataset from
ggplot2, subset it to roughly 3K diamonds of a single color, create a blank
record for every "real" record, and throw the combined dataset at mi to see
whether it would populate the blank records. I kept getting failed convergence.
I think I have simplified the dataset down to the point where either my code
is wrong or the approach itself is conceptually flawed. I would welcome
suggestions:
library(ggplot2)
library(mi)
data(diamonds)
#use only 2800 or so observations!
diamonds1 <-subset(diamonds, color=="J")
rm(diamonds)
#simplify the data structure
diamonds1 <-subset(diamonds1, select=-c(x, z, y, cut, clarity, depth, table))
str(diamonds1)
#generate a blank table
emptydiamonds1 <-diamonds1
for(j in seq_len(ncol(diamonds1))) {
  emptydiamonds1[,j] <- NA
}
#throw up a dummy variable for imputation
diamonds1$impute=0
emptydiamonds1$impute=1
#package the two into one dataset
d2 <-rbind(diamonds1, emptydiamonds1)
str(d2)
#run mi.info
miinfo <-mi.info(d2)
#pre_process
mi_pre <-mi.preprocess(d2)
#impute
Imp1 <-mi(mi_pre, n.iter=49)
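
For reference, the share of completely blank rows in a stacked table like d2 can be checked with base R alone. The following is a minimal sketch on a made-up three-row data frame; toy, toy_empty, and stacked are illustrative names and values, not part of the code above, and it needs neither ggplot2 nor mi.

```r
# Toy stand-in for the diamonds subset (values are made up for illustration)
toy <- data.frame(carat = c(0.3, 0.5, 1.1), price = c(400, 1200, 5000))

# Blank copy: same columns, every value set to NA
toy_empty <- toy
toy_empty[] <- NA

# Flag the rows to be imputed, then stack the two tables
toy$impute <- 0
toy_empty$impute <- 1
stacked <- rbind(toy, toy_empty)

# Mark rows where every variable other than the flag is missing
vars <- setdiff(names(stacked), "impute")
all_missing <- rowSums(is.na(stacked[, vars, drop = FALSE])) == length(vars)

# Cross-tabulate the flag against the all-missing indicator
print(table(impute = stacked$impute, all_missing = all_missing))
```

Every row flagged impute == 1 has no observed values besides the flag itself, so for those rows the flag is the only observed information available to the imputation model.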
