<abigailclifton <at> me.com> writes:
> I am trying to fit a generalised linear model to some loan
> application and default data. The purpose of this is to eventually
> work out the probability an applicant will default.
> However, R seems to crash or die when I run "glm" on anything
> greater than a 5-way saturated model for my data.
What does "crash or die" mean? Are you getting error messages?
What are they? Is the R application actually quitting?
> My first question: is the best way to fit a generalised linear model
> in R to fit the saturated model and extract the significant terms
> only, or to start at the null model and to work up to the optimum
> one?
This is more of a statistical practice question than an R question.
Opinions differ, but in general I would say that, if it is
computationally feasible, you should start (and maybe finish) with the
full model.
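If you do want to look at dropping terms from a fitted model, drop1()
(or, more automatically, step()) is one way to do it. A minimal sketch,
assuming a hypothetical data frame `mydata` with a 0/1 response
`default` and predictors `x1`, `x2`, `x3` (none of these names are from
your data):

fit <- glm(default ~ x1*x2*x3, data = mydata, family = "binomial")  # full model
drop1(fit, test = "Chisq")  # likelihood-ratio test for removing each droppable term
step(fit)                   # automated stepwise selection by AIC (use with caution)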
> I am importing a csv file with 3500 rows and 27 columns (3500x27 matrix).
> My second question: is there any way to increase the memory
> I have so R can cope with more analysis?
help("Memory-limits")>
> I can send my code if it would help to answer the question.
Please read the posting guide (link at the bottom of every R-help
posting) and follow its advice. We don't know enough about your
situation to help. You could also try reading
http://tinyurl.com/reproducible-000 ...
This works for me:
z <- matrix(rnorm(3500*27), ncol = 27)         # simulated predictors, same dimensions as your data
y <- sample(0:1, size = 3500, replace = TRUE)  # simulated 0/1 response
colnames(z) <- c(letters, "A")                 # 27 single-letter column names
d <- data.frame(y = y, z)
gg <- glm(y ~ ., data = d, family = "binomial")                # main effects only
gg <- glm(y ~ a*b*c*d*e*f*g*h, data = d, family = "binomial")  # 8-way saturated model
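One way to see why very high-order interaction models become
infeasible is to count the columns of the design matrix before
fitting. A short sketch continuing the example above:

ncol(model.matrix(~ a*b*c*d*e*f*g*h, data = d))  # 2^8 = 256 columns
## with numeric predictors, each extra variable in the interaction doubles
## the number of columns, so the design matrix (and glm's memory use)
## grows very quickly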