Martijn Wieling
2011-Nov-09 12:41 UTC
[R] Problem with simple random slope in gam and bam (mgcv package)
Dear useRs,

This is the first time I have posted to this list and I would appreciate any help available. I've used the excellent mgcv package for a while now to investigate geographical patterns of language variation, and it has always worked without any problems for me. The problem below occurs using R 2.14.0 (both the 32- and 64-bit versions on Windows and the 64-bit version on Unix) and mgcv (both versions 1.7-10 and 1.7-6).

In my (simplified) model predicting pronunciation distance, I'd like to include a random slope per Participant for a binary variable (IsDem) which stores a word-specific characteristic. I load the data (available at http://www.martijnwieling.nl/dat.csv) and run the model as follows:

> library(mgcv) # version 1.7-10, but problem also occurs with earlier versions (e.g., 1.7-6)
> dat = read.csv('dat.csv', header=TRUE) # data available at: http://www.martijnwieling.nl/dat.csv
> dim(dat) # the original dataset is larger, but the problem also occurs in this subset
[1] 20000 4
> model = bam(PronDist ~ s(Participant,IsDem,bs="re"), data=dat)
> print(model) # works fine
> summary(model, freq=TRUE) # works fine
> summary(model) # the Bayesian p-value estimation does not work:
Error in eigen(B, symmetric = TRUE) : infinite or missing values in 'x'

I am obviously interested in more complex models, but whenever I include any binary variable as a by-word or by-participant random slope I get the same error. I've tried to locate the error, and it appears to occur in the function pinvXVX, in the block which 'deals with the fractional part of the pinv'.

Any help would be appreciated!

With kind regards,

Martijn Wieling
University of Groningen
http://www.martijnwieling.nl
Simon Wood
2011-Nov-09 14:09 UTC
[R] Problem with simple random slope in gam and bam (mgcv package)
Martijn,

Thanks for this. It's a bug. The p-value computation involves model matrices for each 'smooth' term (in your case actually a random effect). When the data set is large, random sub-sampling of the data is used to keep the computational cost of these model matrices down. This is OK for continuous predictors, but for factor predictors, as used in "re" terms, the sub-sample can fail to pick up some levels of the factor, and the computation then fails due to rank deficiency. This possibility had not previously occurred to me. I'll work out a fix.

best,
Simon

--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603 http://people.bath.ac.uk/sw283
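The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not mgcv's internal code: the data frame `dat_sim`, the 400 participants and the sub-sample size of 300 are all made up for illustration, and the random-slope design matrix is built by hand with `model.matrix` rather than through mgcv.

```r
# Sketch: a random sub-sample of rows can miss factor levels, which leaves
# all-zero columns in a random-slope design matrix (rank deficiency).
# All names and sizes here are hypothetical, not taken from dat.csv.
set.seed(42)  # arbitrary seed, for repeatability only

n_levels <- 400
dat_sim <- data.frame(
  Participant = factor(rep(sprintf("P%03d", seq_len(n_levels)), each = 50))
)
dat_sim$IsDem <- rbinom(nrow(dat_sim), 1, 0.5)  # simulated binary covariate

# Sub-sample fewer rows than there are participants, so some levels
# are guaranteed to be absent from the sub-sample.
sub <- dat_sim[sample(nrow(dat_sim), 300), ]

# One indicator column per Participant level (unused levels are kept),
# each multiplied by IsDem: a by-participant random slope design.
X <- model.matrix(~ Participant - 1, data = sub) * sub$IsDem

empty_cols <- sum(colSums(abs(X)) == 0)  # columns with no information
rank_X <- qr(X)$rank

cat("columns:", ncol(X), " all-zero columns:", empty_cols,
    " rank:", rank_X, "\n")
```

Because the sub-sample has only 300 rows, at least 100 of the 400 columns must be entirely zero, so the matrix cannot have full column rank. With a binary covariate the effect is stronger still, since a participant whose sampled rows all have IsDem equal to 0 also contributes a zero column.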
Simon Wood
2011-Nov-14 07:41 UTC
[R] Problem with simple random slope in gam and bam (mgcv package)
Martijn,

This was a problem in summary.gam handling "re" terms with largish datasets. I've uploaded a fix for this in mgcv_1.7-11, which should hopefully be on CRAN in the next few days.

best,
Simon

--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603 http://people.bath.ac.uk/sw283
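For anyone checking whether their installation predates the fix, a minimal sketch follows. The helper `needs_upgrade` is a hypothetical name introduced here for illustration; the only facts taken from the thread are that the fix is in mgcv 1.7-11 and that `summary(model, freq=TRUE)` worked on the affected versions.

```r
# Sketch: check whether an mgcv version predates the fix mentioned above.
# `needs_upgrade` is a hypothetical helper, not part of mgcv.
fixed_in <- package_version("1.7-11")

needs_upgrade <- function(v) {
  # package_version() accepts both "1.7-10" and "1.7.10" style strings
  package_version(v) < fixed_in
}

if (requireNamespace("mgcv", quietly = TRUE)) {
  installed <- as.character(packageVersion("mgcv"))
  if (needs_upgrade(installed)) {
    message("mgcv ", installed, " predates 1.7-11; consider updating, ",
            "or use summary(model, freq = TRUE) in the meantime.")
  } else {
    message("mgcv ", installed, " includes the summary.gam fix.")
  }
}
```

The version comparison is done on `package_version` objects rather than on strings, so that "1.7-10" correctly sorts before "1.7-11" and after "1.7-6".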