Martijn Wieling
2012-Apr-23 17:26 UTC
[R] Problem extracting enough coefs from gam (mgcv package)
Dear useRs,

I have been using the excellent mgcv package (version 1.7-12) to create a generalized additive model (gam) including random effects - represented with s(...,bs="re") - on the basis of dialect data.

My model contains two random-effect factors (Word and Key - the latter representing a speaker) and I have added both random intercepts and various random slopes for these random-effect factors. There is no missing data in my dataset. When I try to extract the by-word random intercepts from my model, using coef(model), I find 357 values, equal to the number of words in my dataset. The names returned by coef(model) are uninformative, s(Word,1) through s(Word,357), but I'm assuming (I might be wrong though?) that I can link the labels of the words to these values by obtaining the 357 labels from the original dataset: unique(dat[,c("Word")])

Unfortunately, I cannot use this procedure to label the by-word random slopes, because I find a varying number of values for these (ranging from 346 to 356), which is always less than 357. (The number of by-speaker random slopes does equal the number of speakers, though.)

Does anybody i) have an idea why I obtain fewer by-word random slopes than words, and/or ii) know how I can link the random slopes which are present to the correct labels of the words?

(I did not include the model as it is >300 MB in size, but let me know if this is necessary.)

Any help would be greatly appreciated!

With kind regards,
Martijn Wieling
University of Groningen
http://www.martijnwieling.nl
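The labelling assumption in the question can be checked on a toy factor. The sketch below uses base R only and made-up words; it assumes, as the mgcv documentation for the "re" basis suggests, that the term is built from dummy columns of the factor, so the coefficients would follow levels() order rather than the order of first appearance that unique() returns.

```r
# Toy data: hypothetical words, deliberately not in alphabetical order.
dat <- data.frame(Word = factor(c("pear", "apple", "pear", "fig", "apple")))

# unique() keeps the order of first appearance in the data.
as.character(unique(dat[, c("Word")]))   # "pear" "apple" "fig"

# levels() gives the (by default sorted) level order of the factor.
levels(dat$Word)                         # "apple" "fig" "pear"

# Dummy columns for a factor follow levels(), one column per level.
X <- model.matrix(~ Word - 1, data = dat)
colnames(X)                              # "Wordapple" "Wordfig" "Wordpear"
```

If that assumption holds, levels(dat$Word) is the safer source of labels; the two orders only coincide when the data happen to be sorted by word.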
Simon Wood
2012-Apr-24 08:50 UTC
[R] Problem extracting enough coefs from gam (mgcv package)
Martijn,

It's a bit hard to know without seeing the full model structure, but it's possible that the issue is related to an undesirable side effect of the handling of identifiability constraints on smooth terms prior to mgcv 1.7-13: the standard side constraint approach used for smooths could lead to unexpected constraints being applied to s(...,bs="re") terms in some cases.

So, could you send me the gam call that generates the problem, and perhaps check whether it still happens in 1.7-13?

best,
Simon

--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283
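A quick way to see whether a given mgcv installation shows the symptom is to count the coefficients each smooth term received and compare that with the number of factor levels. This is only a sketch on simulated data: the variable names x and y and the 20 invented words are assumptions for illustration, and it relies on the first.para and last.para entries that fitted gam objects store for each smooth.

```r
library(mgcv)

# Report the installed version, since the behaviour discussed is version-dependent.
packageVersion("mgcv")

# Simulated data (hypothetical): 20 words with random intercepts and slopes.
set.seed(1)
n <- 400
dat <- data.frame(
  Word = factor(sample(paste0("w", 1:20), n, replace = TRUE)),
  x    = rnorm(n)
)
b0 <- rnorm(20, sd = 1)
b1 <- rnorm(20, sd = 0.5)
w  <- as.integer(dat$Word)
dat$y <- b0[w] + b1[w] * dat$x + rnorm(n)

m <- gam(y ~ x + s(Word, bs = "re") + s(Word, x, bs = "re"),
         data = dat, method = "REML")

# Number of coefficients per smooth term, named by the term's label.
n_coef <- sapply(m$smooth, function(sm) sm$last.para - sm$first.para + 1)
names(n_coef) <- sapply(m$smooth, function(sm) sm$label)
n_coef

# Number of levels each random-effect term would be expected to cover.
nlevels(dat$Word)
```

A term whose count falls below nlevels(dat$Word) would be a candidate for the constraint side effect described above.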
Martijn Wieling
2012-Apr-24 09:22 UTC
[R] Problem extracting enough coefs from gam (mgcv package)
Hi Simon,

Thanks for your quick reply. I'm now running the model again with mgcv 1.7-13. This might take some time (half a day or so) as the dataset is quite large (112,608 rows).

The call I've used was (I've simplified some variable names):

    model = bam(LingDist ~ s(Lon,Lat) + VowelRatio + IsDem + WordLength +
                  SpBirthYear + IsAragon + SpBirthYear_IsAragon + PopCnt +
                  s(Word,bs="re") + s(Speaker,bs="re") +
                  s(Word,SpBirthYear,bs="re") + s(Word,IsAragon,bs="re") +
                  s(Word,PopCnt,bs="re") + s(Speaker,VowelRatio,bs="re") +
                  s(Speaker,IsDem,bs="re") + s(Speaker,WordLength,bs="re") +
                  s(Word,Tourism,bs="re") + s(Word,PopAge,bs="re") +
                  s(Word,PopIncome,bs="re") + s(Word,SpEdu,bs="re") +
                  s(Word,SpBirthYear_IsAragon,bs="re"),
                data=dat)

I'll post the results w.r.t. the random slopes. Is my procedure for assigning labels correct when the number of slope estimates equals the number of words?

    rownames(slopes) <- unique(dat[,c("Word")])

With kind regards,
Martijn
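For the labelling step asked about here, one cautious option is to attach labels only when the coefficient count matches the number of levels, and to take the labels from levels() rather than unique(). The sketch below is not from the thread: label_re is a hypothetical helper, the data are simulated, and it assumes that the term label for s(Word, x, bs = "re") is "s(Word,x)" and that the coefficients follow levels() order.

```r
library(mgcv)

# Simulated data (hypothetical): five words with clearly different slopes.
set.seed(2)
n <- 300
dat <- data.frame(
  Word = factor(sample(c("pear", "apple", "fig", "kiwi", "plum"), n, replace = TRUE)),
  x    = rnorm(n)
)
true_slope <- c(apple = -1, fig = 0, kiwi = 0.5, pear = 1, plum = 2)
dat$y <- unname(true_slope[as.character(dat$Word)]) * dat$x + rnorm(n, sd = 0.3)

m <- gam(y ~ x + s(Word, x, bs = "re"), data = dat, method = "REML")

# Hypothetical helper: coefficients of one smooth term, named by factor level.
label_re <- function(model, term_label, f) {
  labels <- sapply(model$smooth, function(sm) sm$label)
  sm  <- model$smooth[[which(labels == term_label)]]
  est <- coef(model)[sm$first.para:sm$last.para]
  # Refuse to label when counts differ, since the mapping would be ambiguous.
  stopifnot(length(est) == nlevels(f))
  setNames(est, levels(f))
}

slopes <- label_re(m, "s(Word,x)", dat$Word)
slopes
```

The estimates are deviations from the fixed slope of x, so their ordering across words, not their absolute size, should mirror the simulated slopes. When the counts differ, as in the 346 to 356 case, the helper stops instead of guessing which levels are missing.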