Hi,

I just wanted to check that I'm not re-inventing the wheel here. I'm developing a new algorithm for backfitting (i.e. fitting additive models) and for computing partial residuals, where partial residuals are still computed even where there are missing values. Note that the additive models here contain both linear terms and smooth terms.

If I am re-inventing the wheel, could someone please let me know? I'm kind of on my own at the moment, and don't have quite as much academic support as I would like.

Here's an excerpt from my incomplete package (on CRAN), amba:

One way to think of residuals is as a vector of values. If we start with the response values and subtract the overall mean, we get values with relatively high variance. If we then subtract the fitted values for the first term, the variance decreases. If we repeat this for each term, the variance gradually decreases, until we are left with values with relatively low variance. In the ideal case, the residuals would have zero variance.

Under certain special conditions, it is possible to subtract a fitted value only where the corresponding explanatory value is valid (i.e. not missing). Where it is not valid, we just skip that subtraction operation (i.e. for that particular observation, the variance is not reduced as much).

For this to work, each explanatory variable's partial residuals for each fit (not just the final fit) must be zero-centered. For smoothers this isn't a big issue; however, conventional linear terms often do not satisfy this zero-centered condition. Note that the centering condition applies to partial residuals in relation to an explanatory variable (not in relation to a parameter), and each explanatory variable may have multiple parameters associated with it.

For our linear terms to satisfy it, we require extra parameters. Categorical terms require one parameter for each level, and polynomial terms require their own intercepts.

kind regards
Charlotte
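
P.S. To make the "skip the subtraction" idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not the amba code: every term is a straight line with its own intercept, missing explanatory values are marked with None, and the names fit_line, residuals and backfit are made up for this example.

```python
from statistics import mean

def fit_line(x, r):
    """Least-squares line with its own intercept, using only rows where x is observed."""
    obs = [i for i, xi in enumerate(x) if xi is not None]
    xs = [x[i] for i in obs]
    rs = [r[i] for i in obs]
    xbar, rbar = mean(xs), mean(rs)
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxr = sum((xi - xbar) * (ri - rbar) for xi, ri in zip(xs, rs))
    slope = sxr / sxx if sxx else 0.0
    intercept = rbar - slope * xbar
    # Fitted value is None where x is missing, so callers can skip it.
    return [None if xi is None else intercept + slope * xi for xi in x]

def residuals(y, fits, skip=None):
    """y minus the overall mean minus every available fitted value.

    Fitted values that are None (missing explanatory value) are skipped,
    as is the whole term with index `skip` (giving partial residuals).
    """
    ybar = mean(y)
    out = []
    for i, yi in enumerate(y):
        r = yi - ybar
        for j, f in enumerate(fits):
            if j != skip and f[i] is not None:
                r -= f[i]
        out.append(r)
    return out

def backfit(y, xs, n_iter=20):
    """Cycle over the terms, refitting each one to its partial residuals."""
    fits = [[None if xi is None else 0.0 for xi in x] for x in xs]
    for _ in range(n_iter):
        for j, x in enumerate(xs):
            partial = residuals(y, fits, skip=j)
            fits[j] = fit_line(x, partial)
    return fits, residuals(y, fits)

# Made-up data: the third observation has no value for the first variable.
y = [-4.0, 0.0, 2.0, 6.0]
x1 = [-1.0, 1.0, None, 1.0]
x2 = [-1.0, -1.0, 1.0, 1.0]
fits, res = backfit(y, [x1, x2])
print(res)
```

Every observation still gets a residual here, including the one with the missing value; that observation simply has one fewer fitted value subtracted from it.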
