Dear WizaRds, This is mostly a statistics question, but I'm figuring that R is the right solution (even before I start!) I have some bio data of heart rate versus time (rats taken from resting to maximal heart rate). I want to regress heart rate on time. The data have been normalized such that resting heart rate is zero at time=0, so that all curves intersect at the origin (and at the origin only). The regression function is monotonically increasing, but little is known about it. The data are also strictly ordinal: I have a factor such that lower-order data *must* be on a curve that is offset beneath a higher-order curve. Don't ask where the factors come from...but *given* these factors the assumption, or rather the *constraint*, is that lower orders are better (lower-order rats are fitter rats with better cardiovascular response). Most important is that these curves do not intersect because of these factors (a fitter rat can not have a worse heart-rate response than a less fit rat!). Here's a schematic to show you what I mean. It's very rough, but it gets the point across: seq(0, 1, len=100) f<- 1/seq(.3,1,len=6) windows(); plot(t, sqrt(3*t), type='n', xlab='time', ylab='heart rate'); grid() for (i in seq(along=f)) lines(t, sqrt(f[i]*t), col=ceiling(2*i)) Now I'm wondering where I should start and I'm think that this is really not much different from having a y_i ~ x_i | factor_i ... the i^th response curve just like a dummy variable male/female linear regression. But in some way the factors are related (there's a dose behind it, if ya see what I mean), so they are not independent...they're all part of a "system" (some studies have more "juice" overall, so the whole "system" is varying from one group of rats to the next). This means that in some "systems" the curves will be closer or some curves within the system will be closer together. Here is my question for you guys: what is the best way to model this kind of problem, especially given that each "i^th curve" could have as few as 3 data points? If I can only vaguely assume the type of the regression function (monotonic, rapidly increasing from the origin kind of like sqrt(t), as above) should it be a parametric or nonparametric regression? What about those errors? Gee! It's hard to assume anything (there're so few of 'em and they're probably heteroskedastic! Where do I begin? I'd gladly accept all references, pointers, and such like. Best, and thanks for R and R-Help, all you Core guys!!! Jack. --------------------------------- [[alternative HTML version deleted]]

Hi John, It sounds like what you need is a mixed effects model. This will allow you to model effects which apply to the whole population (fixed effects) and those which are specific to an individual (random effects).The theory and practice of mixed effects models is too large to discuss fully in an email. But I can point you in the right direction. First of all I suggest you read "Mixed-Effects models in S and S-PLUS" by Pinheiro and Bates. This is an excellent text on the subject with loads of worked examples. Also John Fox has produced a very nice introduction to mixed effects models, again with worked examples in R: see http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-mixed-models.pdf The library you require in R is called library(nlme). look up the Pixel example (?Pixel) it shows you how to model non-linear effects using a quadratic term. Download the nlme library from CRAN and paste the following in to R: library(nlme) fm1 <- lme(pixel ~ day + I(day^2), data = Pixel, random = list(Dog = ~ day, Side = ~ 1)) summary(fm1) plot(augPred(fm1)) I hope this helps. Regards Wayne -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]On Behalf Of John McHenry Sent: 13 September 2007 03:09 To: r-help at r-project.org Subject: [R] Multivariate, multilevel regression? Dear WizaRds, This is mostly a statistics question, but I'm figuring that R is the right solution (even before I start!) I have some bio data of heart rate versus time (rats taken from resting to maximal heart rate). I want to regress heart rate on time. The data have been normalized such that resting heart rate is zero at time=0, so that all curves intersect at the origin (and at the origin only). The regression function is monotonically increasing, but little is known about it. The data are also strictly ordinal: I have a factor such that lower-order data *must* be on a curve that is offset beneath a higher-order curve. Don't ask where the factors come from...but *given* these factors the assumption, or rather the *constraint*, is that lower orders are better (lower-order rats are fitter rats with better cardiovascular response). Most important is that these curves do not intersect because of these factors (a fitter rat can not have a worse heart-rate response than a less fit rat!). Here's a schematic to show you what I mean. It's very rough, but it gets the point across: seq(0, 1, len=100) f<- 1/seq(.3,1,len=6) windows(); plot(t, sqrt(3*t), type='n', xlab='time', ylab='heart rate'); grid() for (i in seq(along=f)) lines(t, sqrt(f[i]*t), col=ceiling(2*i)) Now I'm wondering where I should start and I'm think that this is really not much different from having a y_i ~ x_i | factor_i ... the i^th response curve just like a dummy variable male/female linear regression. But in some way the factors are related (there's a dose behind it, if ya see what I mean), so they are not independent...they're all part of a "system" (some studies have more "juice" overall, so the whole "system" is varying from one group of rats to the next). This means that in some "systems" the curves will be closer or some curves within the system will be closer together. Here is my question for you guys: what is the best way to model this kind of problem, especially given that each "i^th curve" could have as few as 3 data points? If I can only vaguely assume the type of the regression function (monotonic, rapidly increasing from the origin kind of like sqrt(t), as above) should it be a parametric or nonparametric regression? What about those errors? Gee! It's hard to assume anything (there're so few of 'em and they're probably heteroskedastic! Where do I begin? I'd gladly accept all references, pointers, and such like. Best, and thanks for R and R-Help, all you Core guys!!! Jack. --------------------------------- [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

On 9/13/07, John McHenry <john_d_mchenry at yahoo.com> wrote:> Dear WizaRds, > > This is mostly a statistics question, but I'm figuring that R is the right solution (even before I start!) > >Also have a look at these tutorials (in the journal Statistics in Medicine, needs subscription) http://dx.doi.org/10.1002/(SICI)1097-0258(19971030)16:20<2349::AID-SIM667>3.0.CO;2-E and http://www3.interscience.wiley.com/cgi-bin/abstract/62004087/ABSTRACT -- Armin Goralczyk, M.D. Dept. of General Surgery University of G?ttingen G?ttingen, Germany http://www.chirurgie-goettingen.de