Juliet Hannah
2009-Mar-07 17:49 UTC
[R] using a noisy variable in regression (not an R question)
Hi,

This is not an R question, but I've seen opinions given on non-R topics, so I wanted to give it a try. :)

How would one treat a variable that was measured once, but is known to fluctuate a lot? For example, I want to include a hormone in my regression as an explanatory variable. However, this hormone varies in its levels throughout a day. Nevertheless, its levels differ substantially between individuals so that there is information there to use.

One simple thing to try would be to form categories, but I assume there are better ways to handle this. Has anyone worked with such data, or could anyone suggest some keywords that may be helpful in searching for this topic?

Thanks for your input.

Regards,

Juliet
John Sorkin
2009-Mar-07 18:14 UTC
[R] using a noisy variable in regression (not an R question)
Juliet,

The answer is simple: add the measured value as an independent variable to the regression. There is no need to convert continuous values to categorical values. If there is a circadian rhythm to the hormone secretion (e.g. cortisol), I would try to get values at the same time of day for all study participants. Barring this, perhaps you could adjust for both the hormone concentration and the time of day the sample was obtained.

John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
Jonathan Baron
2009-Mar-07 18:21 UTC
[R] using a noisy variable in regression (not an R question)
If you form categories, you add even more error, specifically, the variation in the distance between each number and the category boundary.

What's wrong with just including it in the regression?

Yes, the measure X1 will account for less variance than the underlying variable of real interest (T1, each individual's mean, perhaps), but X1 could still be useful in two ways. One, it might be a significant predictor of the dependent variable Y despite the error. Two, it might increase the sensitivity of the model to other predictors (X2, X3...) by accounting for what would otherwise be error.

What you cannot conclude in this case (when you measure a predictor with error) is that the effect of (say) X2 is not accounted for by its correlation with T1. Some people try to conclude this when X2 remains a significant predictor of Y when X1 is included in the model. The trouble is that X1 is an error-prone measure of T1, so the full effect of T1 is not removed by inclusion of X1.

Jon

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)
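Jon's last point can be checked with a small simulation. The sketch below is only an illustration under assumed settings that are not from the thread: T1 and X2 are standard normal with correlation 0.6, Y depends on T1 alone, and X1 = T1 + E1 with SD(E1) = 1. The function names are hypothetical, and only the Python standard library is used.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def slopes_two_predictors(x1, x2, y):
    """OLS slopes (b1, b2) from regressing y on x1 and x2 with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate_confounding(n=20000, rho=0.6, sd_e=1.0, seed=42):
    """Return the X2 slope when adjusting for T1 and when adjusting for X1."""
    rng = random.Random(seed)
    t1 = [rng.gauss(0, 1) for _ in range(n)]
    # X2 is correlated with T1 but has no effect of its own on Y (assumed).
    x2 = [rho * t + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for t in t1]
    y = [t + rng.gauss(0, 1) for t in t1]
    # X1 is the error-prone single measurement of T1.
    x1 = [t + rng.gauss(0, sd_e) for t in t1]
    _, b2_adjusting_for_t1 = slopes_two_predictors(t1, x2, y)
    _, b2_adjusting_for_x1 = slopes_two_predictors(x1, x2, y)
    return b2_adjusting_for_t1, b2_adjusting_for_x1


if __name__ == "__main__":
    print(simulate_confounding())
```

Under these assumptions the X2 slope is close to zero when the true T1 is in the model, but clearly positive (about 0.37 in large samples) when only the noisy X1 is available, even though X2 has no effect of its own.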
Stephan Kolassa
2009-Mar-07 18:25 UTC
[R] using a noisy variable in regression (not an R question)
Hi Juliet,

Juliet Hannah wrote:
> One simple thing to try would be to form categories

Simple but problematic. Frank Harrell put together a wonderful page detailing all the issues with categorizing continuous data: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

So: keep your data continuous.

Apart from that, I would second John's recommendation to try to get samples at the same point in time (and, if it is cortisol, stay away from smokers etc.).

Best wishes

Stephan
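One of the costs of categorizing can be seen in a quick simulation. This is a minimal sketch under assumed settings (a standard normal predictor, a linear effect, and a median split); it illustrates only the loss of explained variance, not the other issues listed on the linked page. The function names are hypothetical.

```python
import random
import statistics


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def simulate_median_split(n=20000, seed=7):
    """Correlation of y with a continuous x and with its median split."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [v + rng.gauss(0, 1) for v in x]
    cut = statistics.median(x)
    # Two categories: above the median (1) or not (0).
    x_cat = [1.0 if v > cut else 0.0 for v in x]
    return correlation(x, y), correlation(x_cat, y)


if __name__ == "__main__":
    r_cont, r_cat = simulate_median_split()
    print("R^2 continuous:", r_cont ** 2, "R^2 median split:", r_cat ** 2)
```

In this setup the continuous predictor explains about half of the variance in y, while the median-split version explains roughly a third.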
Paul Johnson
2009-Mar-07 19:58 UTC
[R] using a noisy variable in regression (not an R question)
From teaching econometrics, I remember that if the "truth" is y = b0 + b1*x1 + noise and you do not have a correct measure of x1, but rather something else like ex1 = x1 + noise, then the regression estimate of b1 is biased, generally attenuated.

As far as I understand it, the technical solutions are not too encouraging. You can try to get better data or possibly build an instrumental variables model, where you would have other predictors of the "true" value of x1 in a first-stage model. I don't recall that I was able to persuade myself that this approach really solves anything, but many people recommend it. I suppose a key question is whether you can persuade your audience that ex1 = x1 + noise and that the noise is well behaved.

As I was considering your problem, I was wondering whether there might be a "mixed model" approach. You hypothesize the truth is y = b0 + b1*x1 + noise, but you don't have x1. So suppose you reconsider the "truth" as having a random parameter, as in y = b0 + c1*ex1 + noise. ex1 is a fixed estimate of the hormone level for each observation. c1 is a random, varying coefficient because the effect of the hormone fluctuates in an unmeasurable way. Then you could try to estimate the distribution of c1.

You have an interesting problem, I think.

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
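The attenuation described in this message is easy to reproduce. The sketch below assumes classical measurement error (noise independent of x1 and of the regression noise) and uses made-up values b0 = 1, b1 = 2, SD(x1) = 1 and SD(measurement noise) = 1; the function names are hypothetical. Under these assumptions the slope on ex1 should shrink toward b1 * var(x1) / (var(x1) + var(noise)).

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, b0=1.0, b1=2.0, sd_x=1.0, sd_err=1.0, seed=1):
    """Return (slope on true x1, slope on noisy ex1, predicted attenuated slope)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, sd_x) for _ in range(n)]
    y = [b0 + b1 * x + rng.gauss(0, 1) for x in x1]
    # ex1 is what was actually measured: the true value plus noise.
    ex1 = [x + rng.gauss(0, sd_err) for x in x1]
    reliability = sd_x ** 2 / (sd_x ** 2 + sd_err ** 2)
    return ols_slope(x1, y), ols_slope(ex1, y), b1 * reliability


if __name__ == "__main__":
    print(simulate_attenuation())
```

With equal variances for x1 and the measurement noise, the reliability ratio is 0.5, so the estimated slope on ex1 lands near 1 instead of the true value of 2.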
Charles C. Berry
2009-Mar-07 20:48 UTC
[R] using a noisy variable in regression (not an R question)
On Sat, 7 Mar 2009, Juliet Hannah wrote:

> Has anyone worked with such data, or could anyone suggest some keywords that may be helpful in searching for this topic.

Try:

  correction for attenuation
  measurement error models
  errors-in-variables
  Wayne Fuller
  LA Stefanski and RJ Carroll
  William Cochran

HTH,

Chuck

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu
UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
John Maindonald
2009-Mar-08 23:41 UTC
[R] using a noisy variable in regression (not an R question)
One can "just include it in the regression", but the potential problems for interpretation are surely greater than those indicated. Inclusion of X1 = T1+E1 may cause X2 to appear significant when in fact it is having no effect at all. Or the true effect can be reversed in sign. This happens because X1 and X2 are correlated. Maybe this is implicit in what Jon is saying.

See Carroll, Ruppert and Stefanski: Measurement Error in Nonlinear Models (2004, pp.52-55). The error E1 may need to be fairly large relative to SD(T1) for this to be an issue. My notes at http://www.maths.anu.edu.au/%7Ejohnm/r-book/2edn/xtras/xtras.pdf have brief comments, and code that can be used to illustrate the point.

I support Stephan Kolassa's suggestions re using simulation for sensitivity analysis, though I think this can also be done analytically.

John Maindonald
email: john.maindonald@anu.edu.au
phone : +61 2 (6125)3473
fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
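For the analytic route, the large-sample behaviour can be written down in the simplest case. The block below is a sketch, not taken from the cited book: it assumes a linear model in T1 and X2, and classical error E1 that is independent of T1, X2 and the regression noise. The symbols \beta, \lambda, \rho and \sigma are notation introduced here, with \rho = corr(T1, X2), \sigma_T = SD(T1), \sigma_E = SD(E1) and \sigma_2 = SD(X2); b_1 and b_2 are the least-squares coefficients from regressing Y on X1 and X2.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 T_1 + \beta_2 X_2 + \varepsilon, \qquad X_1 = T_1 + E_1, \\
\lambda &= \frac{\sigma_T^2\,(1-\rho^2)}{\sigma_T^2\,(1-\rho^2) + \sigma_E^2}, \\
b_1 &\to \lambda\,\beta_1, \\
b_2 &\to \beta_2 + (1-\lambda)\,\beta_1\,\rho\,\frac{\sigma_T}{\sigma_2}.
\end{align*}
```

Under these assumptions, b_2 has a nonzero limit even when \beta_2 = 0, provided \rho \neq 0 and \sigma_E > 0, and it can take the opposite sign to \beta_2 when the second term dominates. Both effects vanish as \sigma_E becomes small relative to \sigma_T, since \lambda then approaches 1.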