thr3ads.net - R help - [R] using a noisy variable in regression (not an R question) [Mar 2009]

Juliet Hannah

2009-Mar-07 17:49 UTC

[R] using a noisy variable in regression (not an R question)

Hi, this is not an R question, but I've seen opinions given on non-R
topics, so I wanted to give it a try. :)

How would one treat a variable that was measured once but is known to
fluctuate a lot? For example, I want to include a hormone in my
regression as an explanatory variable. However, the level of this
hormone varies throughout the day. Nevertheless, its levels differ
substantially between individuals, so there is information there to use.

One simple thing to try would be to form categories, but I assume there
are better ways to handle this. Has anyone worked with such data, or
could anyone suggest some keywords that may be helpful in searching for
this topic? Thanks for your input.

Regards,

Juliet

John Sorkin

2009-Mar-07 18:14 UTC

Juliet,
The answer is simple: add the measured value as an independent variable to the
regression. There is no need to convert continuous values to categorical values.
If there is a circadian rhythm to the hormone secretion (e.g., cortisol), I would
try to get values at the same time of day for all study participants. Barring
this, perhaps you could adjust for both the hormone concentration and the time
of day the sample was obtained.
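The following is a minimal sketch of why adjusting for time of day can help. It makes several assumptions. The hormone has a fixed circadian shift that depends on whether the sample was drawn in the morning, afternoon or evening; these three slots are hypothetical. The outcome depends only on each person's mean level. All numbers are invented for illustration. Centring the measurement within each time slot plays the role of adding time-of-day indicators to the regression. With continuous clock times, one would more likely add smooth terms such as the sine and cosine of the hour. The slot version is used here only because it needs nothing beyond the Python standard library.

```python
import random
import statistics

# All numbers below are hypothetical, chosen only to illustrate the idea.
random.seed(1)
N = 10_000
OFFSETS = {"morning": 4.0, "afternoon": 0.0, "evening": -4.0}  # assumed circadian shift

slots, x, y = [], [], []
for _ in range(N):
    t = random.gauss(10.0, 2.0)                            # person's mean level T (variance 4)
    slot = random.choice(list(OFFSETS))                    # when the single sample was drawn
    x.append(t + OFFSETS[slot] + random.gauss(0.0, 1.0))   # measured value X
    y.append(1.0 * t + random.gauss(0.0, 1.0))             # outcome depends on T only (true slope 1)
    slots.append(slot)


def slope(pred, resp):
    """Ordinary least-squares slope of resp on pred (with intercept)."""
    mp, mr = statistics.mean(pred), statistics.mean(resp)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    sxx = sum((p - mp) ** 2 for p in pred)
    return sxy / sxx


# Unadjusted: the circadian shift acts as extra measurement error in X.
naive = slope(x, y)

# Adjusted: centre X within each time-of-day slot. By the Frisch-Waugh-Lovell
# theorem this slope equals the coefficient on X in a regression of Y on X
# plus time-slot indicator variables.
slot_mean = {s: statistics.mean(xi for xi, si in zip(x, slots) if si == s) for s in OFFSETS}
x_centred = [xi - slot_mean[si] for xi, si in zip(x, slots)]
adjusted = slope(x_centred, y)

print(f"unadjusted slope: {naive:.3f}  (large-sample target 4 / 15.67 = 0.255)")
print(f"adjusted slope:   {adjusted:.3f}  (large-sample target 4 / 5 = 0.8)")
```

Under these settings the circadian shift contributes variance 32/3 to the measurement, which swamps the between-person variance of 4. Removing the shift therefore recovers most of the signal.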
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Jonathan Baron

2009-Mar-07 18:21 UTC

If you form categories, you add even more error, specifically, the
variation in the distance between each number and the category
boundary.
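A minimal simulation of the information lost by categorizing follows. It assumes a normally distributed predictor with a linear effect on the outcome; all values are hypothetical. It uses a median split, the most extreme form of categorization. For a normal predictor, a median split is known to retain only about sqrt(2/pi), roughly 0.80, of the original correlation. Finer categories lose less, but they still discard the within-category variation described above.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation, standard library only."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


random.seed(2)
N = 20_000
x = [random.gauss(0.0, 1.0) for _ in range(N)]        # continuous predictor (hypothetical)
y = [0.5 * xi + random.gauss(0.0, 1.0) for xi in x]    # outcome with a linear effect of x

cut = statistics.median(x)
x_high = [1.0 if xi > cut else 0.0 for xi in x]        # median split: "high" vs "low"

r_cont = pearson(x, y)
r_split = pearson(x_high, y)
print(f"r with continuous x: {r_cont:.3f}")
print(f"r with median split: {r_split:.3f}  (ratio {r_split / r_cont:.3f}; theory sqrt(2/pi) = 0.798)")
```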

What's wrong with just including it in the regression?

Yes, the measure X1 will account for less variance than the underlying
variable of real interest (T1, each individual's mean, perhaps), but
X1 could still be useful in two ways.  One, it might be a significant
predictor of the dependent variable Y despite the error.  Two, it
might increase the sensitivity of the model to other predictors (X2,
X3...) by accounting for what would otherwise be error.

What you cannot conclude in this case (when you measure a predictor
with error) is that the effect of (say) X2 is not accounted for by its
correlation with T1.  Some people try to conclude this when X2 remains
a significant predictor of Y when X1 is included in the model.  The
trouble is that X1 is an error-prone measure of T1, so the full effect
of T1 is not removed by inclusion of X1.
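The point about residual confounding can be checked with a minimal simulation, under hypothetical settings. The outcome depends only on T1. X2 is correlated with T1 (rho = 0.7) but has no effect of its own. X1 = T1 + E1 has reliability 0.5. Adjusting for the true T1 leaves X2 with a coefficient near zero. Adjusting for the error-prone X1 instead leaves X2 with a clearly nonzero coefficient; under these settings its large-sample value is 0.7 / 1.51, about 0.46.

```python
import random
import statistics


def ols2(x1, x2, y):
    """OLS coefficients (b1, b2) of y on x1 and x2, intercept included,
    solved from the centred 2x2 normal equations by Cramer's rule."""
    m1, m2, my = statistics.mean(x1), statistics.mean(x2), statistics.mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(3)
N = 20_000
RHO = 0.7                                                   # hypothetical corr(T1, X2)
t1 = [random.gauss(0.0, 1.0) for _ in range(N)]             # true level T1
x2 = [RHO * t + (1 - RHO ** 2) ** 0.5 * random.gauss(0.0, 1.0) for t in t1]
x1 = [t + random.gauss(0.0, 1.0) for t in t1]               # X1 = T1 + E1 (reliability 0.5)
y = [1.0 * t + random.gauss(0.0, 1.0) for t in t1]          # Y depends on T1 only; X2 has no effect

_, b2_true_t1 = ols2(t1, x2, y)    # adjusting for the true T1
_, b2_noisy_x1 = ols2(x1, x2, y)   # adjusting for the error-prone X1
print(f"X2 coefficient, adjusting for T1: {b2_true_t1:.3f}")
print(f"X2 coefficient, adjusting for X1: {b2_noisy_x1:.3f}")
```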

Jon

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)

Stephan Kolassa

2009-Mar-07 18:25 UTC

Hi Juliet,

Juliet Hannah wrote:
> One simple thing to try would be to form categories
Simple but problematic. Frank Harrell put together a wonderful page 
detailing all the issues with categorizing continuous data:
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

So: keep your data continuous.

Apart from that, I would second John's recommendation to try to get 
samples at the same point in time (and, if it is cortisol, stay away 
from smokers etc.).

Best wishes
Stephan

Paul Johnson

2009-Mar-07 19:58 UTC

From teaching econometrics, I remember that if the "truth" is
y = b0 + b1*x1 + noise, and you do not have a correct measure of x1 but
rather something else like ex1 = x1 + noise, then the regression estimate
of b1 is biased, generally attenuated (pulled toward zero). As far as I
understand it, the technical solutions are not too encouraging. You can
try to get better data, or possibly build an instrumental variables
model, where you could have other predictors of the "true" value of x1 in
a first-stage model. I don't recall that I was able to persuade myself
that this approach really solves anything, but many people recommend it.
I suppose a key question is whether you can persuade your audience that
ex1 = x1 + noise and whether that noise is well behaved.
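The following is a minimal sketch of both the attenuation and the instrumental-variables idea. The coefficients are hypothetical. The instrument z is assumed to be a second, independent measurement of the same quantity, for example a sample taken on another day. Any variable correlated with the true x1 and independent of both noise terms would serve the same role. With a single instrument, two-stage least squares reduces to the ratio Cov(z, y) / Cov(z, ex1).

```python
import random
import statistics


def cov(a, b):
    """Sample covariance (n - 1 denominator, matching statistics.variance)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


random.seed(4)
N = 20_000
B0, B1 = 2.0, 1.5                                    # hypothetical true coefficients
x1 = [random.gauss(0.0, 1.0) for _ in range(N)]      # true x1 (unobserved), variance 1
ex1 = [v + random.gauss(0.0, 1.0) for v in x1]       # ex1 = x1 + noise, noise variance 1
z = [v + random.gauss(0.0, 1.0) for v in x1]         # instrument: tied to x1, independent of both noises
y = [B0 + B1 * v + random.gauss(0.0, 1.0) for v in x1]

# OLS of y on ex1 is attenuated by the reliability Var(x1) / Var(ex1) = 1 / 2.
ols_slope = cov(ex1, y) / statistics.variance(ex1)
# With one instrument, two-stage least squares reduces to Cov(z, y) / Cov(z, ex1).
iv_slope = cov(z, y) / cov(z, ex1)
print(f"OLS slope: {ols_slope:.3f}  (large-sample target {B1 * 0.5})")
print(f"IV slope:  {iv_slope:.3f}  (true b1 = {B1})")
```

The IV estimate is only as good as the assumption that z is unrelated to the noise in ex1 and in y. This matches the caveat about persuading an audience that the noise is well behaved.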

As I was considering your problem, I was wondering if there might not
be a "mixed model" approach to this problem. You hypothesize the
truth is y = b0 + b1*x1 + noise, but you don't have x1. So suppose you
reconsider the "truth" as a random parameter, as in y = b0 + c1*ex1 + noise.
ex1 is a fixed estimate of the hormone level for each observation. c1
is a random, varying coefficient because the effect of the hormone
fluctuates in an unmeasurable way. Then you could try to estimate the
distribution of c1.
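A minimal sketch of one way to estimate the distribution of c1 under this random-coefficient model follows. It assumes one observation per person, c1 ~ N(mu, tau^2) drawn independently of ex1, and homoskedastic base noise; all values are hypothetical. It uses a moment-based approach in the spirit of Hildreth and Houck's random-coefficient estimator, not a full mixed-model fit. Ordinary least squares estimates the mean of c1. Because Var(y | ex1) = sigma^2 + tau^2 * ex1^2 under the model, regressing squared residuals on ex1^2 estimates tau^2. With a single measurement per person, that variance pattern is the only information about tau^2.

```python
import random
import statistics


def ols1(x, y):
    """Intercept and slope of a simple OLS fit."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


random.seed(5)
N = 20_000
MU, TAU, SIGMA = 1.0, 0.5, 1.0                 # hypothetical mean and SD of c1, residual SD
ex1 = [random.uniform(0.0, 3.0) for _ in range(N)]
# Each observation gets its own coefficient c1 ~ N(MU, TAU^2), drawn independently of ex1.
y = [2.0 + random.gauss(MU, TAU) * x + random.gauss(0.0, SIGMA) for x in ex1]

# Step 1: OLS of y on ex1 estimates the mean of c1.
b0_hat, mu_hat = ols1(ex1, y)
# Step 2: Var(y | ex1) = SIGMA^2 + TAU^2 * ex1^2, so regressing squared
# residuals on ex1^2 estimates SIGMA^2 (intercept) and TAU^2 (slope).
resid_sq = [(yi - b0_hat - mu_hat * xi) ** 2 for xi, yi in zip(ex1, y)]
sigma2_hat, tau2_hat = ols1([x * x for x in ex1], resid_sq)
print(f"mean of c1: {mu_hat:.3f}, variance of c1: {tau2_hat:.3f}, residual variance: {sigma2_hat:.3f}")
```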

You have an interesting problem, I think.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

Charles C. Berry

2009-Mar-07 20:48 UTC

Try:

correction for attenuation

measurement error models

errors-in-variables

Wayne Fuller

LA Stefanski and RJ Carroll

William Cochran

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

John Maindonald

2009-Mar-08 23:41 UTC

One can "just include it in the regression", but the potential problems
for interpretation are surely greater than those indicated. Inclusion of
X1 = T1 + E1 may cause X2 to appear significant when in fact it is having
no effect at all. Or the true effect can be reversed in sign. This happens
because X1 and X2 are correlated. Maybe this is implicit in what Jon
is saying.
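A worked version of this point follows. It assumes the classical measurement-error model: E1 has mean zero and is independent of T1, X2 and the regression error. It also assumes T1 and X2 are scaled to unit variance with correlation rho, and that the true model is Y = beta0 + beta1*T1 + beta2*X2 + error. Write sigma_E^2 = Var(E1)/Var(T1). The least-squares coefficients from regressing Y on (X1, X2) then converge to (Sigma_TT + diag(sigma_E^2, 0))^{-1} Sigma_TT beta, which for two predictors works out to:

```latex
\begin{aligned}
\hat\beta_1 &\xrightarrow{\;p\;} \beta_1\,\frac{1-\rho^2}{1-\rho^2+\sigma_E^2},\\[4pt]
\hat\beta_2 &\xrightarrow{\;p\;} \beta_2 + \frac{\rho\,\sigma_E^2\,\beta_1}{1-\rho^2+\sigma_E^2}.
\end{aligned}
```

The extra term in the X2 coefficient is nonzero whenever rho, sigma_E^2 and beta1 are all nonzero. X2 therefore picks up part of T1's effect even when beta2 = 0. If beta2 has the opposite sign to rho*beta1, a large enough sigma_E^2 flips the sign of the estimate. Both distortions vanish as sigma_E^2 goes to 0.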

See Carroll, Ruppert and Stefanski: Measurement Error in Nonlinear Models
(2004, pp. 52-55). The error in E1 may need to be fairly large relative to
SD(T1) for this to be an issue. My notes at
http://www.maths.anu.edu.au/%7Ejohnm/r-book/2edn/xtras/xtras.pdf
have brief comments, and code that can be used to illustrate the point.

I support Stephan Kolassa's suggestions re using simulation for
sensitivity analysis, though I think this can also be done analytically.
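A minimal sketch of the analytic route follows. It assumes the classical error model X1 = T1 + E1, with E1 independent of T1, X2 and the outcome noise. T1 and X2 are standardized with correlation rho. Under these assumptions, the large-sample OLS coefficients on (X1, X2) are (Sigma_TT + diag(Var(E1), 0))^{-1} Sigma_TT beta. The function below evaluates this over a grid of assumed error variances. The beta and rho values are hypothetical.

```python
def plim_coefs(beta1, beta2, rho, err_var):
    """Large-sample limit of the OLS coefficients on (X1, X2) when X1 = T1 + E1.

    Assumes T1 and X2 have unit variance and correlation rho, and E1 (variance
    err_var, in units of Var(T1)) is independent of T1, X2 and the outcome noise.
    """
    # Sigma_XY = Sigma_TT @ beta: classical error leaves these covariances unchanged.
    sxy1 = beta1 + rho * beta2
    sxy2 = rho * beta1 + beta2
    # Sigma_XX = Sigma_TT + diag(err_var, 0); invert the 2x2 matrix by hand.
    a, b, d = 1.0 + err_var, rho, 1.0
    det = a * d - b * b
    return (d * sxy1 - b * sxy2) / det, (a * sxy2 - b * sxy1) / det


# Sensitivity analysis over assumed error variances (hypothetical beta and rho).
for err_var in (0.0, 0.25, 0.5, 1.0, 2.0):
    b1, b2 = plim_coefs(beta1=1.0, beta2=-0.3, rho=0.7, err_var=err_var)
    print(f"Var(E1)/Var(T1) = {err_var:4.2f}:  b1 -> {b1:6.3f}, b2 -> {b2:6.3f}")
```

With these settings, the X2 coefficient has its true value of -0.3 when there is no error. It becomes positive once Var(E1) exceeds about 0.38 Var(T1), which illustrates the sign reversal.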

John Maindonald             email: john.maindonald@anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

