Stathis Kamperis
2013-Aug-08 10:43 UTC
[R] Varying statistical significance in estimates of linear model
Hi everyone,

I have a response variable 'y' and several predictor variables 'x_i'. I start with a linear model:

    m1 <- lm(y ~ x1); summary(m1)

and I get a statistically significant estimate for 'x1'. Then I extend the model:

    m2 <- lm(y ~ x1 + x2); summary(m2)

At that point the estimate for x1 may become non-significant while the estimate for x2 becomes significant.

As I add more predictor variables (or interaction terms), the set of estimates that come out statistically significant keeps changing: sometimes x1, x2 and x6 are significant, at other times x2, x4 and x5 are.

It seems to me that I could tweak the model (by adding/removing predictor variables or "suitable" interaction terms) so as to "prove" whatever I'd like to prove.

What is the proper methodology here? What do you do in such cases? I can provide the data if anyone would like to have a look at them.

Best regards,
Stathis Kamperis
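For illustration, here is a minimal, self-contained sketch (simulated data, not the data set mentioned above) of how this behaviour can arise when two predictors are correlated:

    ## Simulated example: y depends only on x2, but x1 is strongly
    ## correlated with x2, so x1 looks significant on its own.
    set.seed(42)
    n  <- 100
    x2 <- rnorm(n)
    x1 <- x2 + rnorm(n, sd = 0.3)   # x1 carries almost the same information as x2
    y  <- 2 * x2 + rnorm(n)         # the true model involves x2 only

    m1 <- lm(y ~ x1);      summary(m1)  # x1 appears highly significant
    m2 <- lm(y ~ x1 + x2); summary(m2)  # x1 typically loses significance once x2 enters
    cor(x1, x2)                          # large correlation between the predictors

Running both summaries shows the pattern described above: a predictor that is significant in the single-variable model can lose its significance once a correlated predictor is added.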
ONKELINX, Thierry
2013-Aug-08 13:25 UTC
[R] Varying statistical significance in estimates of linear model
Dear Stathis,

I recommend that you try to get some advice from a local statistician or read an introductory book on statistics. This kind of question is beyond the scope of a mailing list.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25, 1070 Anderlecht, Belgium
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
Bert Gunter
2013-Aug-08 13:29 UTC
[R] Varying statistical significance in estimates of linear model
Stathis:

1. This has nothing to do with R. Post on a statistics list, like stats.stackexchange.com.

2. Read a basic regression/linear models text. You need to educate yourself.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
Stathis Kamperis
2013-Aug-09 20:20 UTC
[R] Varying statistical significance in estimates of linear model
For archiving reasons:

1. "Practical Regression and Anova using R" by Faraway
2. Possible reason: multicollinearity in the predictor variables.

Thanks everybody!
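As a concrete way to check for that, one possible sketch (using the add-on package 'car', which is one common option, and the simulated x1/x2/y example from the first post):

    ## Diagnose multicollinearity among the predictors.
    ## Requires: install.packages("car")
    library(car)

    cor(cbind(x1, x2))      # pairwise correlations between predictors
    vif(lm(y ~ x1 + x2))    # variance inflation factors; values much larger
                            # than about 5-10 usually signal collinearity

Faraway's text (item 1 above) discusses collinearity diagnostics of this kind in more detail.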