Ana Marija
2020-Sep-14 13:29 UTC
[R] How to represent the effect of one covariate on regression results?
Hello,

I was running an association analysis using --glm genotypic from https://www.cog-genomics.org/plink/2.0/assoc with these covariates: sex, age, PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10, TD, array, HBA1C. The result looks like this:

#CHROM  POS        ID          REF  ALT  A1  TEST           OBS_CT  BETA         SE          Z_OR_F_STAT  P            ERRCODE
10      135434303  rs11101905  G    A    A   ADD            11863   -0.110733    0.0986981   -1.12193     0.261891     .
10      135434303  rs11101905  G    A    A   DOMDEV         11863   0.079797     0.111004    0.718868     0.472222     .
10      135434303  rs11101905  G    A    A   sex=Female     11863   -0.120404    0.0536069   -2.24605     0.0247006    .
10      135434303  rs11101905  G    A    A   age            11863   0.00524501   0.00391528  1.33963      0.180367     .
10      135434303  rs11101905  G    A    A   PC1            11863   -0.0191779   0.0166868   -1.14928     0.25044      .
10      135434303  rs11101905  G    A    A   PC2            11863   -0.0269939   0.0173086   -1.55957     0.118863     .
10      135434303  rs11101905  G    A    A   PC3            11863   0.0115207    0.0168076   0.685448     0.493061     .
10      135434303  rs11101905  G    A    A   PC4            11863   9.57832e-05  0.0124607   0.0076868    0.993867     .
10      135434303  rs11101905  G    A    A   PC5            11863   -0.00191047  0.00543937  -0.35123     0.725416     .
10      135434303  rs11101905  G    A    A   PC6            11863   -0.0103309   0.0159879   -0.646172    0.518168     .
10      135434303  rs11101905  G    A    A   PC7            11863   0.00790997   0.0144025   0.549207     0.582863     .
10      135434303  rs11101905  G    A    A   PC8            11863   -0.00205639  0.0142709   -0.144096    0.885424     .
10      135434303  rs11101905  G    A    A   PC9            11863   -0.00873771  0.0057239   -1.52653     0.126878     .
10      135434303  rs11101905  G    A    A   PC10           11863   0.0116197    0.0123826   0.938388     0.348045     .
10      135434303  rs11101905  G    A    A   TD             11863   -0.670026    0.0962216   -6.96337     3.32228e-12  .
10      135434303  rs11101905  G    A    A   array=Biobank  11863   0.160666     0.073631    2.18205      0.0291062    .
10      135434303  rs11101905  G    A    A   HBA1C          11863   0.0265933    0.00168758  15.7583      6.0236e-56   .
10      135434303  rs11101905  G    A    A   GENO_2DF       11863   NA           NA          0.726514     0.483613     .

These results are shown for just one ID (rs11101905); there are about 2 million IDs in the resulting file.
My question is: how do I present or plot the effect of the covariate "TD" across all IDs in the resulting file, so that I can show how much effect "TD" has on the analysis? In the example above it has "P" equal to 3.32228e-12. Should I run another regression without the covariate "TD" and then make a scatter plot of the P values with and without "TD", or is there a better way to do this from the data I already have?

Thanks,
Ana
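The with/without comparison proposed in the question could be sketched in R roughly as follows. This is only an illustration under stated assumptions: the two data frames are simulated stand-ins for the two PLINK result tables (the thread does not give file names), `compare_p` is a hypothetical helper, and only the column names ID, TEST and P are taken from the output shown above. The plot is written to a temporary PDF so the script also runs non-interactively.

```r
# Merge the rows for one TEST from two --glm style result tables and put the
# P values on the -log10 scale. compare_p is a hypothetical helper name.
compare_p <- function(with_td, without_td, test = "ADD") {
  a <- with_td[with_td$TEST == test, c("ID", "P")]
  b <- without_td[without_td$TEST == test, c("ID", "P")]
  m <- merge(a, b, by = "ID", suffixes = c("_TD", "_noTD"))
  m$logp_TD   <- -log10(m$P_TD)
  m$logp_noTD <- -log10(m$P_noTD)
  m
}

set.seed(1)
n_var <- 500
ids <- paste0("rs", seq_len(n_var))

# Simulated placeholders for the two result tables (values are made up).
res_td   <- data.frame(ID = ids, TEST = "ADD", P = runif(n_var))
res_notd <- data.frame(ID = ids, TEST = "ADD",
                       P = pmin(1, res_td$P * exp(rnorm(n_var, sd = 0.3))))

cmp <- compare_p(res_td, res_notd)

out <- file.path(tempdir(), "td_p_comparison.pdf")
pdf(out)
plot(cmp$logp_noTD, cmp$logp_TD,
     xlab = "-log10(P), model without TD",
     ylab = "-log10(P), model with TD")
abline(0, 1, lty = 2)  # points on this line are unchanged by adding TD
dev.off()
```

With real output, the two data frames would instead be read from the result files of the two runs, and points far from the dashed line would be the variants whose evidence changes most when TD is added.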
Abby Spurdle
2020-Sep-15 03:12 UTC
[R] How to represent the effect of one covariate on regression results?
I'm wondering if you want one of these:

(1) Plots of "Main Effects".
(2) "Partial Residual Plots".

Search for them, and you should be able to tell if they're what you want.

But a word of warning: many people (including many senior statisticians) misinterpret this kind of information, because it is always the effect of xj on Y while holding the other variables *constant*. That's not as simple as it sounds, and people tend to disregard the second half of that sentence in their final interpretations.

P.S. John Fox announced a package with support for regression diagnostics about 11 days ago:
https://stat.ethz.ch/pipermail/r-help/2020-September/468609.html
I'm not sure how relevant it is to your question, but I just glanced at the vignette, and it's pretty slick...
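For readers unfamiliar with partial residual plots, here is a minimal base R sketch. It assumes an ordinary linear model fitted with `lm` on simulated data; the variable names are borrowed from the thread, but the values and effect sizes are made up, and this is not the model PLINK fits.

```r
set.seed(42)
n <- 300

# Simulated stand-in data (hypothetical values, not the poster's data).
d <- data.frame(age   = rnorm(n, 55, 10),
                TD    = rbinom(n, 1, 0.3),
                HBA1C = rnorm(n, 40, 8))
d$y <- 0.005 * d$age - 0.7 * d$TD + 0.03 * d$HBA1C + rnorm(n)

fit <- lm(y ~ age + TD + HBA1C, data = d)

# Partial residuals for TD: model residuals plus TD's own (centred) fitted
# contribution, i.e. what is left for TD after the other terms are accounted for.
pr <- residuals(fit, type = "partial")[, "TD"]

out <- file.path(tempdir(), "td_partial_residuals.pdf")
pdf(out)
termplot(fit, terms = "TD", partial.resid = TRUE, se = TRUE)
dev.off()
```

The slope of the partial residuals against TD equals the TD coefficient from the full model, which is why the plot shows the effect of TD with the other covariates held constant rather than its marginal effect.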
David Winsemius
2020-Sep-15 06:26 UTC
[R] How to represent the effect of one covariate on regression results?
There is a user group for PLINK, easily found by looking at the page you cited. This is not the correct place to submit such questions.

https://groups.google.com/g/plink2-users?pli=1

--
David.
Ana Marija
2020-Sep-15 15:57 UTC
[R] How to represent the effect of one covariate on regression results?
Hi Abby and David,

Thanks for the useful tips! I will check those.

I completed the regression analysis in PLINK (as R would be very slow for my sample size), but as I mentioned, I need to determine the influence of a specific covariate on my results, and PLINK is of no help there.

I ran a Pearson correlation analysis on the P values I got from the regressions with and without my covariate of interest, and I got this:

> cor.test(tt$P_TD, tt$P_noTD, method = "pearson", conf.level = 0.95)

        Pearson's product-moment correlation

data:  tt$P_TD and tt$P_noTD
t = 20.17, df = 283, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7156134 0.8117108
sample estimates:
      cor
0.7679493

I can see that the P values are highly correlated between the two runs. Can I conclude that my covariate doesn't have a large effect, or what kind of conclusion can I draw from that?

Thanks for all your help,
Ana
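A single correlation coefficient on raw P values is only one summary of how the two runs differ. The sketch below shows a few complementary summaries; it assumes a data frame `tt` with columns `P_TD` and `P_noTD` as in the call above, but the values here are simulated, and the 0.05 cut-off is an arbitrary illustrative choice, not something taken from the thread.

```r
set.seed(7)
n <- 285  # df = 283 in the output above implies 285 pairs; values are simulated

# Simulate correlated test statistics and convert them to two-sided P values.
z_no <- rnorm(n)
z_td <- 0.8 * z_no + sqrt(1 - 0.8^2) * rnorm(n)
tt <- data.frame(P_noTD = 2 * pnorm(-abs(z_no)),
                 P_TD   = 2 * pnorm(-abs(z_td)))

# Correlation on the raw scale, on ranks, and on the -log10 scale,
# which gives more weight to the small P values of interest.
pearson_raw <- cor(tt$P_TD, tt$P_noTD)
spearman    <- cor(tt$P_TD, tt$P_noTD, method = "spearman")
pearson_log <- cor(-log10(tt$P_TD), -log10(tt$P_noTD))

# Cross-tabulate which variants fall below a threshold in each run.
alpha <- 0.05  # arbitrary illustrative threshold
changed <- table(with_TD = tt$P_TD < alpha, without_TD = tt$P_noTD < alpha)

print(c(pearson_raw = pearson_raw, spearman = spearman,
        pearson_log = pearson_log))
print(changed)
```

The off-diagonal cells of the table count variants whose status changes when TD is added, which a high overall correlation can hide.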
Abby Spurdle
2020-Sep-15 20:18 UTC
[R] How to represent the effect of one covariate on regression results?
> My question is how do I present/plot the effect of covariate "TD" in
> the example it has "P" equal to 3.32228e-12 for all IDs in the
> resulting file so that I show how much effect covariate "TD" has on
> the analysis. Should I run another regression without covariate "TD"

I'll take a second shot in the dark:

There is R^2, and a number of generalizations (the most common of which is probably adjusted R^2). And there are various other goodness-of-fit tests.

https://en.wikipedia.org/wiki/Goodness_of_fit
https://en.wikipedia.org/wiki/Coefficient_of_determination

You could fit two models (one with a particular variable included, and one without) and compare how the statistic changes.

However, I'm probably going to get told off for going off-topic. So, unless any further questions are specific to R programming, I don't think I'm going to contribute further.

Also, I'd recommend you read some notes on statistical modelling, or consult an expert, or both. And I suspect there are additional considerations when modelling genetic data.
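The two-model comparison suggested above might look like this in R. It is a sketch on simulated data, assuming a linear model fitted with `lm`; the variable names echo the thread, but the data and effect sizes are invented for illustration.

```r
set.seed(123)
n <- 500

# Simulated stand-in data (hypothetical values).
d <- data.frame(age = rnorm(n, 55, 10),
                TD  = rbinom(n, 1, 0.3))
d$y <- 0.01 * d$age - 0.7 * d$TD + rnorm(n)

fit_reduced <- lm(y ~ age, data = d)       # without the covariate of interest
fit_full    <- lm(y ~ age + TD, data = d)  # with it

# Adjusted R^2 for each model, and the gain in plain R^2 from adding TD.
adj_r2 <- c(reduced = summary(fit_reduced)$adj.r.squared,
            full    = summary(fit_full)$adj.r.squared)
delta_r2 <- summary(fit_full)$r.squared - summary(fit_reduced)$r.squared

# F test for the nested models: does TD explain additional variance?
cmp <- anova(fit_reduced, fit_full)

print(adj_r2)
print(delta_r2)
print(cmp)
```

Because the models are nested, plain R^2 cannot decrease when TD is added; adjusted R^2 and the F test indicate whether the gain is more than would be expected from adding an arbitrary extra term.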