Ebert,Timothy Aaron
2022-Jun-15 13:46 UTC
[R] Model Comparision for case control studies in R
Disease status is missing from the sample data.
Are age, disease, smoking, and/or hypertension correlated in any way, or are they independent (correlation = 0)?
Are the correlations large enough to adversely influence your model?

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of anteneh asmare
Sent: Wednesday, June 15, 2022 7:29 AM
To: r-help at r-project.org
Subject: [R] Model Comparision for case control studies in R

y <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
age <- c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
smoking <- c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
hypertension <- c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
data <- data.frame(y, age, smoking, hypertension)
data
model <- glm(y ~ age + factor(smoking) + factor(hypertension), data = data,
             family = binomial(link = "logit"), na.action = na.omit)
summary(model)

From the sample data above, I want to analyze a case-control study of male individuals. The response variable y is disease status (1 = Case, 0 = Control), and the covariates are age, smoking status (1 = Yes, 0 = No), and hypertension (1 = Yes, 0 = No). My objective is to fit models that predict disease status using at least two different methods and then compare them. I think logistic regression will be the best fit for this case-control study. Do we have other options in addition to logistic regression?
Kind regards,
Hana

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Dhelp&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=l7afPQ_gGAoV2EsNoYSYul0qAISEiXLmTmu0IQ03nZO4rcAi9xHZGsWwwig4oYOB&s=ztyDthknydhlcM49F33Gz6xRl6G7U9s8aIhB1VN-EKY&e
PLEASE do read the posting guide
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.R-2Dproject.org_posting-2Dguide.html&d=DwICAg&c=sJ6xIWYx-zLMB3EPkvcnVg&r=9PEhQh2kVeAsRzsn7AkP-g&m=l7afPQ_gGAoV2EsNoYSYul0qAISEiXLmTmu0IQ03nZO4rcAi9xHZGsWwwig4oYOB&s=tcsGkhvtVvoVvb1Ehah-vLRC6an40rJXQXqqfX2f0gI&e
and provide commented, minimal, self-contained, reproducible code.
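
Tim's questions about correlation can be checked directly on the posted data. What follows is a minimal sketch, not something from the thread itself: the names `dat`, `cor_mat`, `tab`, and `ft` are chosen here for illustration, and the choice of Pearson correlation plus Fisher's exact test is an assumption about what a reasonable first check would be. With only 20 observations, any of these estimates is very imprecise.

```r
# Sample data copied from the original post
y            <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1)
age          <- c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41)
smoking      <- c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)
hypertension <- c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0)
dat <- data.frame(y, age, smoking, hypertension)  # 'dat' is an illustrative name

# Pairwise Pearson correlations among the response and the covariates.
# For two 0/1 variables this is the phi coefficient.
cor_mat <- cor(dat)
print(round(cor_mat, 2))

# Association between the two binary covariates: 2x2 table and Fisher's exact test
tab <- table(smoking = dat$smoking, hypertension = dat$hypertension)
print(tab)
ft <- fisher.test(tab)
print(ft$p.value)
```

Correlations near zero in a sample this small do not establish independence; they only show that no strong association is visible in these 20 rows.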
Dear Tim,

Thanks. The first vector, y <- c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1), is the disease status (1 = Case, 0 = Control). The covariates age, smoking status, and hypertension are independent (uncorrelated). Unconditional logistic regression will be used, but I need to compare it against other models rather than fitting logistic regression alone. The data are not matched, so conditional logistic regression does not apply.

Best,
Hana

On 6/15/22, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
> Disease status is missing from the sample data.
> Are age, disease, smoking, and/or hypertension correlated in any way or are
> they independent (correlation=0)?
> Are the correlations large enough to adversely influence your model?
> Tim
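
One way to compare logistic regression with a second method on this data is sketched below. This is only an illustration under stated assumptions, not an answer given in the thread: the second model is a probit-link GLM (chosen here because it needs nothing beyond base R), the comparison uses AIC plus leave-one-out cross-validation, and `dat`, `form`, `loo_prob`, and `cv` are hypothetical names introduced for the example.

```r
# Sample data from the original post, with binary covariates stored as factors
dat <- data.frame(
  y            = c(0,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0,0,1),
  age          = c(45,23,56,67,23,23,28,56,45,47,36,37,33,35,38,39,43,28,39,41),
  smoking      = factor(c(0,1,1,1,0,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1)),
  hypertension = factor(c(1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0))
)
form <- y ~ age + smoking + hypertension

# Two candidate models: logit link (as in the post) and probit link (assumed alternative)
fit_logit  <- glm(form, data = dat, family = binomial(link = "logit"))
fit_probit <- glm(form, data = dat, family = binomial(link = "probit"))

# In-sample comparison: lower AIC is better
print(AIC(fit_logit, fit_probit))

# Leave-one-out cross-validation: refit without row i, then predict row i
loo_prob <- function(link) {
  vapply(seq_len(nrow(dat)), function(i) {
    fit <- glm(form, data = dat[-i, ], family = binomial(link = link))
    predict(fit, newdata = dat[i, , drop = FALSE], type = "response")
  }, numeric(1))
}
p_logit  <- loo_prob("logit")
p_probit <- loo_prob("probit")

# Out-of-sample summaries: Brier score (lower is better) and accuracy at a 0.5 cutoff
cv <- data.frame(
  model    = c("logit", "probit"),
  brier    = c(mean((p_logit - dat$y)^2), mean((p_probit - dat$y)^2)),
  accuracy = c(mean((p_logit > 0.5) == dat$y), mean((p_probit > 0.5) == dat$y))
)
print(cv)
```

Logit and probit fits are usually very similar, so a structurally different method would make for a more informative comparison; any model that returns a predicted probability could be slotted into the same leave-one-out loop. Because cases are oversampled in a case-control design, the predicted probabilities reflect the sample's case fraction rather than population prevalence.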