Dear R users,

I have a database called Base.csv (attached to this email) which contains 13 columns and 8257 rows, and whose first 8 columns are dummy variables taking the values 1 or 0. The problem is that when I run the following instructions to fit a logistic regression, R runs for hours and hours without producing any output:

Base=read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv",header=FALSE,sep=";")
fit_1=glm(Base[,2]~Base[,1]+Base[,10]+Base[,11]+Base[,12]+Base[,13],family=binomial(link="logit"))

Apparently there is not enough memory to produce the requested output. Is there another function for logistic regression that handles large data and returns output in a reasonable time?

Many thanks

Kind regards

George
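For reference, when read.csv is called with header = FALSE it assigns the default column names V1, V2, ...; a minimal sketch of the same model written with the standard formula/data interface, assuming those default names and the same column numbering as above, would be:

Base <- read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv", header = FALSE, sep = ";")
# default names are V1..V13; column 2 is the response, as in the call above
fit_1 <- glm(V2 ~ V1 + V10 + V11 + V12 + V13,
             data = Base, family = binomial(link = "logit"))
summary(fit_1)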
That's not a large data set. Something else besides memory limits is going on. You should post the output of summary(Base).

David
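A minimal sketch of the check David is asking for; str() and sapply(..., class) are alternatives that show directly how each column was read:

summary(Base)        # per-column summaries; a character column stands out immediately
str(Base)            # compact display of each column's class (numeric, integer, character, ...)
sapply(Base, class)  # just the classes, one per column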
Hi George,

I did not get an attachment. My first step would be to try simplifying things. Do all of these work?

fit_1=glm(Base[,2]~Base[,1],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,10],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,11],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,12],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,13],family=binomial(link="logit"))

This is not a large dataset. That said, if your computer is nearly out of memory, even a small dataset might be too much. It may have plenty of physical memory but also lots of other things (open files, other applications, and so on) eating into it.

Regards,
Tim
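If memory really is the suspect, a quick sketch (base/utils functions only) of how to see how large the data frame actually is and how much memory R is currently holding:

format(object.size(Base), units = "MB")  # size of the data frame itself; 8257 x 13 is roughly 1 MB if all columns are numeric
gc()                                     # triggers garbage collection and reports R's current memory use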
summary(Base) would show if one of the columns of Base was read as character data instead of the expected numeric. That could cause an explosion in the number of dummy variables, and hence a huge design matrix.

-Bill
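A minimal sketch of the effect Bill describes, on made-up data (n = 1000 here just to keep the demonstration small): a numeric predictor contributes one column to the design matrix, but the same values read as character become a factor with one dummy column per unique value:

n     <- 1000
x_num <- round(rnorm(n), 6)      # numeric predictor
x_chr <- as.character(x_num)     # the same values, but read as text
ncol(model.matrix(~ x_num))      # 2: intercept + slope
ncol(model.matrix(~ x_chr))      # roughly n: intercept plus one dummy per unique value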