Hello

1*
I need to use logistic regression, but my data file is very large (approx. 4 million lines). R doesn't handle such a file. What can I do?
------------------------
2*
So I wondered whether I could perform statistical analyses on summarised data (counts of yes/no values) from the huge file. Normally the summarised data file is short and R can handle it. Then I used this command:

lo <- glm(hey.count ~ as.factor(jeo) + as.factor(eg) + as.factor(kon) +
          as.factor(yol) + as.factor(aks) + as.factor(fay),
          family = poisson, data = dt2)

As you see, I used the count of the yes/no values as the response. Is it a good idea to use this method instead of binomial logistic regression? What more do you suggest?

Thanks in advance

--
Ahmet Temiz
Geological Engineer
General Directorate of Disaster Affairs
TURKEY

______________________________________
The views and opinions expressed in this e-mail message are the sender's own and do not necessarily represent the views and opinions of the Earthquake Research Dept. of the General Directorate of Disaster Affairs.
On Fri, 14 Mar 2003, orkun wrote:

> 1*
> I need to use logistic regression, but my data file is very large
> (approx. 4 million lines). R doesn't handle such a file.
> What can I do?

R does handle such files (which are tiny by data-mining standards): you just need to put 1GB or 2GB of memory in your computer.

> ------------------------
> 2*
> So I wondered whether I could perform statistical analyses on summarised
> data (counts of yes/no values) from the huge file. Normally the
> summarised data file is short and R can handle it.
> Then I used this command:
>
> lo <- glm(hey.count ~ as.factor(jeo) + as.factor(eg) + as.factor(kon) +
>           as.factor(yol) + as.factor(aks) + as.factor(fay),
>           family = poisson, data = dt2)
>
> As you see, I used the count of the yes/no values as the response.
>
> Is it a good idea to use this method instead of binomial logistic regression?

No, but it would be a good idea to use binomial logistic regression (and not Bernoulli logistic regression): that is, to collapse the data to success/failure counts over the cross-classification of the factors, and use family=binomial.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, stats.ox.ac.uk/~ripley
University of Oxford,     Tel: +44 1865 272861 (self)
1 South Parks Road,            +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax: +44 1865 272595
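[A minimal sketch of Prof. Ripley's suggestion. The data here are simulated as a stand-in for the poster's 4-million-line file; only the factor names jeo and fay are taken from the original post, and the coefficients are made up. The point is the mechanics: collapse 0/1 outcomes to success/failure counts over the cross-classification of the factors, then fit with the two-column response and family = binomial.]

```r
set.seed(1)
n   <- 100000                          # stand-in for the huge file
jeo <- factor(sample(1:3, n, replace = TRUE))
fay <- factor(sample(0:1, n, replace = TRUE))
p   <- plogis(-1 + 0.5 * (jeo == "2") + 1.0 * (fay == "1"))
y   <- rbinom(n, 1, p)                 # Bernoulli 0/1 outcomes

# Collapse: one short row per factor combination, with yes/no counts
agg <- aggregate(cbind(yes = y, no = 1 - y),
                 by = list(jeo = jeo, fay = fay), FUN = sum)

# Binomial logistic regression on the collapsed table: the response is
# the two-column matrix cbind(successes, failures)
fit <- glm(cbind(yes, no) ~ jeo + fay, family = binomial, data = agg)
coef(fit)
```

The collapsed table has only as many rows as there are factor combinations (here 3 x 2 = 6), so memory is no longer an issue, and the fit is the same as Bernoulli logistic regression on the full data.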
On Fri, 14 Mar 2003 16:48:37 +0200 orkun <temiz at deprem.gov.tr> wrote:

> Hello
> 1*
> I need to use logistic regression, but my data file is very large
> (approx. 4 million lines). R doesn't handle such a file.
> What can I do?
> ------------------------
> 2*
> So I wondered whether I could perform statistical analyses on summarised
> data (counts of yes/no values) from the huge file. Normally the
> summarised data file is short and R can handle it.
> Then I used this command:
>
> lo <- glm(hey.count ~ as.factor(jeo) + as.factor(eg) + as.factor(kon) +
>           as.factor(yol) + as.factor(aks) + as.factor(fay),
>           family = poisson, data = dt2)
>
> As you see, I used the count of the yes/no values as the response.
>
> Is it a good idea to use this method instead of binomial logistic
> regression? What more do you suggest?
>
> Thanks in advance
>
> --
> Ahmet Temiz
> Geological Engineer
> General Directorate of Disaster Affairs
> TURKEY

If you have no more than one continuous variable you can pre-process (outside of R) to collapse the data into frequency counts. I did not check whether glm handles frequency case weights. The lrm function in the Design package (hesweb1.med.virginia.edu/biostat/s/Design.html) does.

--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  hesweb1.med.virginia.edu/biostat
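[On the point left unchecked above: glm does handle frequency counts for the binomial family. Give the response as the proportion of successes and pass the group sizes via the `weights` argument; this is equivalent to the two-column-response form. A small sketch on a made-up frequency table (the factor name jeo is from the original post; the counts are invented):]

```r
tab <- data.frame(
  jeo = factor(c(1, 1, 2, 2)),
  yes = c(30, 45, 60, 10),    # successes per group
  n   = c(100, 120, 150, 80)  # group sizes (frequency counts)
)

# Proportion response + prior weights
fit_w <- glm(yes / n ~ jeo, family = binomial, weights = n, data = tab)

# Equivalent two-column-response form
fit_c <- glm(cbind(yes, n - yes) ~ jeo, family = binomial, data = tab)

all.equal(coef(fit_w), coef(fit_c))
```

Both calls maximise the same binomial likelihood, so the estimated coefficients agree.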
In a message dated 3/14/03 10:10:15 AM Eastern Standard Time, temiz@deprem.gov.tr writes:

> 1*
> I need to use logistic regression, but my data file is very large
> (approx. 4 million lines). R doesn't handle such a file.
> What can I do?

It depends on the strength of your computing system as well: consider adding more RAM, depending on how many columns (and of what kind) you have to read in for 4 million records.

You might try a DBMS such as MySQL or PostgreSQL; they have pretty good R interfaces. I am currently using MySQL with RMySQL and RODBC. I started from scratch and learned about it by reading online material (thanks to some nice people). It is easy to set up and administer, though with some limitations, which may not matter if you are using it only as a data storage and retrieval system for R.

Also read about DBMS in the 1st issue of R News and the Data Import section in the manual, and look for MySQL and DBMS postings on the R mailing list. They have some good info.

Good luck!
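[The DBMS route above keeps the 4 million rows outside R and pulls them in pieces. The same chunking idea works in base R alone, reading the flat file through a connection and accumulating the yes/no counts needed for the collapsed binomial fit, so the full file is never in memory at once. A sketch under assumptions: the file layout (two columns, a factor code and a 0/1 outcome) and all names are hypothetical, and a small temporary file stands in for the real data.]

```r
# Stand-in for the poster's file: 10,000 lines of "jeo y"
set.seed(2)
f <- tempfile()
write.table(data.frame(jeo = sample(1:3, 10000, replace = TRUE),
                       y   = sample(0:1, 10000, replace = TRUE)),
            f, row.names = FALSE, col.names = FALSE)

con    <- file(f, open = "r")
counts <- NULL
repeat {
  # Read the next 2000 lines; read.table errors at end of input
  chunk <- tryCatch(read.table(con, nrows = 2000,
                               col.names = c("jeo", "y")),
                    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunk$yes <- chunk$y
  chunk$n   <- 1L
  part   <- aggregate(cbind(yes, n) ~ jeo, chunk, sum)
  # Merge this chunk's counts into the running totals
  counts <- if (is.null(counts)) part else
            aggregate(cbind(yes, n) ~ jeo, rbind(counts, part), sum)
}
close(con)
counts   # one short row per factor level, ready for glm(..., binomial)
```

This trades speed for memory; a DBMS does the same grouping faster with a single GROUP BY query once the data are loaded.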