Hello

1*
I need to use logistic regression, but my data file is very large (approx. 4 million lines). R doesn't handle such a file. What can I do?
------------------------
2*
So I wondered whether I could perform statistical analyses on summarised data (counts of yes/no values) from the huge file. Normally the summarised data file is short and R can handle it. Then I used this command:

lo <- glm(hey.count ~ as.factor(jeo) + as.factor(eg) + as.factor(kon) +
          as.factor(yol) + as.factor(aks) + as.factor(fay),
          family = poisson, data = dt2)

As you see, I used the count of the yes/no values as the response. Is it a good idea to use this method instead of binomial logistic regression? What more do you suggest?

Thanks in advance

--
Ahmet Temiz
Geological Engineer
General Directorate of Disaster Affairs
TURKEY

______________________________________
The views and opinions expressed in this e-mail message are the sender's own and do not necessarily represent the views and opinions of the Earthquake Research Dept. of the General Directorate of Disaster Affairs.
On Fri, 14 Mar 2003, orkun wrote:

> 1*
> I need to use logistic regression, but my data file is very large
> (approx. 4 million lines). R doesn't handle such a file.
> What can I do?

R does handle such files (which are tiny by data-mining standards): you just need to put 1GB or 2GB of memory in your computer.

> ------------------------
> 2*
> So I wondered whether I could perform statistical analyses on summarised
> data (counts of yes/no values) from the huge file. Normally the
> summarised data file is short and R can handle it.
> Then I used this command:
>
> lo <- glm(hey.count ~ as.factor(jeo) + as.factor(eg) + as.factor(kon) +
>           as.factor(yol) + as.factor(aks) + as.factor(fay),
>           family = poisson, data = dt2)
>
> As you see, I used the count of the yes/no values as the response.
>
> Is it a good idea to use this method instead of binomial logistic regression?

No, but it would be a good idea to use binomial logistic regression (and not Bernoulli logistic regression): that is, to collapse the data to success/failure counts over the cross-classification of the factors, and use family=binomial.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, stats.ox.ac.uk/~ripley
University of Oxford,     Tel: +44 1865 272861 (self)
1 South Parks Road,            +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax: +44 1865 272595
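[A minimal sketch of Prof. Ripley's suggestion. The data here are simulated as a stand-in for the poster's 4-million-line file; only the factor names jeo and fay are taken from the original post, and the coefficients are made up. The point is the mechanics: collapse 0/1 outcomes to success/failure counts over the cross-classification of the factors, then fit with the two-column response and family = binomial.]

```r
set.seed(1)
n   <- 100000                          # stand-in for the huge file
jeo <- factor(sample(1:3, n, replace = TRUE))
fay <- factor(sample(0:1, n, replace = TRUE))
p   <- plogis(-1 + 0.5 * (jeo == "2") + 1.0 * (fay == "1"))
y   <- rbinom(n, 1, p)                 # Bernoulli 0/1 outcomes

# Collapse: one short row per factor combination, with yes/no counts
agg <- aggregate(cbind(yes = y, no = 1 - y),
                 by = list(jeo = jeo, fay = fay), FUN = sum)

# Binomial logistic regression on the collapsed table: the response is
# the two-column matrix cbind(successes, failures)
fit <- glm(cbind(yes, no) ~ jeo + fay, family = binomial, data = agg)
coef(fit)
```

The collapsed table has only as many rows as there are factor combinations (here 3 x 2 = 6), so memory is no longer an issue, and the fit is the same as Bernoulli logistic regression on the full data.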
On Fri, 14 Mar 2003 16:48:37 +0200 orkun <temiz at deprem.gov.tr> wrote:

> Hello
> 1*
> I need to use logistic regression, but my data file is very large
> (approx. 4 million lines). R doesn't handle such a file.
> What can I do?
> ------------------------
> 2*
> So I wondered whether I could perform statistical analyses on summarised
> data (counts of yes/no values) from the huge file. Normally the
> summarised data file is short and R can handle it.
> Then I used this command:
>
> lo <- glm(hey.count ~ as.factor(jeo) + as.factor(eg) + as.factor(kon) +
>           as.factor(yol) + as.factor(aks) + as.factor(fay),
>           family = poisson, data = dt2)
>
> As you see, I used the count of the yes/no values as the response.
>
> Is it a good idea to use this method instead of binomial logistic
> regression? What more do you suggest?
>
> Thanks in advance
>
> --
> Ahmet Temiz
> Geological Engineer
> General Directorate of Disaster Affairs
> TURKEY

If you have no more than one continuous variable you can pre-process (outside of R) to collapse the data into frequency counts. I did not check whether glm handles frequency case weights. The lrm function in the Design package (hesweb1.med.virginia.edu/biostat/s/Design.html) does.

--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  hesweb1.med.virginia.edu/biostat
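[On the point left unchecked above: glm does handle frequency counts for the binomial family. Give the response as the proportion of successes and pass the group sizes via the `weights` argument; this is equivalent to the two-column-response form. A small sketch on a made-up frequency table (the factor name jeo is from the original post; the counts are invented):]

```r
tab <- data.frame(
  jeo = factor(c(1, 1, 2, 2)),
  yes = c(30, 45, 60, 10),    # successes per group
  n   = c(100, 120, 150, 80)  # group sizes (frequency counts)
)

# Proportion response + prior weights
fit_w <- glm(yes / n ~ jeo, family = binomial, weights = n, data = tab)

# Equivalent two-column-response form
fit_c <- glm(cbind(yes, n - yes) ~ jeo, family = binomial, data = tab)

all.equal(coef(fit_w), coef(fit_c))
```

Both calls maximise the same binomial likelihood, so the estimated coefficients agree.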
In a message dated 3/14/03 10:10:15 AM Eastern Standard Time, temiz@deprem.gov.tr writes:

> 1*
> I need to use logistic regression, but my data file is very large
> (approx. 4 million lines). R doesn't handle such a file.
> What can I do?

It depends on the strength of your computing system as well: consider adding more RAM, depending on how many columns (and of what kind) you have to read in for 4 million records.

You might try a DBMS such as MySQL or PostgreSQL; they have pretty good R interfaces. I am currently using MySQL with RMySQL and RODBC. I started from scratch and learned about it by reading online material (thanks to some nice people). It is easy to set up and administer, though with some limitations, which may not matter if you are using it only as a data storage and retrieval system for R.

Also read about DBMS in the 1st issue of R News and the Data Import section in the manual, and look for MySQL and DBMS postings on the R mailing list. They have some good info.

Good luck!
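[The DBMS route above keeps the 4 million rows outside R and pulls them in pieces. The same chunking idea works in base R alone, reading the flat file through a connection and accumulating the yes/no counts needed for the collapsed binomial fit, so the full file is never in memory at once. A sketch under assumptions: the file layout (two columns, a factor code and a 0/1 outcome) and all names are hypothetical, and a small temporary file stands in for the real data.]

```r
# Stand-in for the poster's file: 10,000 lines of "jeo y"
set.seed(2)
f <- tempfile()
write.table(data.frame(jeo = sample(1:3, 10000, replace = TRUE),
                       y   = sample(0:1, 10000, replace = TRUE)),
            f, row.names = FALSE, col.names = FALSE)

con    <- file(f, open = "r")
counts <- NULL
repeat {
  # Read the next 2000 lines; read.table errors at end of input
  chunk <- tryCatch(read.table(con, nrows = 2000,
                               col.names = c("jeo", "y")),
                    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunk$yes <- chunk$y
  chunk$n   <- 1L
  part   <- aggregate(cbind(yes, n) ~ jeo, chunk, sum)
  # Merge this chunk's counts into the running totals
  counts <- if (is.null(counts)) part else
            aggregate(cbind(yes, n) ~ jeo, rbind(counts, part), sum)
}
close(con)
counts   # one short row per factor level, ready for glm(..., binomial)
```

This trades speed for memory; a DBMS does the same grouping faster with a single GROUP BY query once the data are loaded.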