Grathwohl,Dominik,LAUSANNE,NRC/NT
2002-Nov-13 13:14 UTC
[R] building a formula for glm() with 30,000 independent variables
Dear Prof. Ripley,

you mention the theory of perceptrons.
Could you please point me to an introductory paper or book?
Thanks in advance,

Dominik

> -----Original Message-----
> From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
> Sent: Sunday, 10 November 2002 18:55
> To: Ben Liblit
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] building a formula for glm() with 30,000 independent
> variables
>
> Well, the theory of perceptrons says you will find perfect discrimination
> with high probability even if there is no structure unless n is well in
> excess of 2p. So you do have 100,000 units? If so you have many
> gigabytes of data and no R implementation I know of will do this for you.
> Also, the QR decomposition would take a very long time.
>
> You could call glm.fit directly if you could form the design matrix
> somehow but I doubt if this would run in an acceptable time.
>
> On Sun, 10 Nov 2002, Ben Liblit wrote:
>
> > I would like to use R to perform a logistic regression with about
> > 30,000 independent variables. That's right, thirty thousand. Most
> > will be irrelevant: the intent is to use the regression to identify
> > the few that actually matter.
> >
> > Among other things, this calls for giving glm() a colossal "y ~ ..."
> > formula with thirty thousand summed terms on its right hand side. I
> > build up the formula as a string and then call as.formula() to convert
> > it. Unfortunately, the conversion fails. The parser reports that it
> > has overflowed its stack. :-(
> >
> > Is there any way to pull this off in R? Can anyone suggest
> > alternatives to glm() or to R itself that might be capable of handling
> > a problem of this size? Or am I insane to even be considering an
> > analysis like this?
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272860 (secr)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
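The "many gigabytes of data" remark in the quoted reply can be checked with a back-of-the-envelope calculation. The sketch below is in Python rather than R, purely for illustration. It assumes a dense design matrix stored as 8-byte double-precision values, with the 100,000 units and 30,000 variables mentioned in the thread. The function name `dense_matrix_bytes` is hypothetical and does not come from any library.

```python
def dense_matrix_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Storage needed for a dense n_rows x n_cols matrix.

    Assumption: every entry is an 8-byte double with no sparse storage.
    """
    return n_rows * n_cols * bytes_per_value


n_units = 100_000      # units, as asked about in the quoted reply
n_variables = 30_000   # independent variables in the original question

size_bytes = dense_matrix_bytes(n_units, n_variables)
print(f"{size_bytes / 1e9:.1f} GB")  # 24.0 GB for the design matrix alone
```

This counts a single copy of the matrix only. Any working copies made during fitting would add to it.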
ripley@stats.ox.ac.uk
2002-Nov-13 13:29 UTC
[R] building a formula for glm() with 30,000 independent variables
`Pattern Recognition and Neural Networks' p.119, amongst other places.
See also MASS4 pp.198-9.

On Wed, 13 Nov 2002, Grathwohl,Dominik,LAUSANNE,NRC/NT wrote:

> Dear Prof. Ripley,
>
> you mention the theory of perceptrons.
> Could you please point me to an introductory paper or book?
> Thanks in advance,
>
> Dominik
>
> > -----Original Message-----
> > From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
> > Sent: Sunday, 10 November 2002 18:55
> > To: Ben Liblit
> > Cc: r-help at stat.math.ethz.ch
> > Subject: Re: [R] building a formula for glm() with 30,000 independent
> > variables
> >
> > Well, the theory of perceptrons says you will find perfect discrimination
> > with high probability even if there is no structure unless n is well in
> > excess of 2p. So you do have 100,000 units? If so you have many
> > gigabytes of data and no R implementation I know of will do this for you.
> > Also, the QR decomposition would take a very long time.
> >
> > You could call glm.fit directly if you could form the design matrix
> > somehow but I doubt if this would run in an acceptable time.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
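The "n well in excess of 2p" rule of thumb quoted above matches a classical counting result for linear threshold units, usually credited to Cover (1965). The thread itself does not name that result or give a formula, so the link is an assumption here. For n points in general position and p free coefficients, the fraction of the 2^n possible labellings that are perfectly linearly separable is 2^(1-n) * sum_{k=0}^{p-1} C(n-1, k). The sketch below evaluates it in Python rather than R, purely for illustration. The function name `prob_separable` is hypothetical, and small values of p are used so that it runs instantly.

```python
from math import comb


def prob_separable(n: int, p: int) -> float:
    """Fraction of the 2**n labellings of n points that are linearly separable.

    Assumptions: points are in general position, and p counts every free
    coefficient (including an intercept, if the model has one).
    """
    # Number of separable labellings is 2 * sum_{k<p} C(n-1, k);
    # dividing by the 2**n possible labellings gives the expression below.
    return sum(comb(n - 1, k) for k in range(p)) / 2 ** (n - 1)


p = 30
for n in (20, 60, 90, 120):
    print(n, prob_separable(n, p))
# n <= p gives exactly 1.0 and n == 2p gives exactly 0.5;
# the fraction falls towards 0 as n grows past 2p.
```

Read this way, pure-noise labels are always perfectly separable when n is at most p, are separable half the time at n = 2p, and become unlikely to be separable only when n is well beyond 2p.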