
Displaying 20 results from an estimated 500 matches similar to: "bigglm() results different from glm()"

2009 Mar 17
2
bigglm() results different from glm()
Dear all, I am using the biglm package to fit a few GLMs to a large dataset (3 million rows, 6 columns). While trying to fit a Poisson GLM, I noticed that the coefficient estimates were very different from what I obtained when estimating the model on a smaller dataset using glm(). I wrote a very basic toy example to compare the results of bigglm() against a glm() call. Consider the
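For anyone comparing the two functions on a reproducible basis, a minimal sketch (toy data, not the poster's): on a small, well-behaved dataset the coefficients should agree to several decimals, and a large gap usually points at non-convergence of bigglm's iterations, where the maxit argument matters.

    library(biglm)

    set.seed(1)
    toy <- data.frame(x = rnorm(1000))
    toy$y <- rpois(1000, lambda = exp(0.5 + 0.3 * toy$x))

    fit.glm <- glm(y ~ x, family = poisson(), data = toy)
    fit.big <- bigglm(y ~ x, family = poisson(), data = toy,
                      chunksize = 250, maxit = 20)

    coef(fit.glm)   # both should be near c(0.5, 0.3)
    coef(fit.big)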
2009 Mar 17
1
exporting s3 and s4 methods
If a package defines an S3 generic and an S4 generic for the same function (so as to add methods for S4 classes to the existing code), how do I set up the namespace to have them exported? With import(stats) exportMethods(bigglm) importClassesFrom(DBI) useDynLib(biglm) export(biglm) export(bigglm) in NAMESPACE, the S3 generic is not exported. > methods("bigglm") [1] bigglm.RODBC*
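For reference, the usual shape of a NAMESPACE that keeps both generics visible (a sketch, not the biglm package's actual file; the key point is that S3 methods also need an S3method() registration, or they remain internal and show up starred in methods()):

    import(stats)
    importClassesFrom(DBI)
    useDynLib(biglm)
    export(biglm)
    export(bigglm)            # exports the S3 generic
    exportMethods(bigglm)     # exports the S4 generic and its methods
    S3method(bigglm, RODBC)   # registers an S3 method (class name assumed)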
2011 Feb 08
1
Fitting a model with an offset in bigglm
Dear all, I have a large data set and would like to fit a logistic regression model using the bigglm function. I need to include an offset in the model, but when I do, bigglm seems to ignore it. For example, the two model calls below produce the same fit; the offset is ignored: bigglm(y~x,offset=z,data=Test,family=binomial(link = "logit"))
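A hedged way to test whether the offset is being honoured (toy data with assumed names; whether bigglm accepts an offset argument or only an offset() term in the formula may depend on the biglm version, so comparing against glm() on a subsample is the quickest check):

    library(biglm)

    set.seed(2)
    Test <- data.frame(x = rnorm(500), z = runif(500))
    Test$y <- rbinom(500, 1, plogis(-1 + 0.8 * Test$x + Test$z))

    f.glm <- glm(y ~ x, offset = z, data = Test, family = binomial())

    # Formula-style offset; wrapped in try() since support varies by version.
    f.big <- try(bigglm(y ~ x + offset(z), data = Test, family = binomial()))
    if (!inherits(f.big, "try-error")) {
      # If the offset were silently dropped, coef(f.big) would match
      # glm(y ~ x, data = Test, family = binomial()) rather than f.glm.
      rbind(glm = coef(f.glm), bigglm = coef(f.big))
    }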
2009 Apr 03
1
bigglm "update" with ff
Hi, since bigglm doesn't have update, I was wondering how to achieve something like (similar to the example in ff package manual using biglm): first <- TRUE ffrowapply ({ if (first) { first <- FALSE fit <- bigglm(eqn, as.data.frame(bigdata[i1:i2,,drop=FALSE]), chunksize = 10000, family = binomial()) } else { fit <- update(fit,
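One way around the missing update() method: bigglm() can take a function as its data argument, one that returns the next chunk on each call, NULL when exhausted, and rewinds on reset = TRUE, so the whole ff matrix is consumed in a single call. A sketch under those assumptions (bigdata and eqn as in the post):

    make.ff.data <- function(ffobj, chunksize) {
      pos <- 0
      function(reset = FALSE) {
        if (reset) { pos <<- 0; return(NULL) }
        if (pos >= nrow(ffobj)) return(NULL)   # no chunks left
        i1 <- pos + 1
        i2 <- min(pos + chunksize, nrow(ffobj))
        pos <<- i2
        as.data.frame(ffobj[i1:i2, , drop = FALSE])
      }
    }

    # fit <- bigglm(eqn, data = make.ff.data(bigdata, 10000),
    #               family = binomial())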
2007 Jan 22
1
Example function for bigglm (biglm) data input from file
This is to submit a commented example function for use as the data argument to the bigglm() function (biglm package), for when you want to read the data from a file (instead of a URL), or rescale or modify the data before fitting the model, in the hope that this may be of help to someone out there. make.data <- function (filename, chunksize, ...) { conn<-NULL; function (reset=FALSE) { if
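For completeness, one hedged way to finish such a reader (the shape follows the posted fragment; the tryCatch guard for end-of-file is an addition, so treat it as an assumption):

    make.data <- function(filename, chunksize, ...) {
      conn <- NULL
      function(reset = FALSE) {
        if (reset) {
          if (!is.null(conn)) close(conn)
          conn <<- file(filename, open = "r")
        } else {
          # read.table() errors on an exhausted connection, hence the guard
          rval <- tryCatch(read.table(conn, nrows = chunksize, ...),
                           error = function(e) NULL)
          if (!is.null(rval) && nrow(rval) == 0) rval <- NULL
          if (is.null(rval) && !is.null(conn)) { close(conn); conn <<- NULL }
          rval
        }
      }
    }

    # usage sketch (names assumed):
    # bigglm(y ~ x, data = make.data("big.txt", 10000,
    #        col.names = c("y", "x")), family = poisson())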
2011 Jan 10
1
debug biglm response error on bigglm model
G'morning What does the error message "Error in x %*% coef(object) : non-conformable arguments" indicate when calculating the response values for newdata with a model from bigglm (in package biglm), and how can I debug it? I am attempting to do Monte Carlo simulations, which may explain the loop in the code that follows. After the code I have included the output, which shows that
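The most common cause, reproduced in a hedged toy (not the poster's simulation): newdata built inside a Monte Carlo loop can end up with a factor carrying fewer levels than the fitting data, so its model matrix has fewer columns than coef() and x %*% coef(object) cannot conform.

    library(biglm)

    set.seed(3)
    d <- data.frame(g = factor(sample(c("a", "b", "c"), 100, TRUE)))
    d$y <- rbinom(100, 1, 0.5)
    fit <- bigglm(y ~ g, data = d, family = binomial())

    nd <- data.frame(g = "a", y = 0)   # dummy y, in case predict() builds
                                       # a full model frame
    # predict(fit, nd)                 # errors: non-conformable arguments
    nd$g <- factor(nd$g, levels = levels(d$g))  # restore the full level set
    predict(fit, nd)                   # now conformable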
2010 Jul 02
2
unable to get bigglm working, ATTN: Thomas Lumley
I am using an example posted in this help forum to work with a file. The head of the file looks like:
988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0
988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0
988887 2007-03-07 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 2 0
988887 2007-03-08 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100
2007 Jun 29
1
Comparison: glm() vs. bigglm()
Hi, Until now I thought that the results of glm() and bigglm() would coincide. Probably a naive assumption? Anyway, I've been using bigglm() on some datasets I have available. One of the sets has >15M observations. I have 3 continuous predictors (A, B, C) and a binary outcome (Y), and tried the following: m1 <- bigglm(Y~A+B+C, family=binomial(), data=dataset1, chunksize=10e6)
2012 May 31
2
bigglm binomial negative fitted value
Hi there, Since glm cannot handle factors very well, I am trying to use bigglm like this: logit_model <- bigglm(responser~var1+var2+var3, data, chunksize=1000, family=binomial(), weights=~trial, sandwich=FALSE) fitted <- predict(logit_model, data) Only var2 is a factor; var1 and var3 are numeric. I expected fitted to be a vector of values falling in (0,1). However, I get something like this:
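A hedged interpretation: values outside (0,1) suggest the predictions are on the linear-predictor (link) scale. Whether predict() on a bigglm fit offers a type = "response" option depends on the biglm version, but the inverse link can always be applied by hand (names follow the post):

    eta <- predict(logit_model, data)   # linear predictor (link scale)
    p   <- plogis(as.vector(eta))       # inverse logit -> probabilities
    range(p)                            # should now lie inside (0, 1)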
2010 Mar 02
1
bigglm Memory Issues
Hi all, I'm somewhat of a novice in terms of programming, so I thought I'd come here to seek some help with an issue I'm having. I'm trying to fit a GLM using bigglm, but in spite of my best efforts, I cannot get it to work! Here is the particular line of code that is giving me trouble: >mod = bigglm(Pres/wt ~ Xdes, data=dat, family=poisson(), weights = ~wt, maxit=100,
2015 Jun 15
2
Logistic regression
Hello, I am trying to run a logistic regression between the first column of my data.table (In.hospital_death) and two others (GSV and BUN), but I get the error below. I have tried removing the rows with NA values in case the function does not accept them, but the same error persists. Does anyone know why this happens? (I previously tried the glm function but got out of memory.) library(XLConnect)
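A hedged sketch of how the same model can be fitted in bounded memory with biglm, which avoids glm()'s out-of-memory failure (the data.table name dt is assumed; column names follow the post):

    library(biglm)

    d <- na.omit(as.data.frame(dt))   # dt: the poster's data.table; drop NA rows
    fit <- bigglm(In.hospital_death ~ GSV + BUN, data = d,
                  family = binomial(), chunksize = 5000)
    summary(fit)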
2009 Feb 19
1
Questions about biglm
Hello folks, I am very excited to have discovered R and have been exploring its capabilities. R's regression models are of great interest to me, as my company is in the business of running thousands of linear regressions on large datasets. I am using biglm to run linear regressions on datasets that are as large as several GB. I have been pleasantly surprised that biglm runs the
2012 Mar 30
3
ff usage for glm
Greetings useRs, Can anyone provide an example of how to use ff to feed a very large data frame to glm? The data frame cannot be loaded into R using conventional read.csv, as it is too big. glm(...,data=ff.file) ?? Thank you Stephen B
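A hedged sketch of the usual ff route: read the csv into an on-disk ffdf with ff, then fit with biglm's bigglm(), for which the ffbase package supplies an ffdf method (file and variable names are placeholders):

    library(ff)
    library(ffbase)   # provides a bigglm method for ffdf objects
    library(biglm)

    dat <- read.csv.ffdf(file = "big.csv", header = TRUE)  # stays on disk
    fit <- bigglm(y ~ x1 + x2, data = dat,
                  family = binomial(), chunksize = 10000)
    summary(fit)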
2008 Aug 09
1
Reading large datasets and fitting logistic models in R
Hi R-experts, Does anyone have experience using R for handling large-scale data (millions of rows, hundreds or thousands of features)? What is the largest size of data that anyone has used with glm? Also, is there a library to read data in a sparse data format (like the SVMlight format)? Thanks Pradheep
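On the sparse-format question, one hedged pointer: e1071::read.matrix.csr() reads SVMlight-style "label index:value" files into a SparseM matrix.csr (the filename is assumed):

    library(e1071)   # read.matrix.csr(); uses SparseM under the hood

    d <- read.matrix.csr("train.svmlight")
    x <- d$x         # sparse matrix.csr of features
    y <- d$y         # labels, returned alongside x when the file has a response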
2007 Aug 16
4
Linear models over large datasets
I'd like to fit linear models on very large datasets. My data frames are about 2,000,000 rows x 200 columns of doubles, and I am using a 64-bit build of R. I've googled this extensively and went over the "R Data Import/Export" guide. My primary issue is that although my data in ASCII form is 4 GB in size (and therefore much smaller in binary), R consumes about
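For reference, the basic biglm pattern — fit once, then fold in further chunks with update() — in a hedged sketch (shown with an in-memory frame big.df for brevity; with a 4 GB file the chunks would come from repeated read.table(nrows = ...) calls instead):

    library(biglm)

    n <- nrow(big.df)                                   # big.df: assumed name
    chunks <- split(seq_len(n), ceiling(seq_len(n) / 1e5))

    fit <- biglm(y ~ x1 + x2, data = big.df[chunks[[1]], ])
    for (idx in chunks[-1])
      fit <- update(fit, big.df[idx, ])                 # folds in one more chunk
    summary(fit)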
2007 Jan 21
1
Can we do GLM on 2GB data set with R?
We want to use R instead of, or in addition to, our existing stats package because of its huge assortment of stat functions. But we routinely need to fit GLM models to files that are approximately 2-4 GB (as SQL tables, un-indexed, with tinyint-sized fields except for the response and weight variables). Does anybody know whether this is feasible in R, given sufficient hardware? It appears to
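For what it's worth, biglm's bigglm() has methods that pull rows straight from a database connection, so the table never has to fit in RAM. A hedged sketch with DBI/RSQLite (table, column, and family choices are placeholders):

    library(biglm)
    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), "claims.db")
    fit <- bigglm(loss ~ age + vehicle, data = con, tablename = "policies",
                  family = Gamma(link = "log"),
                  weights = ~exposure, chunksize = 10000)
    summary(fit)
    dbDisconnect(con)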
2010 Nov 10
0
biglm and epicalc ROC curves
Hello list, I am trying to avoid "Rifying" some of my SAS code to generate ROC plots, and the logistic.display() and lroc() functions in the epicalc package do what I want. However, I must generate my logistic model with bigglm because I have 1) limited hardware and 2) ~2.5 million rows, with 4 categorical and 2 continuous independent variables. When I attempt to invoke epicalc's
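A hedged workaround if epicalc's lroc() insists on a glm object: compute the predicted probabilities from the bigglm fit by hand and feed them to a generic ROC routine such as pROC (object and column names assumed):

    library(pROC)

    eta  <- predict(big_fit, newdata = df)   # linear predictor from bigglm
    prob <- plogis(as.vector(eta))           # inverse logit
    roc_obj <- roc(df$outcome, prob)         # pROC builds the curve itself
    plot(roc_obj); auc(roc_obj)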
2010 Sep 08
2
big data
Hello, I searched the internet but didn't find an answer to the following problem: I want to run a glm on a csv file consisting of 25 columns and 4 million rows. Not all the columns are relevant. My problem is reading the data into R, manipulating it, and then running the glm. I've tried: dd<-scan("myfile.csv",colClasses=classes) dat<-as.data.frame(dd) My question is: what
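One hedged answer to the reading step: read.csv() drops a column entirely when its colClasses entry is the string "NULL", which cuts the 25 columns down to the relevant ones before anything large touches RAM (which columns matter is assumed here):

    classes <- rep("NULL", 25)                 # "NULL" means skip this column
    classes[c(1, 3, 7)] <- c("numeric", "factor", "numeric")  # columns used
    dat <- read.csv("myfile.csv", colClasses = classes)

    fit <- glm(y ~ x1 + x2, data = dat, family = binomial())  # model assumed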
2007 Feb 12
0
predict on biglm class
Hi Everyone, I often use the 'safe prediction' feature available through glm(). Now I'm in a situation where I must use biglm::bigglm. ## begin example library(splines) library(biglm) ff <- log(Volume)~ns(log(Girth), df=5) fit.glm <- glm(ff, data=trees) fit.biglm <- bigglm(ff, data=trees) predict(fit.glm, newdata=data.frame(Girth=2:5)) ## -1.3161465 -0.2975659
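The usual hedged workaround: since bigglm() lacks glm()'s safe prediction, pin the basis down yourself by giving ns() explicit knots and Boundary.knots; the same basis is then rebuilt identically at predict time instead of being re-estimated from newdata.

    library(splines)
    library(biglm)

    kn <- quantile(log(trees$Girth), c(.2, .4, .6, .8))  # 4 interior knots ~ df=5
    bk <- range(log(trees$Girth))                        # fixed boundary knots
    ff2 <- log(Volume) ~ ns(log(Girth), knots = kn, Boundary.knots = bk)

    fit.biglm <- bigglm(ff2, data = trees)
    nd <- data.frame(Girth = 2:5, Volume = 1)  # dummy response, in case predict()
                                               # builds a full model frame
    predict(fit.biglm, newdata = nd)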
2006 Aug 21
5
lean and mean lm/glm?
Hi All: I'm new to R and have a few questions about getting R to run efficiently with large datasets. I'm running R on Windows XP with 1 GB RAM (so about 600-700 MB after the usual Windows overhead). I have a dataset that has 4 million observations and about 20 variables. I want to run probit regressions on this data, but can't do this with more than about 500,000 observations before
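For the probit case specifically, a hedged sketch of the bounded-memory route with biglm (make.data is the chunked file reader from the 2007 post above; file and variable names are placeholders, and the csv is assumed to have no header row):

    library(biglm)

    fit <- bigglm(y ~ x1 + x2 + x3,
                  data = make.data("obs.csv", chunksize = 5e5, sep = ",",
                                   col.names = c("y", "x1", "x2", "x3")),
                  family = binomial(link = "probit"))
    summary(fit)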