similar to: anova or liklihood ratio test from biglm output


2010 Oct 12
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone, I’m working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual data sets together (I have about 20) I’m using the following command from sqldf data1 <- sqldf("select A.*, B.* from A inner join B using(ID)") But it’s taking A VERY VERY LONG TIME to merge just 2 of the datasets
2010 Oct 04
can't find and install reshape2??
Hi everyone, I’m trying to install reshape2. But when I click on “install package” it’s not coming up!?!?! I’m getting reshape, but no reshape2? I’ve also tried download.packages(reshape2, destdir="c:\\") & download.packages(Reshape2, destdir="c:\\")…but no luck!!! Does anyone have any ideas what could be going on? Chris Howden Founding Partner Tricky
2010 Oct 20
is there a way to update both packages if they occur in 2 libraries?
Hi everyone, I’ve recently added a private library as a way to manage my R libraries. And I did this by simply copying my old library to a new folder and then linking this to R by setting my R_LIBS environmental variable in .Renviron. However I have run into a problem. When I update my packages it is not updating those that are current in the base R library. This means I can’t load
2010 Sep 01
how to replace NA with a specific score that is dependant on another indicator variable
Hi everyone, I’m looking for a clever bit of code to replace NA’s with a specific score depending on an indicator variable. I can see how to do it using lots of if statements but I’m sure there must be a neater, better way of doing it. Any ideas at all will be much appreciated, I’m dreading coding up all those if statements!!!!! My problem is as follows: I have a data set with
2011 Jul 25
biglm() and NeweyWest()
Dear all, I am working on a large dataset and need to use biglm() to perform OLS regressions. I have detected significant ARCH effects which I try to account for using the Newey-West correction. So far, I have worked with NeweyWest() in the sandwich package. NeweyWest() however seems to be unable to handle an object of class "biglm". Looking into the code, I figured out that
2010 Jun 15
help biglm.big.matrix; problem with weights
Hello colleagues, I have tried to use the package biglm. I want to specify a multivariate regression with a weight. I have imported a large dataset with the library(bigmemory). I load the library (biglm) and specified a regression with a weight. But every time I get an error message like "object not found" or "`weights' must be a formula" or "error in eval(expr, envir, enclos)". I
2007 Feb 12
predict on biglm class
Hi Everyone, I often use the 'safe prediction' feature available through glm(). Now, I'm at a situation where I must use biglm:::bigglm. ## begin example library(splines) library(biglm) ff <- log(Volume)~ns(log(Girth), df=5) fit.glm <- glm(ff, data=trees) fit.biglm <- bigglm(ff, data=trees) predict(fit.glm, newdata=data.frame(Girth=2:5)) ## -1.3161465 -0.2975659
2010 Oct 31
biglm: how it handles large data set?
I am trying to figure out why 'biglm' can handle large data sets... According to the R documentation - "biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory." After reading the source code below, I still could not figure out how 'update'
2009 Feb 19
Questions about biglm
Hello folks, I am very excited to have discovered R and have been exploring its capabilities. R's regression models are of great interest to me as my company is in the business of running thousands of linear regressions on large datasets. I am using biglm to run linear regressions on datasets that are as large as several GB's. I have been pleasantly surprised that biglm runs the
2010 Jun 16
biglm.big.matrix: Problem with weighting
Hello colleagues, I have tried to use the package bigmemory, biganalytics and biglm. I want to specify a multivariate regression with a weight. I have imported a large dataset with the library(bigmemory). I load the library (biglm) and specified a regression with a weight. But every time I get an error message like "object not found" or "`weights' must be a
2012 Jan 03
Biglm source code alternatives (E.g. Call to Fortran)
Hi everyone, I have been looking at the Bigglm (Basically does Generalised Linear Models for big data under the Biglm package) command and I have done some profiling on this code and found that to do a GLM on a 100mb file (9 million rows by 5 columns matrix(most of the numbers were either a 0,1 or 2 randomly generated)) it took about 2 minutes on a linux machine with 8gb of RAM and 4 cores.
2009 Apr 27
VIF's in R using BIGLM
Dear R-help This is a follow-up to my previous post here: I am working on developing an open-source automated system for running batch-regressions on very large datasets. In my previous post, I posed the question of obtaining VIF's from the output of
2009 Feb 25
leaps and biglm
New versions of leaps and biglm are percolating through CRAN. The new version of biglm fixes a bug in sandwich standard errors with weights, and adds predict(), deviance() and AIC() methods [based on code from Christophe Dutang]. The new version of leaps adds a regsubsets() method for biglm objects, so that the subset selection algorithms can be run efficiently on large data sets. -thomas
2007 Oct 23
Residuals from biglm package
Hi all, first of all, I'm not an expert on R, I'm still learning, so sorry if this is a stupid question... I have a large dataset that is too big for my computer memory, and I found the package biglm quite useful. Now everything is working perfectly. But if I want the residuals, how can I do it? Let's say that we are running the example: > data(trees)>
2009 Apr 20
R-Squared with biglm?
I've been working with a rather large data set (~10M rows), and while biglm works beautifully for generating coefficients, it does not report an r-squared. It does report RSS. Any idea on how one could coax an R-squared out of biglm? Thanks in advance for any help with this! Bryan Lim Lecturer Department of Finance University of Melbourne
2009 Mar 20
Using predict on a biglm object returns NA
Hi R experts, I used biglm to construct a model (which has categorical variables). When I run predict on the model output on a new data (for testing) or on the same data, I get only NA's. I'm able to run predict with some other models constructed with biglm. One reason I suspect is that the model itself has a few undefined terms (NA's). I'm wondering if there's any way to
2007 Jan 22
Example function for bigglm (biglm) data input from file
This is to submit a commented example function for use in the data argument to the bigglm(biglm) function, when you want to read the data from a file (instead of a URL), or rescale or modify the data before fitting the model. In the hope that this may be of help to someone out there. <- function (filename, chunksize, ...) { conn<-NULL; function (reset=FALSE) { if
2011 Nov 15
getting R2 (goodness of fit) result after using biglm()
Hello. I had been struggling with running linear regression using lm() primarily because my data has a few categorical variables with at least a thousand levels. I tried the biglm() function and it worked. My problem now is that I don't know how to get the R2 results. Could someone help? Thanks, sean
2009 Feb 21
variable/model selction (step/stepAIC) for biglm ?
Hello dear R mailing list members. I have recently become curious about the possibility of applying model selection algorithms (even as simple as AIC) to regressions of large datasets. I searched as best as I could, but couldn't find any reference or wrapper for using step or stepAIC with packages such as biglm. Any ideas or directions on how to implement such a concept? Best, Tal --
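
Several of the threads listed here turn on the point quoted from the biglm documentation: the fitted object needs only about p^2 memory for p variables and can be extended with more data using update(). The following is a minimal sketch of that chunk-by-chunk pattern, assuming the biglm package is installed. The split of the built-in trees data into two chunks and the names chunk1, chunk2 and ff are illustrative assumptions, not taken from any of the posts.

```r
library(biglm)  # assumption: the biglm package is installed

data(trees)

# Pretend the data arrive in two pieces that cannot be held in memory together.
chunk1 <- trees[1:15, ]
chunk2 <- trees[16:31, ]

# Illustrative model formula (hypothetical choice for this sketch).
ff <- log(Volume) ~ log(Girth) + log(Height)

# Fit on the first chunk, then fold in the second chunk with update().
fit <- biglm(ff, data = chunk1)
fit <- update(fit, chunk2)

# The coefficients should agree with an ordinary lm() fit on the full data.
coef(fit)
coef(lm(ff, data = trees))
```

In a real out-of-memory setting each chunk would be read from a file or database inside a loop rather than sliced from a data frame. The comparison with lm() is only there to check the sketch on a small data set.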