Displaying 20 results from an estimated 1000 matches similar to: "big data"
2009 Feb 19
1
Questions about biglm
Hello folks,
I am very excited to have discovered R and have been exploring its
capabilities. R's regression models are of great interest to me as my
company is in the business of running thousands of linear regressions
on large datasets.
I am using biglm to run linear regressions on datasets that are as
large as several GB. I have been pleasantly surprised that biglm
runs the
2009 Mar 17
1
exporting s3 and s4 methods
If a package defines an S3 generic and an S4 generic for the same function (so as to add methods for S4 classes to the existing code), how do I set up the namespace to have both exported?
With
import(stats)
exportMethods(bigglm)
importClassesFrom(DBI)
useDynLib(biglm)
export(biglm)
export(bigglm)
in NAMESPACE, the S3 generic is not exported.
> methods("bigglm")
[1] bigglm.RODBC*
2009 Jul 03
2
bigglm() results different from glm()
Hi Sir,
Thanks for making the package available to us. I am facing a few problems;
perhaps you can give some hints:
Problem-1:
The model summary and residual deviance matched (in the mail below) but
I didn't understand why AIC is still different.
> AIC(m1)
[1] 532965
> AIC(m1big_longer)
[1] 101442.9
Problem-2:
chunksize argument is there in bigglm but not in biglm, consequently,
2007 Aug 16
4
Linear models over large datasets
I'd like to fit linear models on very large datasets. My data frames
are about 2000000 rows x 200 columns of doubles and I am using a
64-bit build of R. I've googled this extensively and gone over the
"R Data Import/Export" guide. My primary issue is that, although my data
in ASCII form is 4 GB in size (and therefore much smaller
in binary), R consumes about
2009 Mar 17
2
bigglm() results different from glm()
Dear all,
I am using bigglm() from the biglm package to fit a few GLMs to a large
dataset (3 million rows, 6 columns). While trying to fit a Poisson GLM I
noticed that the coefficient estimates were very different from what I
obtained when estimating the model on a smaller dataset using glm(). I
wrote a very basic toy example to compare the results of bigglm() against
a glm() call. Consider the
2011 Jan 10
1
debug biglm response error on bigglm model
G'morning
What does the error message "Error in x %*% coef(object) :
non-conformable arguments" indicate when calculating the response values
for newdata with a model from bigglm (in package biglm), and how can I
debug it? I am attempting to do Monte Carlo simulations, which may
explain the loop in the code that follows. After the code I
have included the output, which shows that
2009 Apr 03
1
bigglm "update" with ff
Hi, since bigglm doesn't have update, I was wondering how to achieve
something like (similar to the example in ff package manual using biglm):
first <- TRUE
ffrowapply ({
if (first) {
first <- FALSE
fit <- bigglm(eqn, as.data.frame(bigdata[i1:i2,,drop=FALSE]), chunksize =
10000, family = binomial())
} else {
fit <- update(fit,
2011 Feb 08
1
Fitting a model with an offset in bigglm
Dear all,
I have a large data set and would like to fit a logistic regression
model using the bigglm function. I need to include an offset in the
model but when I do this the bigglm function seems to ignore it.
For example, running the two models below produces the same model and
the offset is ignored
bigglm(y~x,offset=z,data=Test,family=binomial(link = "logit"))
2007 Jan 22
1
Example function for bigglm (biglm) data input from file
This is to submit a commented example function for use in the data
argument to the bigglm(biglm) function, when you want to read the data
from a file (instead of a URL), or rescale or modify the data before
fitting the model. In the hope that this may be of help to someone out
there.
make.data <- function (filename, chunksize, ...) {
conn<-NULL;
function (reset=FALSE) {
if
2012 Mar 30
3
ff usage for glm
Greetings useRs,
Can anyone provide an example of how to use ff to feed a very large data frame to glm?
The data.frame cannot be loaded in R using conventional read.csv as it is too big.
glm(...,data=ff.file) ??
Thank you
Stephen B
2010 Jul 02
2
unable to get bigglm working, ATTN: Thomas Lumley
I am using an example posted in this help forum to work with a file. The head
of the file looks like:
988887 2007-03-05 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 0 0
988887 2007-03-06 2007-06-01 90 3 5.450 205500.00 999.00 999.000 0.000 1 0
988887 2007-03-07 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100 2 0
988887 2007-03-08 2007-06-01 90 3 5.450 205500.00 999.00 999.000 -0.100
2008 Aug 09
1
Reading large datasets and fitting logistic models in R
Hi R-experts,
Does anyone have experience using R for handling large scale data (millions
of rows, hundreds or thousands of features)?
What is the largest size of data that anyone has used with glm?
Also, is there a library to read data in sparse data format (like SVMlight
format)?
Thanks
Pradheep
2007 Jan 21
1
Can we do GLM on 2GB data set with R?
We want to use R instead of/in addition to our existing stats
package because of its huge assortment of stat functions. But, we
routinely need to fit GLM models to files that are approximately 2-4GB
(as SQL tables, un-indexed, w/tinyint-sized fields except for the
response & weight variables). Is this feasible, does anybody know,
given sufficient hardware, using R? It appears to
2006 Aug 21
5
lean and mean lm/glm?
Hi All: I'm new to R and have a few questions about getting R to run efficiently with large datasets.
I'm running R on Windows XP with 1 GB of RAM (so about 600-700 MB after the usual Windows overhead). I have a dataset that has 4 million observations and about 20 variables. I want to run probit regressions on this data, but can't do this with more than about 500,000 observations before
2003 Nov 18
5
Histogram
Hi,
I have what should be a simple question. I would like to generate a
histogram of
x <- c("a","b","c","b","c","c")
where the first bar is labeled 'c' with height 3, the second bar is
labeled 'b' with height 2 and the third bar is labeled 'a' with height 1.
This should be an easy task in R but I think I
2007 Jun 29
1
Comparison: glm() vs. bigglm()
Hi,
Until now, I thought that the results of glm() and bigglm() would
coincide. Probably a naive assumption?
Anyways, I've been using bigglm() on some datasets I have available.
One of the sets has >15M observations.
I have 3 continuous predictors (A, B, C) and a binary outcome (Y).
And tried the following:
m1 <- bigglm(Y~A+B+C, family=binomial(), data=dataset1, chunksize=10e6)
2004 Apr 30
3
searching a vector
Hi,
I have an integer vector x that contains a unique set of numbers:
x <- c(1,2,4,6,8,10,12)
Is there a simple test I can use to determine if an integer such as 6 is
contained in x ?
Thanks in advance for any help,
Arend
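A minimal sketch of one way to answer the question above, using only base R. The names has_six, has_five and pos are hypothetical and chosen just for illustration; x is the vector from the post.

```r
# Vector from the question
x <- c(1, 2, 4, 6, 8, 10, 12)

# %in% returns TRUE/FALSE for membership
has_six <- 6 %in% x

# is.element() is an equivalent spelling of the same test
has_five <- is.element(5, x)

# match() returns the position of the first match, or NA if there is none
pos <- match(6, x)
```

If only a yes/no answer is needed, %in% is probably the most readable choice; match() is useful when the position matters as well.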
2003 Dec 02
8
Vector Assignments
Hi,
I have a simple R question.
I have a vector x that contains real numbers. I would like to create
another vector col that is the same length as x such that:
if x[i] < 250 then col[i] = "red"
else if x[i] < 500 then col[i] = "blue"
else if x[i] < 750 then col[i] = "green"
else col[i] = "black" for all i
I am convinced that there is probably a
2008 Oct 15
1
Glusterfs performance with large directories
We at Wiseguys are looking into GlusterFS to run our Internet Archive.
The archive stores webpages collected by our spiders.
The test setup consists of three data machines, each exporting a volume
of about 3.7TB and one nameserver machine.
File layout is such that each host has its own directory, for example the
GlusterFS website would be located in:
2003 Nov 17
1
\preformatted and $
Hi,
I have been developing a package in R and have been working on
documentation. I have a \details section that contains the following:
\details{
some text
\preformatted{
[my-section]
user = apv
host = 127.0.0.1
}
}
When I run R CMD check I get an error while checking the manual. If I
remove:
\preformatted{
[my-section]
user = apv
host = 127.0.0.1
}
and replace it with
[my-section]