Displaying 20 results from an estimated 20000 matches similar to: "how to read numeric vector as factors using read.table.ffdf"
2012 Sep 14
1
Any way to get read.table.ffdf() (in the ff package) to pass colClasses or comment.char parameters through to read.fwf() ?
Hi everyone, my apologies if I'm overlooking something obvious in the
documentation. I'm relatively inexperienced with the (awesome) ff package.
My goal is to use the read.table.ffdf() function to call the read.fwf()
function and pass through the colClasses and comment.char arguments. The
code below shows exactly what doesn't work for me.
If the colClasses and comment.char
2011 Jan 18
2
help with read.table.ffdf parameters
Hello fellow R users,
I am trying to read a 6.9 million row text file with 26 columns separated by
spaces into R using ff. When I specify a small number for first.rows,
next.rows and nrows it is read with no issue. However, when I try to specify
larger next.rows values and no nrows parameter to read the entire file, I
keep getting errors. Please see code below.
I am trying to do this on an m1.large
2017 Jun 17
0
Prediction with two fixed-effects - large number of IDs
I have no direct experience with such horrific models, but your formula is a mess and Google suggests the biglm package with ffdf.
Specifically, you should convert your discrete variables to factors before you build the model, particularly since you want to use predict after the fact, for which you will need a new data set with the exact same levels in the factors.
Also, your use of I() is
2012 Jul 25
3
ff package: reading selected columns from csv
Dear R users, I've just started using the ff package.
There is a csv file (~4Gb) with 7 columns and 6e+7 rows. I want to read only
column from the file, skipping the first 100 rows.
Below I've provided different outcomes, which will clarify my problem
> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
...
attached base packages:
[1] tools
2012 Apr 15
0
Specifying splits - in read.csv.ffdf
Hi All,
I am currently trying to familiarize myself with the "ff" package, which allows me to store R objects on the hard drive. One of the things that I notice when reading in a text file with the "read.csv.ffdf" function is that, in the R temp folder, 1000+ small files get created, each with a name like "ffdf1bcd8b4aa0.ff". Each file is about 5KB in size.
My understanding
2013 Nov 18
1
Reading in csv data with ff package
I've spent some time trying to wrap my head around reading in large csv
files with the ff-package. I think I know how to do it, but am bumping
into some problems. I've tried to recreate the issues as best as I can
with a smaller example and maybe someone can help explain the problems.
The following code just creates a csv file with an integer column,
character column and logical column.
2010 Dec 24
1
How to specify ff object filepaths when reading a CSV file into a ff data frame.
Hi,
The read.csv.ffdf function in package ff creates the ff object
physical files in the default directories. I am trying to have the files
created in paths that users specify; I think the key is to make use
of the asffdf_args parameter.
I have a test CSV file named D:\rtemp\fftest.csv. The content of the
file is as follows:
col1,col2,col3
1,"amber",2.4
2,"linda",4.5
2007 Dec 05
2
converting factors to dummy variables
Hi all -
I'm trying to find a way to create dummy variables from factors in a
regression. I have been using biglm along the lines of
ff <- log(Price) ~ factor(Colour):factor(Store) +
factor(DummyVar):factor(Colour):factor(Store)
lm1 <- biglm(ff, data=my.dataset)
but because there are lots of colours (>100) and lots of stores
(>250), I run into memory problems. Now, not every
2017 Jun 17
3
Prediction with two fixed-effects - large number of IDs
Dear all,
I am running a panel regression with time and location fixed effects:
###
reg1 <- lm(lny ~ factor(id) + factor(year) + x1+ I(x1)^2 + x2+ I(x2)^2 ,
data=mydata, na.action="na.omit")
###
My goal is to use the estimation for prediction. However, I have 8,500 IDs,
which is resulting in very slow computation. Ideally, I would like to do
the following:
###
reg2 <-
2012 Mar 30
3
ff usage for glm
Greetings useRs,
Can anyone provide an example how to use ff to feed a very large data frame to glm?
The data.frame cannot be loaded in R using conventional read.csv as it is too big.
glm(...,data=ff.file) ??
Thank you
Stephen B
2010 Jun 11
1
ff package when reading .csv files
Hi
My aim is to read a large .csv file into R. I ran the following code and am
using R version 10.1 on Windows.
> library(ff)
> read.csv.ffdf(x=NULL,"file.csv",fileEncoding="",nrows=-1,first.rows=NULL,next.rows=NULL,levels=NULL,appendLevels=TRUE,FUN="read.table",transFUN=NULL,asffdf_args=list(),BATCHBYTES=getOption("ffbatchbytes"),VERBOSE=FALSE)
2013 Sep 30
4
read.table() with quoted integers
Hi!
It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
quoted integers as an acceptable value for columns for which
colClasses="integer". But when colClasses is omitted, these columns are
read as integer anyway.
For example, let's consider a file named file.dat, containing:
"1"
"2"
> read.table("file.dat",
2009 Nov 06
0
New version of package ff
Dear R community,
ff Version 2.1.1 is available on CRAN. It now supports large data.frames,
csv import/export, packed atomic datatypes and bit filtering from package
'bit' on which it depends from now.
Some performance results in seconds from test data with 78 mio rows and 7 columns on a 3 GB notebook:
sequential reading 1 mio rows: csv = 32.7 ffdf = 1.3
sequential writing 1 mio
2011 Dec 22
1
ff object in lapply function
Hello. I'm using as.ffdf(mydataframe) to create ffdf objects inside an lapply
loop and returning that. I then use crbind to combine the lapply results
into allData.
So...simplified flow looks like this.
res <- lapply(1:nchunks, function(n)
{
blah blah with nth chunk
mydataframe <- data.frame(blah blah)
dat <-
2008 Aug 17
1
package building problem on windows
Hi,
I'm trying to compile the package biglm, but when I build it with R
CMD build biglm, it failed :
C:\LOCAL\c-dutang\code\R\biglm2>R CMD build biglm
* checking for file 'biglm/DESCRIPTION' ... OK
* preparing 'biglm':
* checking DESCRIPTION meta-information ...C:/DOCUME~1/c-dutang/Local:
Can't open C:/DOCUME~1/c-dutang/Local: No such file or directory
2010 Apr 13
2
how to work with big matrices and the ff-package?
Hello everyone,
I need to create and work with some big matrices that actually have somewhat over 2 million columns and 117 rows. To do some calculations on such big matrices R just needs too much memory for my PC (4GB installed). So I need a solution to work with large datasets. I'm trying to use the ff-package but I don't think I really understand the whole functionality of the
2009 Feb 19
1
Questions about biglm
Hello folks,
I am very excited to have discovered R and have been exploring its
capabilities. R's regression models are of great interest to me as my
company is in the business of running thousands of linear regressions
on large datasets.
I am using biglm to run linear regressions on datasets that are as
large as several GB's. I have been pleasantly surprised that biglm
runs the
2010 Oct 31
1
biglm: how it handles large data set?
I am trying to figure out why 'biglm' can handle large data sets...
According to the R documentation - "biglm creates a linear model object that uses
only p^2 memory for p variables. It can be updated with more data using
update. This allows linear regression on data sets larger than memory."
After reading the source code below, I still could not figure out how
'update'
2012 May 04
2
Can't import this 4GB DATASET
Dear Experienced R Practitioners,
I have a 4GB .txt data file called "dataset.txt" and have attempted to use the ff,
bigmemory, filehash and sqldf packages to import it, but have had no
success. The readLines output of this data is:
readLines("dataset.txt",n=20)
[1] " "