similar to: How to deal with more than 6GB dataset using R?

Displaying 20 results from an estimated 10000 matches similar to: "How to deal with more than 6GB dataset using R?"

2013 May 07
1
how to read numeric vector as factors using read.table.ffdf
I have a big data set that includes character variables with many different values. I'm trying to use ff to read the data and then use biglm.big.matrix to build linear models. However, big.matrix will convert all character vectors to factors, and the character labels will be lost. I decided to create a lookup table outside of R for my character columns and use numbers to represent different
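
One thing worth knowing: read.table.ffdf forwards colClasses to the underlying read.table, so a numeric-coded column can be declared "factor" directly and keeps its codes as level labels. A minimal sketch; the separator and column types here are assumptions, not from the post:

    library(ff)
    # "factor" makes read.table treat the numeric codes as factor levels,
    # so no lookup labels are lost; names/types are hypothetical
    x <- read.table.ffdf(file = "big.txt", header = TRUE, sep = "\t",
                         colClasses = c("double", "factor", "factor"))
    str(x[1:5, ])   # subscripting an ffdf returns an ordinary data.frame
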
2017 Jun 17
3
Prediction with two fixed-effects - large number of IDs
Dear all, I am running a panel regression with time and location fixed effects: ### reg1 <- lm(lny ~ factor(id) + factor(year) + x1 + I(x1)^2 + x2 + I(x2)^2, data=mydata, na.action="na.omit") ### My goal is to use the estimation for prediction. However, I have 8,500 IDs, which is resulting in very slow computation. Ideally, I would like to do the following: ### reg2 <-
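
One commonly suggested alternative (not from this thread) is to absorb the two fixed-effect dimensions with lfe::felm instead of estimating 8,500 dummy coefficients; a sketch using the poster's variable names, with the I() usage corrected:

    library(lfe)
    # '| id + year' absorbs both sets of fixed effects rather than building
    # dummies; note I(x1^2) -- the original I(x1)^2 does not square x1
    reg1 <- felm(lny ~ x1 + I(x1^2) + x2 + I(x2^2) | id + year, data = mydata)
    getfe(reg1)   # recovers the estimated fixed effects if they are needed

A caveat: felm has no predict method, so if prediction is the end goal the factor-level approach sketched under the reply below may still be the way to go.
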
2012 Mar 30
3
ff usage for glm
Greetings useRs, Can anyone provide an example of how to use ff to feed a very large data frame to glm? The data.frame cannot be loaded into R using conventional read.csv, as it is too big. glm(...,data=ff.file) ?? Thank you Stephen B
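
One route, assuming the ffbase package (which adds a bigglm method for ffdf objects): read the file with read.csv.ffdf and hand the ffdf straight to bigglm, which fits in chunks. Column names and family below are hypothetical:

    library(ff)
    library(ffbase)   # assumption: provides bigglm.ffdf
    library(biglm)
    dat <- read.csv.ffdf(file = "verybig.csv", header = TRUE)
    # bigglm walks the ffdf chunk by chunk instead of loading it all
    fit <- bigglm(y ~ x1 + x2, data = dat,
                  family = binomial(), chunksize = 50000)
    summary(fit)
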
2017 Jun 17
0
Prediction with two fixed-effects - large number of IDs
I have no direct experience with such horrific models, but your formula is a mess and Google suggests the biglm package with ffdf. Specifically, you should convert your discrete variables to factors before you build the model, particularly since you want to use predict after the fact, for which you will need a new data set with the exact same levels in the factors. Also, your use of I() is
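
The factor-level point can be sketched like this (hypothetical mydata/newdata, using the poster's lm call for simplicity):

    mydata$id   <- factor(mydata$id)
    mydata$year <- factor(mydata$year)
    fit <- lm(lny ~ id + year + x1 + I(x1^2) + x2 + I(x2^2), data = mydata)
    # new data must carry exactly the same levels for predict() to behave
    newdata$id   <- factor(newdata$id,   levels = levels(mydata$id))
    newdata$year <- factor(newdata$year, levels = levels(mydata$year))
    pred <- predict(fit, newdata)
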
2012 May 04
2
Can't import this 4GB DATASET
Dear Experienced R Practitioners, I have a 4GB .txt file called "dataset.txt" and have attempted to use the ff, bigmemory, filehash and sqldf packages to import it, but have had no success. The readLines output of this data is: readLines("dataset.txt",n=20) [1] " "
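
For a file this size, chunked reading with ff is one plausible route. A sketch; the separator and header are guesses, and the readLines output hints the file may start with blank lines, which skip= can jump over:

    library(ff)
    dat <- read.table.ffdf(file = "dataset.txt", header = TRUE, sep = "\t",
                           skip = 0,            # raise this if leading lines are blank
                           first.rows = 10000,  # small first chunk to sniff column types
                           next.rows  = 50000,  # larger chunks afterwards
                           VERBOSE = TRUE)
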
2012 Jul 25
3
ff package: reading selected columns from csv
Dear R users, I've just started using the ff package. There is a csv file (~4Gb) with 7 columns and 6e+7 rows. I want to read only one column from the file, skipping the first 100 rows. Below I've provided different outcomes, which will clarify my problem. > sessionInfo() R version 2.14.2 (2012-02-29) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: ... attached base packages: [1] tools
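
In plain read.table/read.csv, a column is dropped by giving it colClasses "NULL", and read.csv.ffdf forwards colClasses to the underlying reader, so the same pattern is the natural starting point; whether it passes through cleanly is what this thread is about. A sketch, assuming the wanted column is the third of the seven:

    library(ff)
    # "NULL" drops a column entirely; skip= skips lines before reading begins
    x <- read.csv.ffdf(file = "big.csv", header = TRUE, skip = 100,
                       colClasses = c("NULL", "NULL", "integer", rep("NULL", 4)))
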
2008 Oct 18
1
[LLVMdev] Reply: Re: [LLVMdev] [Need your help]
Hi Eli, Thanks for your rapid response! Now I have another question. How do I get LLVM bc files successfully by compiling test.c together with static libraries? Thanks a lot! ----- Original Message ----- From: Eli Friedman <eli.friedman at gmail.com> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
2011 Aug 10
1
Problems setting up a VM ....
.... New to this list & definitely new to virtualization :-/ .... I am trying to setup a 64-bit CentOS 5.6 VM on a 64-bit FC14 server host, all patched up as of a couple of weeks ago. uname -a on the host shows: [wam at Q6600, CFD, 9:24:23am] 540 % uname -a Linux Q6600 2.6.35.11-83.fc14.x86_64 #1 SMP Mon Feb 7 07:06:44 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux [wam at Q6600, CFD, 9:52:43am]
2010 Dec 24
1
How to specify ff object filepaths when reading a CSV file into a ff data frame.
Hi, The read.csv.ffdf function in package ff creates the ff object's physical files in the default directories. I am trying to have the files created in paths that users specify; I think the key is the asffdf_args parameter. I have a test CSV file named D:\rtemp\fftest.csv, with the following content: col1,col2,col3 1,"amber",2.4 2,"linda",4.5
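
One plausible route: asffdf_args is handed to as.ffdf, whose col_args list is forwarded to the per-column ff() calls, and ff()'s pattern argument sets the directory and file-name prefix of the physical files. A sketch using the poster's paths; the col_args nesting is my assumption about the plumbing, not from the post:

    library(ff)
    x <- read.csv.ffdf(file = "D:/rtemp/fftest.csv", header = TRUE,
                       asffdf_args = list(col_args = list(pattern = "D:/rtemp/ff")))
    filename(x$col1)   # should now point under D:/rtemp
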
2013 Nov 18
1
Reading in csv data with ff package
I've spent some time trying to wrap my head around reading in large csv files with the ff package. I think I know how to do it, but am bumping into some problems. I've tried to recreate the issues as best I can with a smaller example, and maybe someone can help explain the problems. The following code just creates a csv file with an integer column, character column and logical column.
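
A self-contained version of that setup; the main wrinkle is that ff has no character vmode, so the character column must be declared "factor" in colClasses:

    library(ff)
    df <- data.frame(i = 1:10,
                     s = letters[1:10],
                     b = rep(c(TRUE, FALSE), 5))
    write.csv(df, "test.csv", row.names = FALSE)
    # character columns must come in as factors; logical is a supported vmode
    x <- read.csv.ffdf(file = "test.csv",
                       colClasses = c("integer", "factor", "logical"))
    x[1:3, ]   # subscripting an ffdf returns an ordinary data.frame
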
2008 Mar 19
1
Re: conversion between Full-virtualization andPara-virtual
in a full-virtualization,this is my config file : name = "rhel51x64f" uuid = "8d33b93f-03e1-b2c2-c18f-82003c6e5b6f" maxmem = 2048 memory = 2048 vcpus = 2 builder = "hvm" kernel = "/usr/lib/xen/boot/hvmloader" boot = "cd" pae = 1 acpi = 1 apic = 1 on_poweroff = "destroy" on_reboot = "restart" on_crash = "restart"
2010 Apr 13
2
how to work with big matrices and the ff-package?
Hello everyone, I need to create and work with some big matrices that have somewhat over 2 million columns and 117 rows. For calculations on matrices this big, R needs more memory than my PC has (4GB installed), so I need a solution for working with large datasets. I'm trying to use the ff-package but I don't think I really understand the whole functionality of the
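
A sketch of the basic pattern: create a disk-backed ff matrix (117 x 2e6 doubles is roughly 1.7GB on disk but almost nothing in RAM) and process it in column blocks of bounded size; the block size and computation are placeholders:

    library(ff)
    m <- ff(vmode = "double", dim = c(117, 2000000))
    step <- 100000                      # ~90MB of doubles per block
    for (start in seq(1, ncol(m), by = step)) {
      end   <- min(start + step - 1, ncol(m))
      block <- m[, start:end]           # an ordinary in-RAM matrix
      # ... compute on 'block', write results back with m[, start:end] <- ...
    }
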
2008 Oct 17
1
[LLVMdev] [Need your help]
Hi, This is Crystal. I have some questions about llvm-gcc. Could you please give me some advice? Thanks in advance. Problem description: Env: llvm-gcc (GCC) 4.2.1, gcc (GCC) 4.1.2, OS: Fedora 7. I tried to compile a C programme, test.c, with llvm-gcc as follows: [root at localhost mylib]# llvm-gcc -emit-llvm test.c -Llibmylib.a -c -o test.bc [root at localhost mylib]# lli test.bc after running the command
2010 Jun 11
1
ff package when reading .csv files
Hi, my aim is to read a large .csv file into R. I am using R version 10.1 on Windows and ran the following code: > library(ff) > read.csv.ffdf(x=NULL, "file.csv", fileEncoding="", nrows=-1, first.rows=NULL, next.rows=NULL, levels=NULL, appendLevels=TRUE, FUN="read.table", transFUN=NULL, asffdf_args=list(), BATCHBYTES=getOption("ffbatchbytes"), VERBOSE=FALSE)
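
Most of those arguments have sensible defaults and can simply be left out; a minimal sketch (chunk sizes are illustrative):

    library(ff)
    x <- read.csv.ffdf(file = "file.csv", header = TRUE,
                       first.rows = 10000,   # small chunk to establish column types
                       next.rows  = 50000,   # bigger chunks for the rest
                       VERBOSE = TRUE)
    dim(x)
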
2013 Sep 30
4
read.table() with quoted integers
Hi! It seems that read.table() in R 3.0.1 (Linux 64-bit) does not accept quoted integers in columns for which colClasses="integer". But when colClasses is omitted, these columns are read as integers anyway. For example, let's consider a file named file.dat, containing: "1" "2" > read.table("file.dat",
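
A reproducible version of the behaviour, with one workaround: read the columns as character (which strips the quotes) and convert afterwards:

    writeLines('"1" "2"', "file.dat")
    try(read.table("file.dat", colClasses = "integer"))  # errors as described
    # workaround: quotes are stripped for character fields, then convert
    d <- read.table("file.dat", colClasses = "character")
    d[] <- lapply(d, as.integer)
    str(d)
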
2011 Dec 22
1
ff object in lapply function
Hello. I'm using as.ffdf(mydataframe) to create ffdf objects inside an lapply loop and returning them. I then use crbind to combine the lapply results into allData. So... the simplified flow looks like this: res <- lapply(1:nchunks, function(n) { blah blah with nth chunk mydataframe <- data.frame(blah blah) dat <-
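
I can't place crbind, but with the ffbase package one common way to combine chunked results is ffdfappend, which appends an ordinary data.frame to an existing ffdf; a sketch in which make_chunk() and nchunks are hypothetical stand-ins for the "blah blah" above:

    library(ff)
    library(ffbase)   # assumption: provides ffdfappend
    res <- NULL
    for (n in seq_len(nchunks)) {
      chunk_df <- make_chunk(n)   # hypothetical: builds the nth chunk
      res <- if (is.null(res)) as.ffdf(chunk_df) else ffdfappend(res, chunk_df)
    }

One possible source of the problems described here: ff files created inside a function can be removed by their finalizer when garbage collection runs after the wrapper goes out of scope, so a sequential loop that appends as it goes may be safer than collecting ffdf objects from lapply.
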
2007 Aug 16
4
Linear models over large datasets
I'd like to fit linear models on very large datasets. My data frames are about 2000000 rows x 200 columns of doubles and I am using a 64-bit build of R. I've googled about this extensively and went over the "R Data Import/Export" guide. My primary issue is that although my data in ASCII form is 4Gb in size (and therefore much smaller in binary), R consumes about
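
One standard answer is biglm, which keeps only the model's cross-products in memory: fit on the first chunk, then update() with the rest. A sketch with hypothetical file and column names:

    library(biglm)
    con   <- file("big.csv", open = "r")
    chunk <- read.csv(con, nrows = 100000)
    fit   <- biglm(y ~ x1 + x2, data = chunk)
    repeat {
      chunk <- tryCatch(read.csv(con, header = FALSE, nrows = 100000,
                                 col.names = names(chunk)),
                        error = function(e) NULL)   # reading past EOF errors
      if (is.null(chunk)) break
      fit <- update(fit, chunk)                     # folds in the next chunk
    }
    close(con)
    summary(fit)
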
2012 Sep 14
1
Any way to get read.table.ffdf() (in the ff package) to pass colClasses or comment.char parameters through to read.fwf() ?
Hi everyone, my apologies if I'm overlooking something obvious in the documentation. I'm relatively inexperienced with the (awesome) ff package. My goal is to use the read.table.ffdf() function to call the read.fwf() function and pass through the colClasses and comment.char arguments. The code below shows exactly what doesn't work for me. If the colClasses and comment.char
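
For reference, the pattern one would expect to work, since read.table.ffdf documents that extra arguments in '...' go to FUN (the widths and classes below are hypothetical); this thread is precisely about these arguments not arriving intact, so treat it as the starting point rather than a guaranteed fix:

    library(ff)
    x <- read.table.ffdf(file = "fixed.txt", FUN = "read.fwf",
                         widths = c(5, 10, 3),
                         colClasses = c("integer", "factor", "integer"),
                         comment.char = "")

If the arguments are swallowed on the way through, the transFUN= argument of read.table.ffdf offers another hook: a function applied to each chunk, where columns can be coerced by hand.
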
2011 Feb 11
2
Large Datasets
I have recently been using R - more specifically the GUI packages Rattle and Rcmdr. I like these products a lot and want to use them for some projects - the problem that I run into is when I start to try and run large datasets through them. The data sets are 10-15 million records and usually have 15-30 fields (both numerical and categorical). I saw that there were some packages
2009 Feb 19
1
Questions about biglm
Hello folks, I am very excited to have discovered R and have been exploring its capabilities. R's regression models are of great interest to me, as my company is in the business of running thousands of linear regressions on large datasets. I am using biglm to run linear regressions on datasets that are as large as several GB. I have been pleasantly surprised that biglm runs the
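
For data too big for one pass, bigglm can also pull chunks through a callback: it calls data(reset = TRUE) to rewind, then data() repeatedly until NULL, making several passes over the file. A sketch adapted from the chunk-reader pattern in biglm's documentation, with hypothetical file and formula:

    library(biglm)
    make_chunk_reader <- function(path, chunksize = 100000) {
      con  <- NULL
      cols <- NULL
      function(reset = FALSE) {
        if (reset) {                        # bigglm rewinds before each pass
          if (!is.null(con)) close(con)
          con  <<- file(path, open = "r")
          cols <<- NULL
          return(invisible(NULL))
        }
        chunk <- tryCatch({
          if (is.null(cols)) {              # first chunk carries the header
            first <- read.csv(con, nrows = chunksize)
            cols <<- names(first)
            first
          } else {
            read.csv(con, header = FALSE, nrows = chunksize, col.names = cols)
          }
        }, error = function(e) NULL)        # reading past EOF errors
        if (is.null(chunk) || nrow(chunk) == 0) NULL else chunk
      }
    }
    fit <- bigglm(y ~ x1 + x2, data = make_chunk_reader("big.csv"),
                  family = gaussian())
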