similar to: Large Datasets

Displaying 20 results from an estimated 1100 matches similar to: "Large Datasets"

2009 Dec 03
4
Data Manipulation Question
Can R support the kind of data manipulation programming that is available in the SAS data step? Specifically, can R support the following: (1) read multiple datasets one record at a time and compare values from each, then, based on if-then logic, write to multiple output files; (2) load a lookup table and then process a different file, using if-then logic to look up values in the table
2011 Feb 02
1
Help with Help
I have recently been reading several books on data mining which contain a few data sets.  The books offer some perspective on model choices, tuning decisions, result interpretation.  Are there any good resources that can walk me through the thought process of an experienced data miner with the datasets that are included for packages such as "rpart" & "kmeans". I
2012 May 04
2
Can't import this 4GB DATASET
Dear Experienced R Practitioners, I have a 4GB .txt data file called "dataset.txt" and have attempted to use the ff, bigmemory, filehash and sqldf packages to import it, but have had no success. The readLines output of this data is: readLines("dataset.txt",n=20) [1] " "
2010 Aug 11
1
Bigmemory: Error Running Example
Hi, I am trying to run the bigmemory example provided on the bigmemory site (http://www.bigmemory.org/). The example runs on the "airline data" and generates a summary of the csv files: library(bigmemory) library(biganalytics) x <- read.big.matrix("2005.csv", type="integer", header=TRUE, backingfile="airline.bin", descriptorfile="airline.desc",
2010 Jun 15
1
help biglm.big.matrix; problem with weights
Hello colleagues, I have tried to use the package biglm. I want to specify a multivariate regression with a weight. I have imported a large dataset with library(bigmemory). I loaded library(biglm) and specified a regression with a weight. But every time I get an error message like "object not found" or "`weights' must be a formula" or "error in eval(expr, envir, enclos)". I
2013 Apr 29
2
bigmemory and R 3.0
Dear helpers, Does anyone have information on the status of bigmemory and R3.0? Will it just take time for the devs to re-code for the new environment? Or is there an alternative for this new version? Thanks Ben Caldwell
2012 Jan 18
1
kmeans clustering on large but sparse matrix
Hi, I have a 60k*600k matrix, which exceeds the vector length limit of 2^31-1. But it's rather sparse; only 0.02% of the entries have a value. So I saved it as a MatrixMarket (mm) file, which is about 300M in size. I use readMM in the Matrix package to read it in. When I do so, the data type becomes dgTMatrix from the 'Matrix' package instead of the common matrix type. The problem is, if I run k-means only on part of
2012 Jul 25
3
ff package: reading selected columns from csv
Dear R users, I've just started using the ff package. There is a csv file (~4Gb) with 7 columns and 6e+7 rows. I want to read only column from the file, skipping the first 100 rows. Below I've provided different outcomes, which will clarify my problem > sessionInfo() R version 2.14.2 (2012-02-29) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: ... attached base packages: [1] tools
2012 May 05
2
looking for advice on bigmemory framework with C++ and Java interoperability
I work with problems that have rather large data requirements -- typically a bunch of multigig arrays. Given how generous R is with using memory, the only way for me to work with R has been to use bigmatrices from the bigmemory package. One thing that is missing a bit is interoperability of bigmatrices with C++ and possibly Java. What I mean by that is an API that would allow read and write filebacked
2012 Oct 18
3
bigmemory for dataframes?
Hi Folks, I've been bumping my head against the 4GB limit for 32-bit R. I can't go to 64-bit R due to package compatibility issues (RODBC - possible but painful, xlsReadWrite - not possible, and others). I have a number of big dataframes whose columns hold all sorts of data types - factor, character, integer, etc. I run and save models that keep copies of the modeled data inside the model
2010 Dec 24
1
How to specify ff object filepaths when reading a CSV file into a ff data frame.
Hi, The read.csv.ffdf function in package ff will create the ff object physical file in the default directories. I am trying to have the files created in paths that users specify, and I think the point is to make use of the asffdf_args parameter. I have a test CSV file named D:\rtemp\fftest.csv; the content of the file is as follows: col1,col2,col3 1,"amber",2.4 2,"linda",4.5
2013 Nov 18
1
Reading in csv data with ff package
I've spent some time trying to wrap my head around reading in large csv files with the ff-package. I think I know how to do it, but am bumping into some problems. I've tried to recreate the issues as best as I can with a smaller example and maybe someone can help explain the problems. The following code just creates a csv file with an integer column, character column and logical column.
2010 Apr 13
2
how to work with big matrices and the ff-package?
Hello everyone, I need to create and work with some big matrices that actually have somewhat over 2 million columns and 117 rows. To do some calculations on such big matrices R just needs too much memory for my PC (4GB installed). So I need a solution to work with large datasets. I'm trying to use the ff-package but I don't think I really understand the whole functionality of the
2010 Jun 11
1
ff package when reading .csv files
Hi My aim is to read a large .csv file into R. I ran the following code and am using R version 10.1 on Windows. >library(ff) > read.csv.ffdf(x=NULL,"file.csv",fileEncoding="",nrows=-1,first.rows=NULL,next.rows=NULL,levels=NULL,appendLevels=TRUE,FUN="read.table",transFUN=NULL,asffdf_args=list(),BATCHBYTES=getOption("ffbatchbytes"),VERBOSE=FALSE)
2013 May 07
1
how to read numeric vector as factors using read.table.ffdf
I have a big data set that includes character variables of many different values. I'm trying to use ff to read the data and then use biglm.big.matrix to build linear models. However, big.matrix will convert all character vectors to factors and the character labels will be lost, so I decided to create a lookup table outside of R for my character columns and use numbers to represent different
2013 Sep 30
4
read.table() with quoted integers
Hi! It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider quoted integers as an acceptable value for columns for which colClasses="integer". But when colClasses is omitted, these columns are read as integer anyway. For example, let's consider a file named file.dat, containing: "1" "2" > read.table("file.dat",
2011 Dec 22
1
ff object in lapply function
Hello. I'm using as.ffdf(mydataframe) to create ffdf objects inside an lapply loop and returning that. I then use crbind to combine the lapply results into allData. So...simplified flow looks like this. res <- lapply(1:nchunks, function(n) { blah blah with nth chunk mydataframe <- data.frame(blah blah) dat <-
2012 Sep 14
1
Any way to get read.table.ffdf() (in the ff package) to pass colClasses or comment.char parameters through to read.fwf() ?
Hi everyone, my apologies if I'm overlooking something obvious in the documentation. I'm relatively inexperienced with the (awesome) ff package. My goal is to use the read.table.ffdf() function to call the read.fwf() function and pass through the colClasses and comment.char arguments. The code below shows exactly what doesn't work for me. If the colClasses and comment.char
2009 Nov 09
3
Hand-crafting an .RData file
Hello, I frequently have to export a large quantity of data from some source (for example, a database, or a hand-written perl script) and then read it into R. This occasionally takes a lot of time; I'm usually using read.table("filename",comment.char="",quote="") to read the data once it is written to disk. However, I *know* that the program that generates
2012 Oct 31
1
ffdfindexget from package ff
I'm having trouble getting ffdfindexget to work right in Windows. Even the most trivial of examples gives me problems. > myVec = ff(1:5) > another = ff(10:14) > littleFrame = ffdf(myVec, another) > posVec = ff(c(2, 4), vmode = 'integer') > ffdfindexget(littleFrame, posVec) Error in if (any(B < 1)) stop("B too small") : missing value where TRUE/FALSE