similar to: filehash for big data

Displaying 20 results from an estimated 8000 matches similar to: "filehash for big data"

2010 Feb 22
1
big panel: filehash, bigmemory or other
Dear R-list, I'm about to start a new project on a rather big panel, consisting of approximately 8 million observations in 30 waves of data and about 15 variables; a similar data set I have is approximately 7 gigabytes in size. Until now I have done my data management in SAS and Stata, mostly identifying spells, counting events in intervals, and the like, but I would like to
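A minimal sketch of the kind of workflow filehash allows here, assuming the panel can be split into per-wave files (the file names, key names, and the event_date column are hypothetical):

    library(filehash)

    dbCreate("panel_db", type = "DB1")   # one-time: create the on-disk database
    db <- dbInit("panel_db")             # open it

    # store each wave under its own key so only one wave is in RAM at a time
    for (w in 1:30) {
      wave <- read.csv(sprintf("wave_%02d.csv", w))
      dbInsert(db, sprintf("wave_%02d", w), wave)
      rm(wave)
    }

    # later: fetch a single wave, count events in an interval, discard it again
    w5 <- dbFetch(db, "wave_05")
    n_events <- sum(w5$event_date >= as.Date("2005-01-01") &
                    w5$event_date <  as.Date("2006-01-01"))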
2008 Mar 15
1
filehash
Hello, I'm using filehash on Windows XP and it has been working fine with the newest R version, 2.6.2. However, on Windows Vista, when I ran the same code, I got the following error: > dbCreate("simdb") #create simdb database [1] TRUE > db<-dbInit("simdb") #initialize a database object Error in sprintf(gettext(fmt, domain = domain), ...) : object
2009 May 19
0
File too big for filehash?
Dear R users, I am trying to use a very large file (~3 GiB) with the filehash package. The dataset has around 4,000,000 observations. I get this message from R while trying to "load" the dataset (named "cc084.csv"): > dumpDF(read.csv("cc084.csv", header=T), dbName="db01") Erreur : impossible d'allouer un vecteur de taille 15.6 Mo (French for: Error: cannot allocate a vector of size 15.6 Mb)
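A hedged sketch of one way around this: rather than read.csv()-ing the whole 3 GB file and then dumping it, read it in chunks and insert each chunk into the filehash database as it arrives (the chunk size is arbitrary, and the sketch assumes a plain, unquoted header row):

    library(filehash)

    dbCreate("db01", type = "DB1")
    db <- dbInit("db01")

    con <- file("cc084.csv", open = "r")
    cols <- strsplit(readLines(con, n = 1), ",")[[1]]   # header line
    chunk_size <- 100000
    i <- 0
    repeat {
      chunk <- try(read.csv(con, header = FALSE, nrows = chunk_size,
                            col.names = cols), silent = TRUE)
      if (inherits(chunk, "try-error") || nrow(chunk) == 0) break
      i <- i + 1
      dbInsert(db, sprintf("chunk_%04d", i), chunk)
    }
    close(con)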
2010 Jan 21
0
filehash does not install on FreeBSD
Trying to install package 'filehash' I get the following error on FreeBSD 9.0-CURRENT (amd64) with R version 2.11.0 (2010-01-15 r50990): ----------------------------------- R CMD INSTALL filehash_2.0-1.tar.gz * installing to library '/usr/local/lib/R/library' * installing *source* package 'filehash' ... ** libs gcc -std=gnu99 -I/usr/local/lib/R/include
2008 Aug 28
0
Can the file locking in filehash be reused? (Was: Re: [R] [R-pkgs] filehash 2.0)
Hi (Roger), I saw the announcement of filehash v2.0, and the sentence "This development has led to better file locking for concurrent access and faster reading and writing of data in general" caught my attention. What kind of file locking do you refer to here? I am looking for a mechanism that can be used to lock files for reading and/or writing, and I'd love to have a cross
2008 Aug 28
0
filehash 2.0
I have just uploaded to CRAN version 2.0 of the 'filehash' package. This version contains a major rewrite of many of the internals (much of it rewritten in C) for the DB1 format, which is the default. This development has led to better file locking for concurrent access and faster reading and writing of data in general. In addition to rewriting the internals, I have added two modules for a
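A small usage sketch of the (default) DB1 backend mentioned above; the database name and stored object are placeholders:

    library(filehash)

    filehashOption(defaultType = "DB1")  # set the default backend globally
    dbCreate("mydb", type = "DB1")       # or request DB1 for a single database
    db <- dbInit("mydb")
    dbInsert(db, "x", rnorm(100))
    dbFetch(db, "x")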
2009 Mar 15
1
What is the best package for large data cleaning (not statistical analysis)?
Dear R helpers: I am a newbie to R and have a question about cleaning large data frames in R. So far, I have been using SAS for data cleaning because my data sets are relatively large (I handle multiple files, each of which can be as large as 5-10 GB). I am not a fan of SAS at all and am eager to move my data cleaning tasks into R completely. It seems to me there are 3 options: using SQL, ff, or
2010 Jan 02
0
filehash - multiple indices via '[' not allowed when using RDS format
Hi, I have been using filehash for a while and it has performed very well. However, I recently found that filehash gives an error when I do something like db[c("a", "b")] and the db is in RDS format. Does anyone know a way to get around that? The code below reproduces the error. Thanks, Jeff filehashOption(defaultType = "DB1") dbCreate("mydb3", type =
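One workaround that should behave the same regardless of backend is to fetch the keys one at a time and collect the results in a named list (a sketch; later filehash versions also offer dbMultiFetch(), if your version has it):

    library(filehash)

    db <- dbInit("mydb3")
    keys <- c("a", "b")
    vals <- setNames(lapply(keys, function(k) dbFetch(db, k)), keys)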
2009 Jan 23
0
Appending objects created using filehash package
Hi, I am working with a very large dataset and am using the 'filehash' package to manage such a large file. While I have no problem accessing objects that I load into a database, I was hoping there is a better way to append to objects already in the database. The only way I know of now to append to an object basically requires rewriting the entire object. Sample code:
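As a sketch, the rewrite-the-whole-object pattern the poster describes, followed by one possible alternative of storing each increment under its own key and combining only on retrieval (the keys, new_rows, and i are hypothetical):

    library(filehash)
    db <- dbInit("mydb")

    # current approach: fetch, grow in memory, write the whole object back
    old <- dbFetch(db, "results")
    dbInsert(db, "results", rbind(old, new_rows))

    # alternative: one key per increment, combine only when needed
    dbInsert(db, sprintf("results_%03d", i), new_rows)
    keys <- grep("^results_", dbList(db), value = TRUE)
    all_results <- do.call(rbind, lapply(keys, function(k) dbFetch(db, k)))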
2012 Apr 27
1
TikzDevice
Dear R'ers, I have trouble installing tikzDevice on Ubuntu. When I use install.packages("tikzDevice"), it gives the error message: ERROR: dependency ‘filehash’ is not available for package ‘tikzDevice’ * removing ‘/usr/local/lib/R/site-library/tikzDevice’ Then I tried installing filehash, and I get the message: "package ‘filehash’ is not available (for R version 2.13.1)"
2011 Sep 23
2
tikzDevice install problem
Hi everybody! I'm trying to install the tikzDevice package, and I keep on getting the > ERROR: dependency ‘filehash’ is not available for package ‘tikzDevice’ I tried install.packages('filehash') and I get > package ‘filehash’ is not available Does anybody have the same problem or any hint? Thank you, Helena
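The usual first step (a sketch, and no guarantee against a repository-side problem) is to install the missing dependency explicitly, pointing at a CRAN mirror that carries it for your R version, and then retry tikzDevice:

    install.packages("filehash", repos = "http://cran.r-project.org")
    install.packages("tikzDevice", dependencies = TRUE)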
2008 Mar 08
1
Error message while trying to update packages: Error in gzfile(file, mode) : unable to open connection
Hello, I have just installed v 2.6.2 on a new computer running Windows XP and tried to perform 'update packages' via the menu option on the R console. Any advice on the following problem is much appreciated. Bob Below are the warning and error messages received. A search of the hard drive does not reveal any file including "RtmpgMMu03/libloc" . >
2013 Jul 15
2
suppress startup messages from default packages
Hi all, several packages print messages during loading. How do I avoid seeing them when the packages are in the defaultPackages? Here is an example. With this in ~/.Rprofile ,----[ ~/.Rprofile ] | old <- getOption("defaultPackages") | options(defaultPackages = c(old, "filehash")) | rm(old) `---- I get as the last line when starting R: ,---- | filehash: Simple key-value
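One commonly suggested alternative (a sketch): skip defaultPackages for the noisy package and attach it yourself from .First() in ~/.Rprofile, wrapped in suppressPackageStartupMessages():

    # ~/.Rprofile
    .First <- function() {
      suppressPackageStartupMessages(library(filehash))
    }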
2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone, I'm working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual data sets together (I have about 20). I'm using the following command from sqldf: data1 <- sqldf("select A.*, B.* from A inner join B using(ID)") But it's taking A VERY VERY LONG TIME to merge just 2 of the datasets
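Two approaches that are often suggested for joins of this size (sketches; both assume A and B are data frames sharing an ID column): let sqldf build its SQLite database on disk instead of in memory, or do a keyed join with data.table.

    # 1) sqldf with an on-disk SQLite database
    library(sqldf)
    data1 <- sqldf("select A.*, B.* from A inner join B using(ID)",
                   dbname = tempfile())

    # 2) data.table keyed inner join
    library(data.table)
    A <- as.data.table(A); setkey(A, ID)
    B <- as.data.table(B); setkey(B, ID)
    data1 <- A[B, nomatch = 0]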
2010 Feb 19
2
problem with RGtk
Dear List, I would like to ask about the RGtk2 package, with which I have a problem. I would very much appreciate it if somebody could tell me what I need to do. I need to install the package scdMicro, which depends on gWidgetsRGtk2. I am working on a Mac, OS X 10.5.8. When I try to load gWidgetsRGtk2 (or RGtk2), it asks me: "Instal GTK+?" I installed GTK+ from CRAN and added a path: export
2012 May 04
2
Can't import this 4GB DATASET
Dear Experienced R Practitioners, I have a 4 GB .txt file called "dataset.txt" and have attempted to use the ff, bigmemory, filehash and sqldf packages to import it, but have had no success. The readLines output of this data is: readLines("dataset.txt",n=20) [1] " "
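Two hedged sketches for reading a file this size without holding it all in memory at once (the tab separator and header row are assumptions about the file):

    # (a) sqldf: let SQLite parse the file on disk and pull in only what is needed
    library(sqldf)
    dat <- read.csv.sql("dataset.txt", sql = "select * from file",
                        header = TRUE, sep = "\t", dbname = tempfile())

    # (b) ff: read into an on-disk ffdf object
    library(ff)
    dat <- read.table.ffdf(file = "dataset.txt", header = TRUE, sep = "\t")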
2010 Oct 22
2
(no subject)
I am doing cluster analysis on 8768 respondents using 5 lifestyle variables and am having difficulty constructing the dissimilarity matrix that I will use for PAM. I always get the error “cannot allocate vector of size 293.3 Mb”, even though I have already increased my memory limit to its maximum of 4000 MB. I did this on a 2 GB, 32-bit OS. I tried ff and filehash and I still get the same error. Can you please
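A sketch of the usual way around the full n-by-n dissimilarity matrix: clara() in the cluster package runs PAM on subsamples, so the 8768 x 8768 matrix never needs to be built (the data object, k = 4, and the sampling settings are assumptions):

    library(cluster)

    fit <- clara(lifestyle_vars, k = 4, metric = "euclidean",
                 stand = TRUE, samples = 50, sampsize = 200)
    fit$medoids      # representative respondents
    fit$clustering   # cluster assignment for all 8768 respondents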
2008 Jul 31
2
C versions of serialize/unserialize in packages
Are the functions 'R_Unserialize' and 'R_InitFileInPStream' allowed to be used in R packages? I guess I'm just not clear on the implications of this comment in 'Rinternals.h': /* The connection interface is not yet available to packages. To allow limited use of connection pointers this defines the opaque pointer type. */ I have a function in the
2007 Nov 01
2
unable to install package ff
Hi all, I've had one of my most miserable R weeks in memory. I'm trying to deal with huge datasets (>1GB each) but am running up against those pesky memory limits. The libraries filehash and g.data are not very suitable for what I need. I haven't gotten into the sql thing yet. Most recently I've been trying to install the new package ff (not yet on the CRAN repository). I
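For a package not (yet) on CRAN, the usual route is to install from a downloaded source tarball (a sketch; the file name is a placeholder):

    install.packages("ff_<version>.tar.gz", repos = NULL, type = "source")
    # or, from a shell:
    #   R CMD INSTALL ff_<version>.tar.gz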