similar to: merging and working with big data sets

Displaying 20 results from an estimated 8000 matches similar to: "merging and working with big data sets"

2009 Apr 16
0
Major bigmemory revision released.
The re-engineered bigmemory package is now available (Version 3.5 and above) on CRAN. We strongly recommend you cease using the older versions at this point. bigmemory now offers completely platform-independent support for the big.matrix class in shared memory and, optionally, as filebacked matrices for larger-than-RAM applications. We're working on updating the package vignette, and a
2009 Jun 02
2
bigmemory - extracting submatrix from big.matrix object
I am using library(bigmemory) to handle large datasets, say 1 GB, and am facing the following problems. Any hints from anybody would be helpful. Problem-1: I am using the "read.big.matrix" function to create a filebacked big matrix of my data and get the following warning: > x = read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile
2010 Dec 17
1
[Fwd: adding more columns in big.matrix object of bigmemory package]
Hi, With reference to the mail below, I have large datasets, coming from various different sources, which I can read into filebacked big.matrix using library bigmemory. I want to merge them all into one 'big.matrix' object. (Later, I want to run regression using library 'biglm'). I am unsuccessfully trying to do this from quite some time now. Can you please
2008 Jun 25
1
huge data?
Hi Jay Emerson, Our intention is primarily to optimize "R" to utilize the parallel processing capabilities of the CELL BE processor. (Has any work been done in this area?) We have huge pages (of size 1MB 16MB) available in the system and, as you pointed out, our data is also in the GB range. So the idea is if Vectors of this huge size are allocated from Huge Pages the performance will
2008 Jan 30
1
Understanding an R improvement that already occurred.
I was surprised to observe the following difference between 2.4.1 and 2.6.0 after a long overdue upgrade a few months ago of our departmental server. It wasn't a bug fix, but a subtle improvement. Here's the simplest example I could create. The size is excessive, on the order of the Netflix Competition data. The integer matrix is about 1.12 GB, and if coerced to numeric it is 2.24 GB.
2010 May 10
0
bigmemory 4.2.3
The long-promised revision to bigmemory has arrived, with package 4.2.3 now on CRAN. The mutexes (locks) have been extracted and will be available through package synchronicity (on R-Forge, soon to appear on CRAN). Initial versions of packages biganalytics and bigtabulate are on CRAN, and new versions which resolve the warnings and have streamlined CRAN-friendly configurations will appear
2012 May 11
1
bigmemory
To answer your first question about read.big.matrix(), we don't know what your acc3.dat file is, but it doesn't appear to have been detected as a standard file (like a CSV file) or -- perhaps -- doesn't even exist (or doesn't exist in your current directory)? Next: > In addition, I am planning to do a multiple imputation with MICE package > using the data read by bigmemory
2009 Jul 20
2
kmeans.big.matrix
Hi, I'm playing around with the 'bigmemory' package, and I have finally managed to create some really big matrices. However, only now I realize that there may not be functions made for what I want to do with the matrices... I would like to perform a cluster analysis based on a big.matrix. Googling around I have found indications that a certain kmeans.big.matrix() function should
2010 Aug 11
1
Bigmemory: Error Running Example
Hi, I am trying to run the bigmemory example provided on the http://www.bigmemory.org/ The example runs on the "airline data" and generates summary of the csv files:- library(bigmemory) library(biganalytics) x <- read.big.matrix("2005.csv", type="integer", header=TRUE, backingfile="airline.bin", descriptorfile="airline.desc",
2007 Dec 08
2
NAMESPACE choices for exporting S4 methods
We are building a package, and want to create S4 methods for both head and mean for our own BigMatrix class. Following the recommendation in "Writing R Extensions" we use exportMethods instead of export in NAMESPACE (this is described as being "clearer"). This works for head, but not for mean. Obviously we importFrom(utils, head), but don't need to do this for mean,
2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone, I’m working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual data sets together (I have about 20). I’m using the following command from sqldf: data1 <- sqldf("select A.*, B.* from A inner join B using(ID)") But it’s taking A VERY VERY LONG TIME to merge just 2 of the datasets
2008 Jun 25
0
Package bigmemory now available on CRAN
Package "bigmemory" is now available on CRAN. A brief abstract follows: Multi-gigabyte data sets challenge and frustrate R users even on well-equipped hardware. C/C++ and Fortran programming can be helpful, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment. The new package bigmemory bridges this gap,
2011 Dec 01
1
bigmemory on Solaris
At one point we might have gotten something working (older version?) on Solaris x86, but were never successful on Solaris sparc that I remember -- it isn't a platform we can test and support. We believe there are problems with BOOST library compatibilities. We'll try (again) to clear up the other warnings in the logs, though. !-) We should also revisit the possibility of a CRAN BOOST
2012 May 05
2
looking for advice on bigmemory framework with C++ and java interoperability
I work with problems that have rather large data requirements -- typically a bunch of multigig arrays. Given how generous R is with using memory, the only way for me to work with R has been to use bigmatrices from the bigmemory package. One thing that is missing a bit is interoperability of bigmatrices with C++ and possibly java. What I mean by that is an API that would allow read and write filebacked
2011 Sep 29
1
efficient coding with foreach and bigmemory
I recently learned about the bigmemory and foreach packages and am trying to use them to help me create a very large matrix. Without those packages, I can create the type of matrix that I want with 10 columns and 5e6 rows. I would like to be able to scale up to 5e9 rows, or more, if possible. I have created a simplified example of what I'm trying to do, below. The first part of the
2010 Feb 22
1
big panel: filehash, bigmemory or other
Dear R-list, I'm about to start a new project on a rather big panel, consisting of approximately 8 million observations in 30 waves of data and about 15 variables. I have a similar data set that is approximately 7 gigabytes in size. Until now I have done my data management in SAS and Stata, mostly identifying spells, counting events in intervals, and the like, but I would like to
2010 May 10
0
"Bayesian change point" package bcp 2.2.0 available
Version 2.2.0 of package bcp is now available. It replaces the suggests of NetWorkSpaces (previously used for optional parallel MCMC) with the dependency on package foreach, giving greater flexibility and supporting a wider range of parallel backends (see doSNOW, doMC, etc...). For those unfamiliar with foreach (thanks to Steve Weston for this contribution), it's a beautiful and highly