Hi everybody!

I have a large dataset, about 2 million entries, in the following record
format, which I would like to import into a data frame:

<integer><integer><float><string><float><string><string>

Because of the huge amount of data I chose a binary format instead of a
text format when exporting from Matlab. My import function is attached
below. It works fine for a few entries but is deadly slow when trying to
read the complete set.

Does anybody have some pointers for improving the import or for handling
such large data sets?

Thanks in advance!

Uli

read.DET.data <- function(f) {
    counter <- 1
    spk.v <- c()
    imp.v <- c()
    score.v <- c()
    th.v <- c()
    ses.v <- c()
    rec.v <- c()
    type.v <- c()
    fid <- file(f, "rb")
    tempi <- readBin(fid, integer(), size = 1, signed = FALSE)
    while (length(tempi) != 0) {
        spk.v[counter] <- tempi
        imp.v[counter] <- readBin(fid, integer(), size = 1, signed = FALSE)
        score.v[counter] <- readBin(fid, numeric(), size = 4)
        type.v[counter] <- readBin(fid, character())
        th.v[counter] <- readBin(fid, numeric(), size = 4)
        ses.v[counter] <- readBin(fid, character())
        rec.v[counter] <- readBin(fid, character())
        counter <- counter + 1
        tempi <- readBin(fid, integer(), size = 1, signed = FALSE)
    }
    close(fid)
    spkf <- factor(spk.v)
    impf <- factor(imp.v)
    det.f <- data.frame(spk = spkf, imp = impf, score = score.v, th = th.v,
                        ses = ses.v, rec = rec.v, type = type.v)
    det.f
}
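[A minimal sketch, not part of the original post: a writer that produces a few records in the same interleaved <integer><integer><float><string><float><string><string> layout, so read.DET.data() can be tested on a small file before running it on the full 2-million-entry set. The function name write.DET.sample and the sample values are made up for illustration.]

```r
## Hypothetical helper: write 3 records in the interleaved record layout
## described above (1-byte ints, 4-byte floats, nul-terminated strings).
write.DET.sample <- function(f) {
    fid <- file(f, "wb")
    on.exit(close(fid))
    for (i in 1:3) {
        writeBin(as.integer(i), fid, size = 1)       # <integer> spk
        writeBin(as.integer(i %% 2), fid, size = 1)  # <integer> imp
        writeBin(0.5 * i, fid, size = 4)             # <float>   score
        writeBin(paste0("type", i), fid)             # <string>  type (nul-terminated)
        writeBin(0.1 * i, fid, size = 4)             # <float>   th
        writeBin(paste0("ses", i), fid)              # <string>  ses
        writeBin(paste0("rec", i), fid)              # <string>  rec
    }
}
```

writeBin() appends a trailing nul byte to each character string, which is exactly what readBin(fid, character()) expects on the reading side.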
Uli Tuerk wrote:
> Does anybody have some pointers for improving the import or for handling
> such large data sets?

Suggestions:

a) Use a database!!!

And only for very strong reasons against a):

b) Rewrite your import code in C.

c) Optimize your code by initializing the objects at their full length,
e.g. imp.v <- numeric(n). (Maybe you can read n from a header, or derive
it from the size of the file ....)

Uwe Ligges

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
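[A sketch of Uwe's suggestion (c), not from the original thread: the same reading loop, but with every vector preallocated at its full length so the loop never has to grow them. It assumes the record count n is known, e.g. read from a header or derived from file.info(f)$size; the name read.DET.data2 is made up.]

```r
## Hypothetical preallocated variant: vectors are created at length n up
## front, so assignments inside the loop are O(1) instead of repeatedly
## copying growing vectors.
read.DET.data2 <- function(f, n) {
    spk.v   <- integer(n)
    imp.v   <- integer(n)
    score.v <- numeric(n)
    th.v    <- numeric(n)
    ses.v   <- character(n)
    rec.v   <- character(n)
    type.v  <- character(n)
    fid <- file(f, "rb")
    on.exit(close(fid))
    for (i in seq_len(n)) {
        spk.v[i]   <- readBin(fid, integer(), size = 1, signed = FALSE)
        imp.v[i]   <- readBin(fid, integer(), size = 1, signed = FALSE)
        score.v[i] <- readBin(fid, numeric(), size = 4)
        type.v[i]  <- readBin(fid, character())
        th.v[i]    <- readBin(fid, numeric(), size = 4)
        ses.v[i]   <- readBin(fid, character())
        rec.v[i]   <- readBin(fid, character())
    }
    data.frame(spk = factor(spk.v), imp = factor(imp.v),
               score = score.v, th = th.v,
               ses = ses.v, rec = rec.v, type = type.v)
}
```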
Probably the simplest way to improve the speed of your code would be to
write the data so that all the data in a column is contiguous. Then
you'll be able to read each column with a single call to readBin().

hope this helps,

Tony Plate

At Tuesday 04:02 AM 6/1/2004, Uli Tuerk wrote:
> Does anybody have some pointers for improving the import or for handling
> such large data sets?
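[A sketch of Tony's column-contiguous approach, not from the original thread: it assumes the file is re-exported from Matlab with each column stored as one contiguous block, and that the record count n is known. readBin()'s n= argument then reads a whole column in one call; the name read.DET.columns is made up.]

```r
## Hypothetical column-wise reader: one readBin() call per column
## replaces 7 calls per record, i.e. 7 calls total instead of ~14 million.
read.DET.columns <- function(f, n) {
    fid <- file(f, "rb")
    on.exit(close(fid))
    spk   <- readBin(fid, integer(),   n = n, size = 1, signed = FALSE)
    imp   <- readBin(fid, integer(),   n = n, size = 1, signed = FALSE)
    score <- readBin(fid, numeric(),   n = n, size = 4)
    type  <- readBin(fid, character(), n = n)
    th    <- readBin(fid, numeric(),   n = n, size = 4)
    ses   <- readBin(fid, character(), n = n)
    rec   <- readBin(fid, character(), n = n)
    data.frame(spk = factor(spk), imp = factor(imp),
               score = score, th = th,
               ses = ses, rec = rec, type = type)
}
```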