I noticed the same issue but didn't care much :)

On Sat, Sep 17, 2016, 18:01 jim holtman <jholtman at gmail.com> wrote:

> Your example was not reproducible. Also how do you "break" out of the
> "while" loop?
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phiroc at free.fr>
> wrote:
>
> > Hello,
> > the following function, which stores numeric values extracted from a
> > binary file into an R matrix, is very slow, especially when the said
> > file is several MB in size.
> > Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
> > latter case is true, how do you "readBin" in Rcpp (I'm a total Rcpp
> > newbie)?
> > Many thanks.
> > Best regards,
> > phiroc
> >
> > -------------
> >
> > # inputPath is something like
> > # http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
> >
> > PLTreader <- function(inputPath){
> >     URL <- file(inputPath, "rb")
> >     PLT <- matrix(nrow=0, ncol=6)
> >     compteurDePrints = 0
> >     compteurDeLignes <- 0
> >     maxiPrints = 5
> >     displayData <- FALSE
> >     while (TRUE) {
> >         periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
> >         eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
> >         dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
> >         dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
> >         if (dword1 < 0) {
> >             dword1 = dword1 + 2^32-1;
> >         }
> >         eventDate = (dword2*2^32 + dword1)/1000
> >         repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
> >         exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
> >         loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
> >         PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
> >     } # end while
> >     return(PLT)
> >     close(URL)
> > }
> >
> > ----------------
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

You should probably pick one forum, here or SO
(http://stackoverflow.com/questions/39547398/faster-reading-of-binary-files-in-r),
rather than cross-posting to all of them.

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenismail at gmail.com> wrote:

> I noticed the same issue but didn't care much :)

I would also suggest that you take a look at the 'pack' package, which can
convert the binary input to the values you want. Part of your performance
problem might be all the short reads that you are doing.

Jim Holtman
Data Munger Guru

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenismail at gmail.com> wrote:

> I noticed the same issue but didn't care much :)
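
The point about short reads can be illustrated with a minimal base R sketch. The quoted function pulls 2 to 4 bytes per readBin call; the sketch below instead drains a connection in large chunks and stops when a read returns nothing, which also gives the loop an exit condition. The names read_all_bytes and chunk_size are hypothetical and not from the thread, and the sketch assumes a blocking connection opened in "rb" mode.

```r
# Read every remaining byte from an open binary connection.
# read_all_bytes and chunk_size are illustrative names, not part of any package.
read_all_bytes <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break          # nothing left: end of stream
    chunks[[length(chunks) + 1L]] <- chunk  # keep the chunk, concatenate once at the end
  }
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}
```

A caller would open the connection, call read_all_bytes(con), and close the connection afterwards (for example with on.exit(close(con)) inside a function), then decode the resulting raw vector in one pass.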

Here is an example of how to do it:

x <- 1:10  # integer values
xf <- seq(1.0, 1.9, by = 0.1)  # 10 floating-point values

setwd("d:/temp")

# create file to write to
output <- file('integer.bin', 'wb')
writeBin(x, output)  # write integers
writeBin(xf, output)  # write reals
close(output)

library(pack)
library(readr)

# read all the data at once
allbin <- read_file_raw('integer.bin')

# decode the data into a list
(result <- unpack("V V V V V V V V V V d d d d d d d d d d", allbin))

Jim Holtman
Data Munger Guru

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenismail at gmail.com> wrote:

> I noticed the same issue but didn't care much :)
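
For readers without the pack and readr packages, a similar round trip can be sketched in base R alone, since readBin also accepts a raw vector as its input. This is only an illustration under stated assumptions: it writes to a temporary file rather than d:/temp, fixes the byte order to little-endian on both sides, and the names path, ints and reals are made up for the example.

```r
x  <- 1:10                        # 10 integers, 4 bytes each
xf <- seq(1.0, 1.9, by = 0.1)     # 10 doubles, 8 bytes each

path <- tempfile(fileext = ".bin")
output <- file(path, "wb")
writeBin(x,  output, endian = "little")
writeBin(xf, output, endian = "little")
close(output)

# Read the whole file in a single call, then decode slices of the raw vector.
allbin <- readBin(path, what = "raw", n = file.size(path))
ints   <- readBin(allbin[1:40],   what = integer(), n = 10, size = 4, endian = "little")
reals  <- readBin(allbin[41:120], what = numeric(), n = 10, size = 8, endian = "little")
```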

Hi Jim,
this is exactly the answer I was looking for. Many thanks. I didn't know R had
a pack function, as in Perl.
To answer your earlier question, I am trying to update legacy code to read a
binary file of unknown size over a network, slice it up into rows each
containing an integer, an integer, a long, a short, a float and a float, and
stuff the rows into a matrix.
Best regards,
Philippe

On 17 Sept. 2016, at 20:38, jim holtman <jholtman at gmail.com> wrote:

> Here is an example of how to do it:
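
The row layout described above (an integer, an integer, a long, a short, a float and a float) can be decoded for all rows at once from a raw vector. The following is a rough base R sketch, not the method settled on in the thread. It assumes 26-byte little-endian records with no padding, a non-negative 64-bit value that is divided by 1000 as in the original function, and the hypothetical name decode_plt.

```r
# Decode fixed-size little-endian records held in a raw vector.
# Assumed layout (26 bytes, no padding): int32, int32, int64, int16, float32, float32.
decode_plt <- function(bytes, record_size = 26L) {
  n <- length(bytes) %/% record_size          # complete records only
  m <- matrix(bytes[seq_len(n * record_size)], nrow = record_size)

  # Take the given byte rows from every record and decode them in one readBin call.
  field <- function(rows, what, size, signed = TRUE) {
    readBin(as.vector(m[rows, , drop = FALSE]), what = what, size = size,
            n = n * length(rows) / size, signed = signed, endian = "little")
  }

  # readBin cannot read unsigned 32-bit or 64-bit integers directly, so the long
  # is rebuilt from four unsigned 16-bit words (exact in a double below 2^53).
  words <- matrix(field(9:16, integer(), 2L, signed = FALSE), nrow = 4L)
  event_raw <- colSums(words * 2^(16 * 0:3))

  cbind(periodIndex = field(1:4,   integer(), 4L),
        eventId     = field(5:8,   integer(), 4L),
        eventDate   = event_raw / 1000,
        repNum      = field(17:18, integer(), 2L),
        exp         = field(19:22, numeric(), 4L),
        loss        = field(23:26, numeric(), 4L))
}
```

Because every field is decoded with one vectorised call, the matrix is built once instead of being grown row by row with rbind.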