Hello,

the following function, which stores numeric values extracted from a binary file into an R matrix, is very slow, especially when the file is several MB in size.

Should I rewrite the function in inline C, or in C/C++ using Rcpp? If the latter, how do you "readBin" in Rcpp (I'm a total Rcpp newbie)?

Many thanks.

Best regards,
phiroc

-------------

# inputPath is something like http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin

PLTreader <- function(inputPath){
  URL <- file(inputPath, "rb")
  PLT <- matrix(nrow=0, ncol=6)
  compteurDePrints = 0
  compteurDeLignes <- 0
  maxiPrints = 5
  displayData <- FALSE
  while (TRUE) {
    periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
    eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
    dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
    dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
    if (dword1 < 0) {
      dword1 = dword1 + 2^32-1;
    }
    eventDate = (dword2*2^32 + dword1)/1000
    repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
    exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
    loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
    PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
  } # end while
  return(PLT)
  close(URL)
}

----------------
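Before reaching for Rcpp, a vectorised pure-R approach may be worth trying (this is a different technique from anything proposed in the thread). A minimal sketch, assuming the file is a sequence of packed, little-endian 26-byte records (4+4+4+4+2+4+4 bytes, i.e. exactly what the sequential readBin calls above consume) and that it fits in memory: read all bytes once, reshape them into a raw matrix with one column per record, and decode each field for every record with a single readBin call. read_plt_vectorised is a hypothetical name. One detail worth knowing: readBin only honours signed = FALSE for sizes 1 and 2, so 4-byte values come back signed, and the unsigned adjustment is + 2^32, not + 2^32-1.

```r
# Hypothetical vectorised reader (not from the thread). Assumes packed,
# little-endian 26-byte records laid out as PLTreader reads them:
#   bytes  1-4  periodIndex (int32)   bytes 13-16 dword2 (uint32)
#   bytes  5-8  eventId     (int32)   bytes 17-18 repNum (int16)
#   bytes  9-12 dword1      (uint32)  bytes 19-22 exp, 23-26 loss (float32)
read_plt_vectorised <- function(bytes) {
  rec_len <- 26L
  n <- length(bytes) %/% rec_len              # a trailing partial record is ignored
  # One column per record, one row per byte within the record
  m <- matrix(bytes[seq_len(n * rec_len)], nrow = rec_len)

  # Decode one field for all records: take its byte rows from every column,
  # flatten column by column, and let a single readBin call do the work
  field <- function(from, what, size) {
    readBin(as.vector(m[from:(from + size - 1L), , drop = FALSE]),
            what = what, size = size, n = n, endian = "little")
  }
  # readBin ignores signed = FALSE for size 4, so map negative values
  # to their unsigned equivalents by adding 2^32
  u32 <- function(x) ifelse(x < 0, x + 2^32, as.numeric(x))

  dword1 <- u32(field(9L, integer(), 4L))
  dword2 <- u32(field(13L, integer(), 4L))
  cbind(periodIndex = field(1L, integer(), 4L),
        eventId     = field(5L, integer(), 4L),
        eventDate   = (dword2 * 2^32 + dword1) / 1000,  # exact below 2^53
        repNum      = field(17L, integer(), 2L),
        exp         = field(19L, numeric(), 4L),
        loss        = field(23L, numeric(), 4L))
}
```

For a local file, something like read_plt_vectorised(readBin(path, raw(), n = file.size(path))) would read and decode everything in two calls; with the intranet URL, the bytes would first have to be read from the connection.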
I suspect that rbind is responsible. Use a list and append to it instead of calling rbind on every iteration. At the end, combine the elements of the list with do.call("rbind", list).

> On 17 Sep 2016, at 15:05, Philippe de Rochambeau <phiroc at free.fr> wrote:
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
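The list-then-bind pattern suggested above might look like the following minimal sketch. collect_rows and next_row are hypothetical names: next_row stands in for the per-record readBin calls and is assumed to return one numeric row, or NULL when the input is exhausted.

```r
# Append each record to a list, then bind everything once at the end.
# next_row() is a hypothetical callback returning a numeric row or NULL.
collect_rows <- function(next_row) {
  rows <- list()
  repeat {
    row <- next_row()
    if (is.null(row)) break               # no more records
    rows[[length(rows) + 1L]] <- row      # append to the list
  }
  # A single rbind over all rows instead of one rbind per record
  do.call("rbind", rows)
}
```

Inside PLTreader, next_row would wrap the seven readBin calls and return NULL once readBin yields a zero-length result.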
Appending to lists is only very slightly more efficient than incremental rbinding. Ideally, you can figure out an upper bound for the number of records, preallocate a data frame of that size, modify each element in place as you go, and shrink the data frame once at the end as needed. If you cannot do that, you can append fixed-size data frames and follow the same strategy in chunks, with a single do.call/rbind at the end.

Note that reproducible examples including example data often yield working code, while incomplete examples tend to yield handwaving descriptions like the above.

I will also note that any code placed after a call to return() is never executed, so close(URL) never runs. I highly recommend avoiding the return function like the plague... use the expression-at-the-end-of-the-function method of returning.
--
Sent from my phone. Please excuse my brevity.

On September 17, 2016 7:10:05 AM PDT, Ismail SEZEN <sezenismail at gmail.com> wrote:
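A minimal sketch of the preallocate-then-shrink strategy described above. It uses a matrix rather than a data frame because the original function builds a matrix, and assigning into matrix rows is typically cheaper than going through the data-frame assignment method. fill_preallocated and next_row are hypothetical names; max_rows is assumed to be a known upper bound (for a local file of packed 26-byte records, file.size(path) %/% 26 would give the exact count). The result is returned as the last expression, with no return() call.

```r
# Preallocate an upper-bound-sized matrix, fill rows in place, shrink once.
# next_row() is a hypothetical callback returning one numeric row of length
# n_col, or NULL when the input is exhausted.
fill_preallocated <- function(max_rows, next_row, n_col = 6L) {
  out <- matrix(NA_real_, nrow = max_rows, ncol = n_col)
  n <- 0L
  while (n < max_rows) {
    row <- next_row()
    if (is.null(row)) break        # no more records
    n <- n + 1L
    out[n, ] <- row                # out is not shared, so this is typically done in place
  }
  # Shrink once at the end; drop = FALSE keeps a matrix even for 0 or 1 rows
  out[seq_len(n), , drop = FALSE]
}
```

The chunked variant mentioned above would call a function like this repeatedly with a fixed max_rows and combine the chunks with one do.call("rbind", ...) at the end.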
Your example was not reproducible. Also how do you "break" out of the "while" loop?

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phiroc at free.fr> wrote:
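On the question of leaving the while loop: readBin returns a zero-length vector once the end of the input is reached, so checking length() of the first field read for each record gives a natural exit condition. A minimal sketch, with read_int32_until_eof as a hypothetical name and a raw vector standing in for the file; it also uses on.exit() so the connection is closed however the function exits, instead of a close() placed after return().

```r
# Read little-endian int32 values from a raw vector until end of input.
read_int32_until_eof <- function(bytes) {
  con <- rawConnection(bytes, open = "r")
  on.exit(close(con))                 # runs on normal exit and on error
  out <- integer(0)
  repeat {
    x <- readBin(con, integer(), size = 4, n = 1, endian = "little")
    if (length(x) == 0L) break        # zero-length result: end of input
    out <- c(out, x)                  # growing a vector is acceptable only in a short demo
  }
  out
}

read_int32_until_eof(writeBin(1:5, raw(), size = 4, endian = "little"))
# [1] 1 2 3 4 5
```

In PLTreader the same check would go right after the periodIndex read: if (length(periodIndex) == 0) break.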
I noticed the same issue but didn't care much :)

On Sat, Sep 17, 2016, 18:01 jim holtman <jholtman at gmail.com> wrote: