On Sun, 18 Sep 2016, 19:04 Philippe de Rochambeau <phiroc at free.fr> wrote:

> Please find below code that attempts to read ints, longs and floats from a
> binary file (which is a simplification of my original program).
> Please disregard the R inefficiencies, such as using rbind, for now.
> I've also included Java code to generate the binary file.
> The output shows that, at one point, anInt becomes undefined.
> Unfortunately, I couldn't find the correct R function to determine whether
> anInt is undefined or not, as is.null, is.nan, and is.infinite don't work.
> Any help would be much appreciated.
> Many thanks in advance.
> Philippe
>
> -------
> [1] "anInt = 1"
> [1] "is.null FALSE"
> [1] "is.nan FALSE"
> [1] "is.infinite FALSE"
> [1] "aLong = 2"
> [1] "aFloat = 3.44440007209778"
> [1] "--------------------------"
> [1] "anInt = 2"
> [1] "is.null FALSE"
> [1] "is.nan FALSE"
> [1] "is.infinite FALSE"
> [1] "aLong = 22"
> [1] "aFloat = 13.4644002914429"
> [1] "--------------------------"
> [1] "anInt = 3"
> [1] "is.null FALSE"
> [1] "is.nan FALSE"
> [1] "is.infinite FALSE"
> [1] "aLong = 55"
> [1] "aFloat = 45.4444007873535"
> [1] "--------------------------"
> [1] "anInt = "
> [1] "is.null FALSE"
> [1] "is.nan "
> [1] "is.infinite "
> [1] "aLong = "
> [1] "aFloat = "
> [1] "--------------------------"
>      [,1]      [,2]      [,3]
> [1,] 1         2         3.4444
> [2,] 2         22        13.4644
> [3,] 3         55        45.4444
> [4,] Integer,0 Integer,0 Numeric,0
>
> -----------
>
> readFile <- function(inputPath) {
>     URL <- file(inputPath, "rb")
>     PLT <- matrix(nrow=0, ncol=3)
>     counte <- 0
>     max <- 4
>     while (counte < max) {
>         anInt <- readBin(con=URL, what=integer(), size=4, n=1, endian="big")
>         print(paste("anInt =", anInt))
>         #if (! (anInt == 0)) { print(paste("empty int")); break }
>         print(paste("is.null ", is.null(anInt)))
>         print(paste("is.nan ", is.nan(anInt)))
>         print(paste("is.infinite ", is.infinite(anInt)))
>         aLong <- readBin(URL, integer(), size=8, n=1, endian="big")
>         print(paste("aLong =", aLong))
>         aFloat <- readBin(URL, numeric(), size=4, n=1, endian="big")
>         print(paste("aFloat =", aFloat))
>         print("--------------------------")
>         PLT <- rbind(PLT, list(anInt, aLong, aFloat))
>         counte <- counte + 1
>     } # end while
>     close(URL)
>     PLT
> }
> fichier <- "/Users/philippe/Desktop/datatests/data0.bin"
> PLT2 <- readFile(fichier)
> print(PLT2)
>
> -----------
>
> import java.io.*;
>
> public class Main {
>
>     Main() {
>         writeData();
>     }
>
>     public static void main(String[] args) {
>         new Main();
>     }
>
>     public void writeData() {
>
>         final String path = "/Users/philippe/Desktop/datatests/data0.bin";
>
>         DataOutputStream dos;
>         try {
>             dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path)));
>             // big endian write! ("high byte first"), see
>             // https://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
>             dos.writeInt(1);
>             dos.writeLong(2L);
>             dos.writeFloat(3.4444F);
>
>             dos.writeInt(2);
>             dos.writeLong(22L);
>             dos.writeFloat(13.4644F);
>
>             dos.writeInt(3);
>             dos.writeLong(55L);
>             dos.writeFloat(45.4444F);
>
>             dos.close();
>         } catch (FileNotFoundException e) {
>             e.printStackTrace();
>         } catch (IOException ioe) {
>             ioe.printStackTrace();
>         }
>
>     }
>
> }
>
> -----------
>
> > On 17 Sep 2016, at 20:45, Philippe de Rochambeau <phiroc at free.fr> wrote:
> >
> > Hi Jim,
> > this is exactly the answer I was looking for. Many thanks. I didn't know
> > R had a pack function, as in Perl.
> > To answer your earlier question, I am trying to update legacy code to
> > read a binary file with unknown size, over a network, slice it up into rows
> > each containing an integer, an integer, a long, a short, a float and a
> > float, and stuff the rows into a matrix.

It's possible to read all rows fast as raw(), then parse in a vectorised way
with matrix indexing to group the bytes appropriately. There is an example on
the mailing list somewhere, but otherwise I can show an example if that's of
interest.

Cheers, Mike

> > Best regards,
> > Philippe
> >
> >> On 17 Sep 2016, at 20:38, jim holtman <jholtman at gmail.com> wrote:
> >>
> >> Here is an example of how to do it:
> >>
> >> x <- 1:10  # integer values
> >> xf <- seq(1.0, 2, by = 0.1)  # floating point
> >>
> >> setwd("d:/temp")
> >>
> >> # create file to write to
> >> output <- file('integer.bin', 'wb')
> >> writeBin(x, output)  # write integer
> >> writeBin(xf, output)  # write reals
> >> close(output)
> >>
> >> library(pack)
> >> library(readr)
> >>
> >> # read all the data at once
> >> allbin <- read_file_raw('integer.bin')
> >>
> >> # decode the data into a list
> >> (result <- unpack("V V V V V V V V V V d d d d d d d d d d", allbin))
> >>
> >> Jim Holtman
> >> Data Munger Guru
> >>
> >> What is the problem that you are trying to solve?
> >> Tell me what you want to do, not how you want to do it.
> >>
> >> On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenismail at gmail.com> wrote:
> >> I noticed same issue but didn't care much :)
> >>
> >> On Sat, Sep 17, 2016, 18:01 jim holtman <jholtman at gmail.com> wrote:
> >> Your example was not reproducible. Also how do you "break" out of the
> >> "while" loop?
> >>
> >> Jim Holtman
> >> Data Munger Guru
> >>
> >> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phiroc at free.fr>
> >> wrote:
> >>
> >>> Hello,
> >>> the following function, which stores numeric values extracted from a
> >>> binary file into an R matrix, is very slow, especially when the said
> >>> file is several MB in size.
> >>> Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
> >>> latter case is true, how do you "readBin" in Rcpp (I'm a total Rcpp
> >>> newbie)?
> >>> Many thanks.
> >>> Best regards,
> >>> phiroc
> >>>
> >>> -------------
> >>>
> >>> # inputPath is something like
> >>> # http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
> >>>
> >>> PLTreader <- function(inputPath){
> >>>     URL <- file(inputPath, "rb")
> >>>     PLT <- matrix(nrow=0, ncol=6)
> >>>     compteurDePrints = 0
> >>>     compteurDeLignes <- 0
> >>>     maxiPrints = 5
> >>>     displayData <- FALSE
> >>>     while (TRUE) {
> >>>         periodIndex <- readBin(URL, integer(), size=4, n=1,
> >>>             endian="little") # int (4 bytes)
> >>>         eventId <- readBin(URL, integer(), size=4, n=1,
> >>>             endian="little") # int (4 bytes)
> >>>         dword1 <- readBin(URL, integer(), size=4, signed=FALSE,
> >>>             n=1, endian="little") # int
> >>>         dword2 <- readBin(URL, integer(), size=4, signed=FALSE,
> >>>             n=1, endian="little") # int
> >>>         if (dword1 < 0) {
> >>>             dword1 = dword1 + 2^32-1;
> >>>         }
> >>>         eventDate = (dword2*2^32 + dword1)/1000
> >>>         repNum <- readBin(URL, integer(), size=2, n=1,
> >>>             endian="little") # short (2 bytes)
> >>>         exp <- readBin(URL, numeric(), size=4, n=1,
> >>>             endian="little") # float (4 bytes, strangely enough, would expect 8)
> >>>         loss <- readBin(URL, numeric(), size=4, n=1,
> >>>             endian="little") # float (4 bytes)
> >>>         PLT <- rbind(PLT, c(periodIndex, eventId, eventDate,
> >>>             repNum, exp, loss))
> >>>     } # end while
> >>>     return(PLT)
> >>>     close(URL)
> >>> }
> >>>
> >>> ----------------
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.

--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia
I would gladly examine your example, Mike.

Cheers,
Philippe

> On 18 Sep 2016, at 16:05, Michael Sumner <mdsumner at gmail.com> wrote:
>
> It's possible to read all rows fast as raw(), then parse in a vectorised way
> with matrix indexing to group the bytes appropriately. There is an example
> on the mailing list somewhere, but otherwise I can show an example if that's
> of interest.
>
> Cheers, Mike
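On the question quoted earlier in this thread (how to tell that anInt has become "undefined"): the printed output is consistent with readBin() returning a zero-length vector, integer(0), once the connection has no bytes left. is.null() is FALSE for such a vector, and is.nan() and is.infinite() return logical(0), which is why paste() prints nothing after the label. A length check is one way to detect it. The sketch below is only an illustration under stated assumptions: the helper name read_until_eof, the temporary file, and the simplified two-field record (a 4-byte int followed by a 4-byte float, big-endian) are hypothetical and not the layout from the original program.

```r
# Assumption: a simplified record of (int32, float32), big-endian,
# written to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
for (k in 1:3) {
  writeBin(as.integer(k), con, size = 4, endian = "big")
  writeBin(k + 0.5, con, size = 4, endian = "big")  # stored as a 4-byte float
}
close(con)

# Hypothetical helper: read records until readBin() returns nothing.
read_until_eof <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))  # the connection is closed even on error
  rows <- list()
  repeat {
    an_int <- readBin(con, what = integer(), size = 4, n = 1, endian = "big")
    if (length(an_int) == 0L) break  # zero-length result: end of file
    a_float <- readBin(con, what = numeric(), size = 4, n = 1, endian = "big")
    if (length(a_float) == 0L) stop("file ends in the middle of a record")
    rows[[length(rows) + 1L]] <- c(an_int, a_float)
  }
  do.call(rbind, rows)  # bind once at the end instead of growing a matrix
}

records <- read_until_eof(path)
print(records)
```

Collecting rows in a list and binding once also avoids the repeated rbind() copies, although the raw-matrix approach discussed in this thread should be faster still for large files.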
I second Mike's proposal - it works, e.g.

  https://github.com/HenrikBengtsson/affxparser/blob/5bf1a9162904c56d59c4735a8d7eb427e4f085e4/R/readCcg.R#L535-L583

Here's an outline. Say each row consists of the tuple (iiii = 4-byte integer,
ffff = 4-byte float, ss = 2-byte integer), so that the byte-by-byte content of
your file looks like this:

  iiiiffffss
  iiiiffffss
  iiiiffffss
  ...
  iiiiffffss

Then read this as raw bytes (file_size can also be a very large number in
case it's unknown):

  raw <- readBin(con, what="raw", n=file_size)

Turn it into a (4+4+2)-by-K raw matrix:

  raw <- matrix(raw, nrow=4+4+2)

so that your raw bytes have the following layout:

  iii ... i
  iii ... i
  iii ... i
  iii ... i
  fff ... f
  fff ... f
  fff ... f
  fff ... f
  sss ... s
  sss ... s

Then extract the three submatrices of interest:

  iiii <- raw[1:4,]
  ffff <- raw[5:8,]
  ss <- raw[9:10,]

Here you can discard raw, i.e. rm(list="raw"). Since R stores matrices in
column-by-column order internally, your bytes are already in the proper
order. Finally, re-read these with appropriate readBin() settings. Note that
n must be given, because readBin() reads only one value by default, and that
endian should be set to match the byte order of the file, e.g.

  i <- readBin(iiii, what="integer", size=4L, n=length(iiii)/4L)
  f <- readBin(ffff, what="double", size=4L, n=length(ffff)/4L)
  s <- readBin(ss, what="integer", size=2L, n=length(ss)/2L)

Put into a K-by-3 data.frame:

  data <- data.frame(i=i, f=f, s=s)

/Henrik

On Sun, Sep 18, 2016 at 8:02 AM, Philippe de Rochambeau <phiroc at free.fr> wrote:
> I would gladly examine your example, Mike.
> Cheers,
> Philippe
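Henrik's outline above can be tried end to end with a small self-contained sketch. Everything specific here is an assumption made for illustration: the temporary file, the record count K, the sample values, and the choice of little-endian byte order are not from the thread. Only the iiiiffffss layout and the read-as-raw, reshape, re-read steps follow the outline.

```r
# Assumption: K sample records of (int32, float32, int16), little-endian.
K <- 5L
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
for (k in seq_len(K)) {
  writeBin(as.integer(k * 100L), con, size = 4L, endian = "little")
  writeBin(k / 4, con, size = 4L, endian = "little")  # stored as 4-byte float
  writeBin(as.integer(-k), con, size = 2L, endian = "little")
}
close(con)

record_size <- 4L + 4L + 2L

# Step 1: read the whole file as raw bytes in a single call.
con <- file(path, "rb")
bytes <- readBin(con, what = "raw", n = file.size(path))
close(con)
stopifnot(length(bytes) %% record_size == 0L)  # guard against a truncated file

# Step 2: one column per record, one row per byte position in the record.
bytes <- matrix(bytes, nrow = record_size)
n_rec <- ncol(bytes)

# Step 3: slice out each field's bytes and decode all records at once.
# as.vector() flattens column by column, so each record's bytes stay adjacent.
i <- readBin(as.vector(bytes[1:4, ]), what = "integer", size = 4L,
             n = n_rec, endian = "little")
f <- readBin(as.vector(bytes[5:8, ]), what = "double", size = 4L,
             n = n_rec, endian = "little")
s <- readBin(as.vector(bytes[9:10, ]), what = "integer", size = 2L,
             n = n_rec, endian = "little")

data <- data.frame(i = i, f = f, s = s)
print(data)
```

For the layout described earlier in the thread (an integer, an integer, a long read as two 4-byte words, a short, a float and a float), the same idea would presumably use 4+4+8+2+4+4 = 26 rows in the raw matrix, with the row ranges adjusted to match.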