thr3ads.net - R help - [R] rounding down with as.integer [Jan 2015]


Richard M. Heiberger

2015-Jan-01 19:10 UTC

[R] rounding down with as.integer

Interesting.  Taking a cue from someone on this list today: the goal is to
input the data correctly.  My inclination would be to read the file as text,
pad each number on the right with zeros, drop the decimal point, and then
read it as an integer.
0 1 2 0.325 1.12 1.9
0.000 1.000 2.000 0.325 1.120 1.900
0000 1000 2000 0325 1120 1900

The pad step is the interesting step.

## 0 1 2 0.325 1.12 1.9
## 0.000 1.000 2.000 0.325 1.120 1.900
## 0000 1000 2000 0325 1120 1900

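## Read the numbers as character strings rather than as doubles.
x.in <- scan(text="
0 1 2 0.325 1.12 1.9 1.
", what="")

## Choose the padding by string length.  This assumes exactly one digit
## before the decimal point and at most three digits after it.
padding <- c(".000", "000", "00", "0", "")

x.pad <- paste(x.in, padding[nchar(x.in)], sep="")

## Drop the decimal point (fixed=TRUE treats "." literally, not as a regex).
x.nodot <- sub(".", "", x.pad, fixed=TRUE)

x <- as.integer(x.nodot)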
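The padding table above relies on there being exactly one digit before the decimal point. A hypothetical helper, `to_thousandths()` (not from the thread), sketches the same string-only idea for any number of integer digits, assuming non-negative input with at most three fractional digits:

```r
# Hypothetical helper, not from the thread: convert decimal strings with at
# most three fractional digits to integer thousandths using only string
# operations, so no floating-point rounding is involved.
# Assumes non-negative input such as "0", "1.", "0.325", "12.5".
to_thousandths <- function(s) {
  has_dot <- grepl(".", s, fixed = TRUE)
  # Integer part: everything before the decimal point ("" becomes "0").
  int_part <- sub("\\..*$", "", s)
  int_part[int_part == ""] <- "0"
  # Fractional part: everything after the decimal point, "" if there is none.
  frac_part <- ifelse(has_dot, sub("^[^.]*\\.", "", s), "")
  if (any(nchar(frac_part) > 3)) stop("more than three fractional digits")
  # Right-pad the fraction with zeros to exactly three digits.
  frac_part <- substr(paste0(frac_part, "000"), 1, 3)
  as.integer(int_part) * 1000L + as.integer(frac_part)
}

x.in <- scan(text = "0 1 2 0.325 1.12 1.9 1. 12.5", what = "", quiet = TRUE)
to_thousandths(x.in)
```

Splitting on the decimal point instead of indexing by `nchar()` costs a few more string operations, so on very large files it is likely slower than the padding-table version.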


Rich


On Thu, Jan 1, 2015 at 1:21 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
> On Thu, 1 Jan 2015, Duncan Murdoch wrote:
>
>> On 31/12/2014 8:44 PM, David Winsemius wrote:
>>>
>>>
>>> On Dec 31, 2014, at 3:24 PM, Mike Miller wrote:
>>>
>>>> This is probably a FAQ, and I don't really have a question about it, but
>>>> I just ran across this in something I was working on:
>>>>
>>>>> as.integer(1000*1.003)
>>>>
>>>> [1] 1002
>>>>
>>>> I didn't expect it, but maybe I should have.  I guess it's about the
>>>> machine precision added to the fact that as.integer always rounds down:
>>>>
>>>>
>>>>> as.integer(1000*1.003 + 255 * .Machine$double.eps)
>>>>
>>>> [1] 1002
>>>>
>>>>> as.integer(1000*1.003 + 256 * .Machine$double.eps)
>>>>
>>>> [1] 1003
>>>>
>>>>
>>>> This does it right...
>>>>
>>>>> as.integer( round( 1000*1.003 ) )
>>>>
>>>> [1] 1003
>>>>
>>>> ...but this seems to always give the same answer and it is a little
>>>> faster in my application:
>>>>
>>>>> as.integer( 1000*1.003 + .1 )
>>>>
>>>> [1] 1003
>>>>
>>>>
>>>> FYI - I'm reading in a long vector of numbers from a text file with no
>>>> more than three digits to the right of the decimal.  I'm converting them to
>>>> integers and saving them in binary format.
>>>>
>>>
>>> So just add 0.0001 or even .0000001 to all of them and coerce to integer.
>>
>>
>> I don't think the original problem was stated clearly, so I'm not sure
>> whether this is a solution, but it looks wrong to me.  If you want to round
>> to the nearest integer, why not use round() (without the as.integer
>> afterwards)?  Or if you really do want an integer, why add 0.1 or 0.0001,
>> why not add 0.5 before calling as.integer()?  This is the classical way to
>> implement round().
>>
>> To state the problem clearly, I'd like to know what result is expected for
>> any real number x.  Since R's numeric type only approximates the real
>> numbers we might not be able to get a perfect match, but at least we could
>> quantify how close we get.  Or is the input really character data?  The
>> original post mentioned reading numbers from a text file.
>
>
>
> Maybe you'd like to know what I'm really doing.  I have 1600 text files each
> with up to 16,000 lines with 3100 numbers per line, delimited by a single
> space.  The numbers are between 0 and 2, inclusive, and they have up to
> three digits to the right of the decimal.  Every possible value in that
> range will occur in the data.  Some example numbers: 0 1 2 0.325 1.12 1.9.
> I want to multiply by 1000 and store them as 16-bit integers (uint16).
>
> I've been reading in the data like so:
>
>> data <- scan( file=FILE, what=double(), nmax=3100*16000)
>
>
> At first I tried making the integers like so:
>
>> ptm <- proc.time() ; ints <- as.integer( 1000 * data ) ; proc.time()-ptm
>
>    user  system elapsed
>   0.187   0.387   0.574
>
> I decided I should compare with the result I got using round():
>
>> ptm <- proc.time() ; ints2 <- as.integer( round( 1000 * data ) ) ;
>> proc.time()-ptm
>
>    user  system elapsed
>   1.595   0.757   2.352
>
> It is a curious fact that only a few of the values from 0 to 2000 disagree
> between the two methods:
>
>> table( ints2[ ints2 != ints ] )
>
>
>  1001  1003  1005  1007  1009  1011  1013  1015  1017  1019  1021  1023
> 35651 27020 15993 11505  8967  7549  6885  6064  5512  4828  4533  4112
>
> I understand that it's all about the problem of representing decimal numbers
> in binary, but I still find some of the results a little surprising, like
> that list of numbers from the table() output.  For another example:
>
>> 1000+3 - 1000*(1+3/1000)
>
> [1] 1.136868e-13
>
>> 3 - 1000*(0+3/1000)
>
> [1] 0
>
>> 2000+3 - 1000*(2+3/1000)
>
> [1] 0
>
> See what I mean?  So there is something special about the numbers around
> 1000.
>
> Back to the question at hand:  I can avoid use of round() and speed things up
> a little bit by just adding a small number after multiplying by 1000:
>
>> ptm <- proc.time() ; R3 <- as.integer( 1000 * data + .1 ) ;
>> proc.time()-ptm
>
>    user  system elapsed
>   0.224   0.594   0.818
>
> You point out that adding .5 makes sense.  That is probably a better idea
> and I should take that approach under most conditions, but in this case we
> can add anything between 2e-13 and about 0.99999999999 and always get the
> same answer.  We also have to remember that if a number might be negative
> (not a problem for me in this application), we need to subtract 0.5 instead
> of adding it.
>
> Anyway, right now this is what I'm actually doing:
>
>> con <- file( paste0(FILE, ".uint16"), "wb" )
>> ptm <- proc.time() ; writeBin( as.integer( 1000 * scan( file=FILE,
>> what=double(), nmax=3100*16000 ) + .1 ), con, size=2 ) ; proc.time()-ptm
>
> Read 48013406 items
>    user  system elapsed
>  10.263   0.733  10.991
>>
>> close(con)
>
>
> By the way, writeBin() is something that I learned about here, from you,
> Duncan.  Thanks for that, too.
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> University of Minnesota
> http://scholar.google.com/citations?user=EV_phq4AAAAJ
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
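Mike's pipeline writes each value with `writeBin(..., size=2)`. A minimal sketch of the round trip, assuming a temporary file in place of `paste0(FILE, ".uint16")` and using `round()` rather than `+ .1`, shows how the 16-bit values can be read back as unsigned integers with `readBin()`:

```r
# Sketch: store thousandths as 2-byte unsigned integers and read them back.
# Assumes every value lies in 0..2000, well inside the uint16 range 0..65535.
x <- scan(text = "0 1 2 0.325 1.12 1.9 1.003", quiet = TRUE)
ints <- as.integer(round(1000 * x))

path <- tempfile(fileext = ".uint16")   # stand-in for paste0(FILE, ".uint16")
con <- file(path, "wb")
writeBin(ints, con, size = 2)           # 2 bytes per value, native byte order
close(con)

con <- file(path, "rb")
back <- readBin(con, what = "integer", n = length(ints), size = 2,
                signed = FALSE)         # treat each 2-byte value as unsigned
close(con)

file.info(path)$size                    # 2 bytes per value
identical(back, ints)
```

Both calls default to the platform's byte order; if the files will be read on other machines, passing the same explicit `endian = "little"` to `writeBin()` and `readBin()` avoids surprises.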

Mike Miller

2015-Jan-01 19:58 UTC


[R] rounding down with as.integer

I'd have to say thanks, but no thanks, to that one!  ;-)  The problem is 
that it will take a long time and it will give the same answer.

The first time I did this kind of thing, a year or two ago, I manipulated 
the text data to produce integers before putting the data into R.  The 
data were a little different -- already zero padded with three digits to 
the right of the decimal and one to the left, so all I had to do was drop 
the decimal point.  The as.integer(1000*x+.5) method is very fast and it 
works great.

I could have done that this time, but I was also saving to other formats, 
so I had the data already in the format I described.

Mike
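Two details behind the `+ .5` discussion: `as.integer()` truncates toward zero, so it only "rounds down" for non-negative numbers, and R's `round()` follows IEC 60559 and sends exact halves to the even digit, which the classical `+ 0.5` trick does not. A small sketch, with a hypothetical helper `round_half_away()` (not from the thread) implementing the sign-aware plus-or-minus 0.5 rule described above:

```r
# as.integer() drops the fractional part, i.e. it truncates toward zero.
as.integer(c(2.7, -2.7))            # 2 and -2

# On IEEE 754 doubles, 1000 * 1.003 is one unit in the last place below 1003.
sprintf("%.17g", 1000 * 1.003)
as.integer(1000 * 1.003)            # truncated: 1002
as.integer(round(1000 * 1.003))     # 1003

# Hypothetical helper: add 0.5 to non-negative values and subtract 0.5
# from negative ones, then truncate (round half away from zero).
round_half_away <- function(v) as.integer(v + ifelse(v < 0, -0.5, 0.5))
round_half_away(c(2.7, -2.7, 2.5, -2.5))   # 3 -3 3 -3

# round() sends exact halves to the even digit instead.
round(c(0.5, 1.5, 2.5))             # 0 2 2
```

For Mike's data no value lands on an exact half after scaling, so the two rounding rules give the same integers there.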


On Thu, 1 Jan 2015, Richard M. Heiberger wrote:

Ted Harding

2015-Jan-01 21:28 UTC


[R] rounding down with as.integer

I've been following this little tour round the murkier bistros
in the back-streets of R with interest! Then it occurred to me:
What is wrong with [using example data]:

  x0 <- c(0,1,2,0.325,1.12,1.9,1.003)
  x1 <- as.integer(as.character(1000*x0))
  n1 <- c(0,1000,2000,325,1120,1900,1003)

  x1 - n1
  ## [1] 0 0 0 0 0 0 0

  ## But, of course:
  1000*x0 - n1
  ## [1]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
  ## [5]  0.000000e+00  0.000000e+00 -1.136868e-13

Or am I missing something else in what Mike Miller is seeking to do?
Ted.
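Because the data contain only the 2001 values 0.000, 0.001, ..., 2.000, every conversion proposed in this thread, including the `as.character()` route above, can be checked exhaustively. A sketch, assuming that `as.numeric()` on the text parses numbers the same way `scan()` does:

```r
# Every possible input as text, parsed to doubles (assumed to match scan()).
k <- 0:2000
x <- as.numeric(sprintf("%.3f", k / 1000))

# Candidate conversions from this thread.
methods <- list(
  truncate   = as.integer(1000 * x),
  round      = as.integer(round(1000 * x)),
  plus_tenth = as.integer(1000 * x + 0.1),
  plus_half  = as.integer(1000 * x + 0.5),
  character  = as.integer(as.character(1000 * x))
)

# TRUE where a method reproduces k exactly for all 2001 inputs.
sapply(methods, identical, k)

# Values that plain truncation gets wrong (each comes out one too small).
k[methods$truncate != k]
```

Only plain truncation should fail here; since the scaled values are within about 1e-13 of an integer, `as.character()` with its 15 significant digits prints the integer itself.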

On 01-Jan-2015 19:58:02 Mike Miller wrote:
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 01-Jan-2015  Time: 21:28:22
