Hello,

I'm encountering the following error:

In a package for survival analysis a data.frame is created; one column is created by applying unique to the event times, while others are created by running table on the event times and the treatment arm.

When there are event times very close together, they are put in the same factor level when coerced to factor, while unique outputs both values, leading to different lengths of the columns.

Try this to reproduce:

x <- c(1, 1 + .Machine$double.eps)
unique(x)
table(x)

Is there a general best practice to deal with such issues?

Should calling table on floats be avoided in general?

What can one use instead?

One could easily iterate over the unique values and compare each with the whole vector, but that is N*N comparisons, compared to N*log(N) when sorting first and taking into account that the vector is sorted.

I think for my purposes I'll round to a hundredth of a day before calling the function, but any advice on avoiding this issue and writing more fault-tolerant code is greatly appreciated.

all the best, Tobias
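(For readers of the archive: the discrepancy above, and the rounding workaround mentioned at the end, can be sketched as follows. The choice of two decimal places is just the "hundredth of a day" from the message, not a general recommendation.)

```r
# Reproduce the discrepancy: unique() compares the doubles directly,
# while table() coerces them to character first and so collapses
# values whose printed representations coincide.
x <- c(1, 1 + .Machine$double.eps)
length(unique(x))  # two distinct doubles
length(table(x))   # one cell, both values counted together

# Workaround sketch: round to a fixed resolution before tabulating,
# so that unique() and table() agree on what counts as "the same" time.
x_rounded <- round(x, 2)
length(unique(x_rounded))
length(table(x_rounded))
```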
R converts floats to strings with ~15 digits of accuracy, specifically to avoid differentiating between 1 and 1 + .Machine$double.eps; it is assumed that small differences such as this are due to rounding errors and are unimportant. So, if when making your factor you want all digits, you could write this:

as.factor(format(x, digits = 17L))

On Wed, Jan 25, 2023 at 4:03 AM Tobias Fellinger <tobby at htu.at> wrote:
> [quoted message elided]
Hello Tobias,

A factor is basically a way to get a character to behave like an integer. It consists of an integer vector with values from 1 to nlev, and a character vector of levels, specifying a name for each value. But this means that factors only really make sense with characters, and anything that is not a character will be coerced to one. Thus two values that are represented by the same string in as.character will be treated as the same.

Now this is probably reasonable most of the time, as numeric values will usually represent metric data, which tends to make little sense as a factor. But if we want to do this we can easily build our own factors from floats, and even write some convenience wrapper around table, as shown in the appended file.

Best regards,
Valentin

On Wednesday, 25 January 2023, 10:03:01 CET, Tobias Fellinger wrote:
> [quoted message elided]
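(The appended file is not reproduced in the archive. A minimal sketch of the kind of wrapper described above might look like the following; `float_table` is a hypothetical name, not from the attachment. The factor codes come from match() against the sorted unique numeric values, so no character coercion decides the levels, and the 17-digit labels are only cosmetic.)

```r
# table() wrapper that keeps distinct doubles in distinct cells:
# build the factor codes numerically, then attach printable labels.
float_table <- function(x) {
  lev <- sort(unique(x))
  f <- factor(match(x, lev),
              levels = seq_along(lev),
              labels = format(lev, digits = 17L))
  table(f)
}

x <- c(1, 1 + .Machine$double.eps)
length(float_table(x))  # both values get their own cell
```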
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.