龙华
2025-Sep-25  08:19 UTC
[Rd] Proposal: Enhance as.character() for Consistency with as.numeric() Precision
Dear R Core Team and R-devel Community,
I hope this message finds you well. I am writing to propose an enhancement to
the `as.character()` function in R's base package to address an
inconsistency with `as.numeric()` when handling high-precision floating-point
numbers. This issue has practical implications for code reliability, especially
in scientific computing and data analysis, and I believe a small adjustment
could align the behavior more closely with modern user expectations and R's
evolving use cases.
Problem Description
The current behavior of `as.character()` and `as.numeric()` leads to logical
inconsistencies when converting high-precision decimal strings. For example,
consider the string `"7.999999999999999111822"` (22 significant
digits):
- `as.numeric("7.999999999999999111822")` converts this to a
double-precision floating-point number (per IEEE 754), which is stored as
approximately `7.9999999999999991118` (verifiable with `print(x, digits = 20)`).
This is the largest double below 8: the difference `8 - x ≈ 8.88178e-16` equals
2^-50, which is exactly the spacing between adjacent doubles in [4, 8)
(`4 * .Machine$double.eps`). Because the input lies more than half of that
spacing below 8, it is not rounded to `8.0`.
- However, `as.character(as.numeric("7.999999999999999111822"))`
returns `"8"`, simplifying the value and losing the small difference.
This leads to a mismatch: `x < 8` is `TRUE`, but `as.numeric(as.character(x))
== 8` is also `TRUE`.
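A minimal reproduction in R, using only base functions. The extra check against `2^-50` is an addition that confirms `x` is exactly the largest double below 8; it assumes the string is parsed to the nearest double, as on common platforms:

```r
# Parse the 22-significant-digit string into a double
x <- as.numeric("7.999999999999999111822")

# The gap to 8 is one spacing step in [4, 8), i.e. 2^-50
print(8 - x == 2^-50)                     # TRUE
print(x < 8)                              # TRUE

# Converting to character and back loses that gap
print(as.character(x))                    # "8"
print(as.numeric(as.character(x)) == 8)   # TRUE
```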
This inconsistency arises because `as.numeric()` keeps the full precision of
the IEEE 754 double (which needs up to 17 significant decimal digits to be
identified uniquely), while `as.character()` represents doubles to 15
significant digits (`DBL_DIG`), as documented in `?as.character`. Any value
that differs from 8 only beyond the 15th significant digit is therefore
written as `"8"`.
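The 15-digit rule can be checked directly. The sketch below uses base R's `sprintf()` with the C conversions `%.15g` and `%.17g` as stand-ins for a fixed number of significant digits. This is an illustration only, not a claim about how `as.character()` is implemented internally. 17 significant digits always suffice to identify a double; 15 do not:

```r
x <- as.numeric("7.999999999999999111822")

# 15 significant digits (DBL_DIG) round the value up to 8
s15 <- sprintf("%.15g", x)
# 17 significant digits keep the digits that distinguish x from 8
s17 <- sprintf("%.17g", x)

print(c(as.character = as.character(x), s15 = s15, s17 = s17))

# Only the 17-digit string converts back to the same double
print(as.numeric(s15) == x)   # FALSE
print(as.numeric(s17) == x)   # TRUE
```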
Proposed Solution
I suggest either of the following enhancements to improve consistency:
1. Swap the Functionality of `format()` and `as.character()`:
   - Redefine `as.character(x)` to inherit `format()`'s
behavior, providing a default precision (e.g., `digits = 17`) to match the
effective decimal precision of double-precision floats. This would output
`"7.9999999999999991"` (17 significant digits) for the example above.
   - Redefine `format(x)` to inherit `as.character()`'s
current behavior, serving as a utility for concise, human-readable output (e.g.,
`"8"`).
   - Naming would then align with intent: `as.character()` for
type conversion with precision, `format()` for formatting adjustments.
2. Add a `digits` Parameter to `as.character()`:
   - Extend `as.character()` to accept a `digits` argument
(defaulting to `NULL` for current behavior, or e.g., `17` for precision
matching). Example:
     x <- as.numeric("7.999999999999999111822")
     as.character(x, digits = 17)  # "7.9999999999999991"
     as.character(x)               # "8" (current default)
   - This would allow users to opt for precise conversion
while preserving backward compatibility.
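Until something like Option 2 exists, its intended semantics can be approximated in user code. The name `as_character_digits()` below is hypothetical and not part of base R. With `digits = NULL` it defers to `as.character()`; otherwise it formats each element with `sprintf()` to the requested number of significant digits. `format(x, digits = 17)` already gives the same string for a single value, but it formats a whole vector to a common layout, so `sprintf()` is used for element-wise results:

```r
# Hypothetical helper emulating the proposed as.character(x, digits = ...);
# it is not part of base R
as_character_digits <- function(x, digits = NULL) {
  if (is.null(digits)) {
    return(as.character(x))  # current default: 15 significant digits
  }
  stopifnot(is.numeric(x), length(digits) == 1L, digits >= 1)
  # "%.<d>g" keeps d significant digits and drops trailing zeros, per element
  sprintf(paste0("%.", as.integer(digits), "g"), x)
}

x <- as.numeric("7.999999999999999111822")
print(as_character_digits(x, digits = 17))  # "7.9999999999999991"
print(as_character_digits(x))               # "8"

# format() can already produce the precise form for a single value
print(format(x, digits = 17))               # "7.9999999999999991"
```

One caveat of this sketch: C's `%g` chooses between fixed and scientific notation by its own rule, so for very large or very small magnitudes the notation may differ from what `as.character()` produces.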
Rationale
- Consistency: `as.numeric()` and `as.character()` are similarly named base
functions, suggesting they should follow analogous precision rules. The current
discrepancy violates the expectation of round-trip consistency (string → numeric
→ string).
- Modern Use Cases: With R's growing use in scientific computing
and data science, high-precision handling is increasingly critical. The proposed
change would bring R closer to tools like Python and NumPy: Python's
`str(float(x))` returns the shortest string that converts back to the same
double (`'7.999999999999999'` for the example above).
- User Experience: Explicit control via `digits` or a redefined `as.character()`
would reduce confusion, especially for users relying on type conversion for
logical operations.
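For comparison with Python, a rough R analogue of shortest round-trip formatting can be sketched with base functions. The helper name `shortest_roundtrip()` is hypothetical. It tries 15, 16 and 17 significant digits in turn and keeps the first string that converts back to the same double. Because any decimal with at most 15 significant digits survives a trip through a double, starting at 15 still yields the shortest `%g` form:

```r
# Hypothetical helper: shortest "%.{15,16,17}g" string that parses back to x
shortest_roundtrip <- function(x) {
  stopifnot(is.double(x), length(x) == 1L, is.finite(x))
  for (d in 15:17) {
    s <- sprintf(paste0("%.", d, "g"), x)
    if (as.numeric(s) == x) return(s)  # this string identifies x exactly
  }
  s  # fallback; not expected to be reached for finite doubles
}

x <- as.numeric("7.999999999999999111822")
print(shortest_roundtrip(x))    # "7.999999999999999"
print(shortest_roundtrip(0.1))  # "0.1"
```

The digits agree with Python's `str()` for this example, but the notation for very large or very small values may differ from Python's output.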
Use Case
Consider a data validation script:
s1 <- "7.999999999999999111822"
x <- as.numeric(s1)
if (x < 8) print("Less than 8")  # TRUE, correct
if (as.numeric(as.character(x)) == 8) print("Equal to 8")  # TRUE, inconsistent
The second condition is also TRUE because `as.character(x)` collapses to
`"8"`. With the proposed change (e.g., `as.character(x, digits = 17)`), the
round-tripped value would equal `x`, so both checks would agree with the stored
value (`< 8`).
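In a validation script like this, affected values can also be flagged explicitly by comparing each parsed number with its `as.character()` round trip. A sketch with illustrative input strings (only the first comes from the example above):

```r
# Illustrative inputs; only the first needs more than 15 significant digits
s <- c("7.999999999999999111822", "0.1", "8")
x <- as.numeric(s)

# TRUE where the as.character() round trip changes the stored double
lossy <- as.numeric(as.character(x)) != x
print(lossy)     # TRUE FALSE FALSE

# Report the offending inputs
print(s[lossy])  # "7.999999999999999111822"
```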
Implementation Considerations
- Backward Compatibility: Option 2 (adding `digits`) is less disruptive,
allowing existing code to use the default `as.character()` behavior. Option 1
requires a transition period or deprecation notice.
- Performance: High-precision formatting may add minor overhead, which is
likely negligible on modern hardware; with Option 2, code that keeps the
default `digits = NULL` would be unaffected.
- Documentation: Clear guidance on the new `digits` parameter or redefined roles
would be essential.
Next Steps
I would be happy to assist with testing or drafting a patch if this proposal
gains traction. Please let me know your thoughts or any additional
considerations. This issue was identified with the help of Grok (xAI), and I
believe community feedback could refine the approach.
Thank you for your time and the incredible work on R!
Best regards
龙华
longhua880 at foxmail.com