Jim,
there are indeed many mathematical areas where data are not quite fixed.
Consider inequalities such as a value that can be higher than some number but
lower than another. A grade of A can often mean a score between 90 and 100 (no
extra credit). An event deemed to be "significant at the 95% level" of
probability can be in a 5% range or, because of various errors, may not even be
in the range. In some places you can have infinitesimals or things approaching
infinity and yet sometimes cancel things out without having an exact number.
The list of such things is vast and, as was already pointed out here, many such
cases have some info, even USEFUL info, that is lost if you declare them to be
an NA or an Inf or, say, choose to view an A as exactly 95. If a student has
straight A's, there is an excellent chance many of those A's came from
scores above 95. A student with an overall C average may be more likely to have
the single A be in the low 90's.
R was not necessarily designed to work this way. For some purposes, you may want
to use a variable that is more of a range. When I make plots in ggplot, I often
use Inf or -Inf to specify one end of a range, so that, for example, whatever
the data makes ggplot choose for upper and lower bounds, something I draw in the
background will extend to that border.
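A minimal sketch of that Inf trick, assuming the ggplot2 package is installed. The built-in mtcars data and the band limits of 3 and 4 are only stand-ins chosen for illustration:

```r
library(ggplot2)

# Shade a vertical band between wt = 3 and wt = 4. Using -Inf/Inf for the
# y limits makes the band reach the panel edges whatever range the data sets.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  annotate("rect", xmin = 3, xmax = 4, ymin = -Inf, ymax = Inf,
           fill = "grey70", alpha = 0.4) +
  geom_point()
print(p)
```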
But there is a difference between how we store info, and how we use it. Many R
functions have a feature like saying na.rm=TRUE that may not make sense if you
store a value as an NA whose meaning is "between 95 and 100". You
might want to write code that makes two copies of any vector which has an NA
value associated with a range, and do something like place the minimum value(s)
in one and the maximum in the other and then do some complex calculation.
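A rough sketch of the two-copies idea in base R. The scores are made up, and the convention that every NA stands for "between 95 and 100" is an assumption for this example only:

```r
# Hypothetical scores; NA here stands for "somewhere between 95 and 100".
scores <- c(72, NA, 88, NA, 65)
range_lo <- 95
range_hi <- 100

is_range <- is.na(scores)
lo <- replace(scores, is_range, range_lo)  # most pessimistic copy
hi <- replace(scores, is_range, range_hi)  # most optimistic copy

# A monotone summary such as the mean can be bracketed by computing it on
# both copies.
mean_bounds <- c(lower = mean(lo), upper = mean(hi))
mean_bounds
```

Here the mean is bracketed between 83 and 85, whereas mean(scores, na.rm = TRUE) drops the two entries and reports 75.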
Or consider a value like the measurement of a room with a ruler accurate only to
1/4 inch. If a side is 100 inches, the real value can be between 99.75 and 100.25
inches. Each measurement can be stored as a number and a plus/minus. To
calculate the volume of a room, you might multiply all the low values to get one
number and the high values to get another and store that as a range or whatever
else makes sense, like averaging the two.
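As a small illustration, with made-up room dimensions and the same plus/minus of 0.25 inch on every side:

```r
# Hypothetical room dimensions in inches, each read to +/- 0.25 inch.
sides <- c(length = 100, width = 120, height = 96)
tol <- 0.25

volume <- c(
  low  = prod(sides - tol),  # every side at its smallest plausible value
  mid  = prod(sides),        # the nominal readings
  high = prod(sides + tol)   # every side at its largest plausible value
)
volume

# One possible single-number summary: the average of the two extremes.
mean(volume[c("low", "high")])
```

The average of the extremes is close to, but not exactly, the nominal product, which is one reason to keep the range rather than collapse it.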
Still, some of that is normally ignored or done some other way, without
inventing new meanings for NA. I noted earlier that programs outside R will
often store out-of-band info that, when imported into R, is
always treated as NA. Something may be unavailable because the person did not
show up, or because they had horrible handwriting and the one who typed it
in guessed what it said, or because they refused to answer. It may be that much
of your program should treat all those as NA but other parts might want to
record that some percent of the responders did this or that. As noted, Adrian
Dusa and others had such needs and have a package that in some way annotates NA
values when asked. I have played with it but currently have no need for it. And,
just FYI, Adrian tried other things first, as there already are multiple bit
patterns that mean specific variations on an NA, such as NA_integer_ (note the
two underscores) and the variants for character, real and complex, besides the
plain logical NA. In a bizarre way, you can play games and test them as in:
> a=NA_integer_
> b=NA_character_
> identical(a, NA_integer_)
[1] TRUE
> identical(a, NA_character_)
[1] FALSE
> identical(a, a)
[1] TRUE
> identical(a, b)
[1] FALSE
> identical(a, NA)
[1] FALSE
So, in THEORY, you might get away with using these oddball bit-pattern variations,
or adding to them, but they do not survive well in vectors, which must in some
sense contain only one type. I have had some minor success making a list and
testing the contents, which normally show all versions as NA but clearly retain
subtle differences:
> temp=list(1, NA_integer_, 2, NA_character_, 3, NA)
> temp
[[1]]
[1] 1
[[2]]
[1] NA
[[3]]
[1] 2
[[4]]
[1] NA
[[5]]
[1] 3
[[6]]
[1] NA
> temp[[2]]
[1] NA
> identical(temp[[2]], NA_integer_)
[1] TRUE
> identical(temp[[2]], NA_character_)
[1] FALSE
> identical(temp[[4]], NA_character_)
[1] TRUE
So, yes, I can imagine a subtle window of opportunity for re-using some of these
NA variants to act like an NA but also carefully signal some other
meaning. But as noted, vectors break the scheme, so your data.frame might
need to use list columns, which is doable. I bet many tools you use, especially
ones that make copies or conversions, will break the scheme.
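A short base-R sketch of both halves of that claim: atomic vectors coerce the NA variants to one type, while a list column keeps them apart until something flattens it. The data frame and column names are invented for the example:

```r
# Combining different NA flavours in one atomic vector coerces them all.
v <- c(1L, NA_integer_, NA_character_)
typeof(v)                        # "character"
identical(v[2], NA_integer_)     # FALSE: the integer NA is now a character NA
identical(v[2], NA_character_)   # TRUE

# A list column keeps each element's own type.
df <- data.frame(id = 1:3)
df$value <- list(1, NA_integer_, NA_character_)
vapply(df$value, typeof, character(1))  # "double" "integer" "character"

# But a routine flattening step loses the distinction again.
flat <- unlist(df$value)
typeof(flat)                     # "character"
```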
Please note that for ME, the above discussion is academic and a reaction to the
ideas raised by others. I am not in any way suggesting R is deficient for not
being designed for things like this, nor that wanting some such feature is a bad
thing. What Adrian provided is sort of in between, as real NA values are stored
but some attributes also record what each NA is supposed to represent.
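To make the general idea concrete, here is a bare-bones base-R sketch of "real NA plus an attribute". It is NOT the interface of Adrian's package; the attribute name na_reason and the reason labels are invented for illustration:

```r
# Hypothetical survey answers; NA is a real NA in every missing slot.
x <- c(12, NA, 7, NA, NA)

# A parallel attribute records why each value is missing (NA = not missing).
attr(x, "na_reason") <- c(NA, "no_show", NA, "refused", "illegible")

mean(x, na.rm = TRUE)                # ordinary code sees plain NAs
table(attr(x, "na_reason"))          # bookkeeping code can count the reasons
mean(!is.na(attr(x, "na_reason")))   # share of responders with a missing value
```

A plain attribute like this is dropped by subsetting such as x[1:3], which is presumably one reason a package wraps the idea in a class with its own methods.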
-----Original Message-----
From: Jim Lemon <drjimlemon at gmail.com>
Sent: Tuesday, December 21, 2021 5:00 PM
To: Avi Gross <avigross at verizon.net>
Cc: r-help mailing list <r-help at r-project.org>; Adrian Dușa
<dusa.adrian at unibuc.ro>
Subject: Re: [R] Creating NA equivalent
Please pardon a comment that may be off-target as well as off-topic.
This appears similar to a number of things like fuzzy logic, where an instance
can take incompatible truth values.
It is known that an instance may have an attribute with a numeric value, but
that value cannot be determined.
It seems to me that an appropriate designation for the value is Unk, perhaps
with an associated probability of determination to distinguish it from NA (it is
definitely not known).
Jim
On Wed, Dec 22, 2021 at 6:55 AM Avi Gross via R-help <r-help at r-project.org> wrote:
>
> I wonder if the package Adrian Dușa created might be helpful or point you
> along the way.
>
> It was eventually named "declared"
>
> https://cran.r-project.org/web/packages/declared/index.html
>
> With a vignette here:
>
> https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf
>
> I do not know if it would easily satisfy your needs but it may be a step
> along the way. A package called Haven was part of the motivation and Adrian
> wanted a way to import data from external sources that had more than one
> category of NA that sounds a bit like what you want. His functions should allow
> the creation of such data within R, as well. I am including him in this email if
> you want to contact him or he has something to say.
>
>
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Duncan
> Murdoch
> Sent: Tuesday, December 21, 2021 5:26 AM
> To: Marc Girondot <marc_grt at yahoo.fr>; r-help at r-project.org
> Subject: Re: [R] Creating NA equivalent
>
> On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:
> > Dear members,
> >
> > I work about dosage and some values are below the detection limit.
> > I would like create new "numbers" like LDL (to represent lower than
> > detection limit) and UDL (upper the detection limit) that behave
> > like NA, with the possibility to test them using for example
> > is.LDL() or is.UDL().
> >
> > Note that NA is not the same than LDL or UDL: NA represent missing
> > data.
> > Here the data is available as LDL or UDL.
> >
> > NA is built in R language very deep... any option to create new
> > version of NA-equivalent ?
> >
>
> There was a discussion of this back in May. Here's a link to one
> approach that I suggested:
>
> https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html
>
> Read the followup messages, I made at least one suggested improvement.
> I don't know if anyone has packaged this, but there's a later
> version of the code here:
>
> https://stackoverflow.com/a/69179441/2554330
>
> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.