thr3ads.net - R help - [R] Creating NA equivalent [Dec 2021]

Jim Lemon

2021-Dec-21 22:00 UTC

[R] Creating NA equivalent

Please pardon a comment that may be off-target as well as off-topic.
This appears similar to a number of things like fuzzy logic, where an
instance can take incompatible truth values.

It is known that an instance may have an attribute with a numeric
value, but that value cannot be determined.

It seems to me that an appropriate designation for the value is Unk,
perhaps with an associated probability of determination to distinguish
it from NA (it is definitely not known).

Jim

On Wed, Dec 22, 2021 at 6:55 AM Avi Gross via R-help
<r-help at r-project.org> wrote:
>
> I wonder if the package Adrian Dușa created might be helpful or point you
> along the way.
>
> It was eventually named "declared"
>
> https://cran.r-project.org/web/packages/declared/index.html
>
> With a vignette here:
>
> https://cran.r-project.org/web/packages/declared/vignettes/declared.pdf
>
> I do not know if it would easily satisfy your needs, but it may be a step
> along the way. The haven package was part of the motivation: Adrian
> wanted a way to import data from external sources that have more than one
> category of NA, which sounds a bit like what you want. His functions should
> also allow the creation of such data within R. I am including him in this
> email in case you want to contact him or he has something to say.
>
>
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Duncan Murdoch
> Sent: Tuesday, December 21, 2021 5:26 AM
> To: Marc Girondot <marc_grt at yahoo.fr>; r-help at r-project.org
> Subject: Re: [R] Creating NA equivalent
>
> On 20/12/2021 11:41 p.m., Marc Girondot via R-help wrote:
> > Dear members,
> >
> > I work on dosages, and some values are below the detection limit. I
> > would like to create new "numbers" like LDL (to represent lower than
> > the detection limit) and UDL (higher than the detection limit) that
> > behave like NA, with the possibility to test them using, for example,
> > is.LDL() or is.UDL().
> >
> > Note that NA is not the same as LDL or UDL: NA represents missing
> > data. Here the data are available as LDL or UDL.
> >
> > NA is built very deep into the R language... is there any option to
> > create a new version of an NA equivalent?
> >
>
> There was a discussion of this back in May. Here's a link to one
> approach that I suggested:
>
>    https://stat.ethz.ch/pipermail/r-devel/2021-May/080776.html
>
> Read the follow-up messages; I made at least one suggested improvement.
> I don't know if anyone has packaged this, but there's a later
> version of the code here:
>
>    https://stackoverflow.com/a/69179441/2554330
>
> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Bert Gunter

2021-Dec-21 22:55 UTC

[R] Creating NA equivalent

But you appear to be missing something, Jim -- see inline below (and
the original post):

Bert


On Tue, Dec 21, 2021 at 2:00 PM Jim Lemon <drjimlemon at gmail.com> wrote:
>
> Please pardon a comment that may be off-target as well as off-topic.
> This appears similar to a number of things like fuzzy logic, where an
> instance can take incompatible truth values.
>
> It is known that an instance may have an attribute with a numeric
> value, but that value cannot be determined.

Yes, but **something** about the value is known: that it is > an upper
value or < a lower value. Such information should be used
(censoring!), not characterized as completely unknown. Think about it
in terms of survival time: saying that a person lasted longer than k
months is much more informative than saying that their survival time
is completely unknown!
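To make the censoring point concrete, here is a minimal sketch of using below- and above-limit results in an estimate instead of throwing them away. It assumes hypothetical detection limits (ldl, udl), made-up concentrations, and a log-normal model; each result is stored as an interval, with an LDL result as [0, ldl] and a UDL result as [udl, Inf], and a censored result contributes the probability of its interval to the likelihood. The function name censored_loglik is illustrative only. The survival package's survreg() with Surv(..., type = "interval2") is one established route to this kind of model.

```r
# Hypothetical detection limits and concentrations (made-up data).
ldl <- 0.5
udl <- 10
# Each result is an interval: exact values have lower == upper,
# LDL results are [0, ldl], UDL results are [udl, Inf].
lower <- c(0,   1.2, 3.4, 2.8, udl, 0,   5.1)
upper <- c(ldl, 1.2, 3.4, 2.8, Inf, ldl, 5.1)

# Work on the log scale (log-normal model); log(0) is -Inf in R.
lo <- log(lower)
hi <- log(upper)

# Log-likelihood: exact values use the normal density, censored values
# use the probability mass of their interval.
censored_loglik <- function(par, lo, hi) {
  mu <- par[1]
  sigma <- exp(par[2])   # log-parameterised so sigma stays positive
  exact <- lo == hi
  ll_exact <- dnorm(lo[exact], mu, sigma, log = TRUE)
  ll_cens <- log(pnorm(hi[!exact], mu, sigma) - pnorm(lo[!exact], mu, sigma))
  sum(ll_exact) + sum(ll_cens)
}

# fnscale = -1 makes optim() maximise; lo/hi are passed through "...".
fit <- optim(c(0, 0), censored_loglik, lo = lo, hi = hi,
             control = list(fnscale = -1))
c(mu = fit$par[1], sigma = exp(fit$par[2]))
```

The arguments are deliberately named lo and hi rather than lower and upper, because optim() has its own lower and upper arguments and would capture them. Replacing the three censored results with NA would discard exactly the "below 0.5" and "above 10" information.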

Avi Gross

2021-Dec-22 01:36 UTC

[R] Creating NA equivalent

Jim,

There are indeed many mathematical areas where data are not quite fixed.
Consider inequalities, such as a value that can be higher than one number but
lower than another. A grade of A can often mean a score between 90 and 100 (no
extra credit). An event deemed "significant at the 95% level of probability"
can be in a 5% range or, because of various errors, may not even be in the
range. In some areas you can have infinitesimals or quantities approaching
infinity that can sometimes cancel out without ever having an exact value.

The list of such things is vast and, as was already pointed out here, many such
cases have some info, even USEFUL info, that is lost if you declare them to be
an NA or an Inf, or if you, say, choose to treat an A as exactly 95. If a student
has straight A's, there is an excellent chance many of those A's came from
scores above 95. A student with an overall C average may be more likely to have
their single A be in the low 90's.

R was not necessarily designed to work this way. For some purposes, you may want
to use a variable that is more of a range. When I make plots in ggplot, I often
use Inf or -Inf to specify one end of a range, so that, for example, whatever
the data makes ggplot choose for upper and lower bounds, something I draw in the
background will extend to that border.

But there is a difference between how we store info and how we use it. Many R
functions have an argument like na.rm=TRUE that may not make sense if you
store a value as an NA whose meaning is "between 95 and 100". You
might want to write code that makes two copies of any vector that has an NA
value associated with a range, places the minimum value(s) in one and the
maximum value(s) in the other, and then does some more complex calculation.
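A minimal sketch of the two-copies idea, assuming hypothetical scores where an A is known only to lie between 90 and 100. The vector names score_min and score_max are illustrative.

```r
# Hypothetical scores for five students. Exact scores appear in both
# vectors; an "A" known only to lie in 90-100 contributes its two bounds.
score_min <- c(92, 85, 90, 78, 90)
score_max <- c(92, 85, 100, 78, 100)

# The mean never decreases when any input increases, so evaluating it
# on each copy brackets the true mean.
mean_range <- c(low = mean(score_min), high = mean(score_max))
mean_range   # low 87, high 91
```

This bounding trick is only valid for calculations that are non-decreasing in every input, as the mean is; for something like a variance, the extremes of the result need not come from the extremes of the inputs.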

Or consider measuring a room with a ruler accurate only to 1/4 inch. If a side
is 100 inches, the real value can be between 99.75 and 100.25 inches. Each
measurement can be stored as a number and a plus/minus. To calculate the volume
of a room, you might multiply all the low values to get one number and all the
high values to get another, and store that as a range or whatever else makes
sense, such as averaging the two.
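The room example works out the same way. This sketch assumes made-up dimensions of 100 x 120 x 96 inches, each good to plus or minus 1/4 inch; because volume grows with every side, multiplying the low ends and the high ends gives bounds on the true volume.

```r
# Hypothetical room dimensions in inches, each read to +/- 1/4 inch.
dims <- c(length = 100, width = 120, height = 96)
tol <- 0.25

vol_low <- prod(dims - tol)      # every side at its smallest
vol_high <- prod(dims + tol)     # every side at its largest
vol_nominal <- prod(dims)        # 1152000 cubic inches

# Store the result as a range, or collapse it, e.g. by averaging the ends.
vol_range <- c(low = vol_low, high = vol_high)
vol_mid <- mean(vol_range)
vol_range
```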

Still, some of that is normally ignored or done some other way, without
inventing new meanings for NA. I noted earlier that programs outside R will
often store out-of-band information that is always treated as NA when
imported into R. Some things may be unavailable because the person did not
show up, others because they had horrible handwriting and the one who typed it
in guessed what it said, and others because they refused to answer. It may be
that much of your program should treat all of those as NA, but other parts
might want to record that some percent of the responders did this or that. As
noted, Adrian Dusa and others had such needs and have a package that, in some
way, annotates NA values when asked. I have played with it but currently have
no need for it. And, just FYI, Adrian tried other things first, as there
already are multiple bit patterns that mean specific variations of NA, such as
NA_integer_ (note the two underscores), and other variants for character,
real, and complex, as well as the plain logical NA. In a bizarre way, you can
play games and test them as in:

  > a=NA_integer_
  > b=NA_character_
  > identical(a, NA_integer_)
  [1] TRUE
  > identical(a, NA_character_)
  [1] FALSE
  > identical(a, a)
  [1] TRUE
  > identical(a, b)
  [1] FALSE
  > identical(a, NA)
  [1] FALSE

So, in THEORY, you might get away with using these oddball bit-pattern
variations, or adding to them, but they do not survive well in vectors, which
must in some sense contain only one type. I have had some minor success making
a list and testing the contents, which normally show all versions as NA but
clearly retain subtle differences:

  > temp=list(1, NA_integer_, 2, NA_character_, 3, NA)
  > temp
  [[1]]
  [1] 1

  [[2]]
  [1] NA

  [[3]]
  [1] 2

  [[4]]
  [1] NA

  [[5]]
  [1] 3

  [[6]]
  [1] NA

  > temp[[2]]
  [1] NA
  > identical(temp[[2]], NA_integer_)
  [1] TRUE
  > identical(temp[[2]], NA_character_)
  [1] FALSE
  > identical(temp[[4]], NA_character_)
  [1] TRUE

So, yes, I can imagine a subtle window of opportunity for re-using some of these
NA variants to act like an NA but also carefully signal some other meaning.
But as noted, vectors break the scheme, so your data.frame might need to use
list columns, which is doable. I bet many tools you use, especially ones that
make copies or conversions, will break the scheme.
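A short base-R sketch of why atomic vectors break the scheme while lists, and list columns in a data.frame, preserve it. The object names are illustrative.

```r
# In an atomic vector all elements share one type, so combining an
# integer NA with a character NA coerces both to character.
v <- c(NA_integer_, NA_character_)
typeof(v)                          # "character"
identical(v[1], NA_character_)     # TRUE: the integer flavour is gone

# A list keeps each element's own type, so the distinction survives.
l <- list(NA_integer_, NA_character_)
vapply(l, typeof, character(1))    # "integer" "character"

# The same list can be stored as a list column of a data.frame.
df <- data.frame(id = 1:2)
df$value <- l
identical(df$value[[1]], NA_integer_)   # TRUE
```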

Please note that for ME, the above discussion is academic and a reaction to the
ideas raised by others. I am not in any way suggesting R is deficient for not
being designed for things like this, nor that wanting such a feature is a bad
thing. What Adrian provided sits somewhere in between: real NA values are
stored, but attributes also record what each NA is supposed to represent.
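One way to approximate what Marc asked for, in the spirit of "real NA values plus an attribute", is sketched below. The helper names with_reasons(), is.LDL() and is.UDL() and the attribute name "na_reason" are hypothetical and are not the API of the declared package. The last line shows the fragility mentioned above: ordinary subsetting drops the attribute.

```r
# Hypothetical helpers (NOT the declared package API): keep genuine NA
# values in a numeric vector and record why each one is NA in an attribute.
with_reasons <- function(x, reason) {
  stopifnot(length(x) == length(reason))
  x <- as.numeric(x)
  x[!is.na(reason)] <- NA_real_      # censored entries become ordinary NA
  attr(x, "na_reason") <- reason     # reason is NA where a value was observed
  x
}

# %in% returns FALSE (not NA) for entries whose reason is NA.
is.LDL <- function(x) attr(x, "na_reason") %in% "LDL"
is.UDL <- function(x) attr(x, "na_reason") %in% "UDL"

conc <- with_reasons(c(0, 1.2, 3.4, 0, 15),
                     c("LDL", NA, NA, "LDL", "UDL"))

is.na(conc)                 # TRUE FALSE FALSE TRUE TRUE
is.LDL(conc)                # TRUE FALSE FALSE TRUE FALSE
is.UDL(conc)                # FALSE FALSE FALSE FALSE TRUE
mean(conc, na.rm = TRUE)    # 2.3, using only the observed values

# Fragility: ordinary subsetting keeps the NA values but drops the attribute.
is.null(attr(conc[1:2], "na_reason"))   # TRUE
```

Because the stored values are genuine NA, existing functions such as is.na() and mean(na.rm = TRUE) keep working unchanged; the cost is that the reasons must be carried along explicitly, which a classed vector with its own subsetting method would be needed to handle.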

