After 5 minutes more thought: - code non-missing as missingKind = NA, not 0, so that missingKind could be a character vector, or missingKind = 0 could be supported. - print methods should return the main argument, so mine should be print.MultiMissing <- function(x, ...) { vals <- as.character(x) if (!is.character(x) || inherits(x, "noquote")) print(noquote(vals)) else print(vals) invisible(x) } This still needs a lot of improvement to be a good print method, but I'll leave that to you. Duncan Murdoch On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:> On 26/05/2021 10:22 a.m., Adrian Du?a wrote: >> Dear?Duncan, >> >> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <murdoch.duncan at gmail.com >> <mailto:murdoch.duncan at gmail.com>> wrote: >> >> You've already been told how to solve this:? just add attributes to the >> objects. Use the standard NA to indicate that there is some kind of >> missingness, and the attribute to describe exactly what it is.? Stick a >> class on those objects and define methods so that subsetting and >> arithmetic preserves the extra info you've added. If you do some >> operation that turns those NAs into NaNs, big deal:? the attribute will >> still be there, and is.na <is.na>(NaN) still returns TRUE. >> >> >> I've already tried the attributes way, it is not so easy. > > If you have specific operations that are needed but that you can't get > to work, post the issue here. > >> In the best case scenario, it unnecessarily triples the size of the >> data, but perhaps this is the only way forward. > > I don't see how it could triple the size. Surely an integer has enough > values to cover all possible kinds of missingness. So on integer or > factor data you'd double the size, on real or character data you'd > increase it by 50%. (This is assuming you're on a 64 bit platform with > 32 bit integers and 64 bit reals and pointers.) > > Here's a tiny implementation to show what I'm talking about: > > asMultiMissing <- function(x) { > if (isMultiMissing(x)) > return(x) > missingKind <- ifelse(is.na(x), 1, 0) > structure(x, > missingKind = missingKind, > class = c("MultiMissing", class(x))) > } > > isMultiMissing <- function(x) > inherits(x, "MultiMissing") > > missingKind <- function(x) { > if (isMultiMissing(x)) > attr(x, "missingKind") > else > ifelse(is.na(x), 1, 0) > } > > `missingKind<-` <- function(x, value) { > class(x) <- setdiff(class(x), "MultiMissing") > x[value != 0] <- NA > x <- asMultiMissing(x) > attr(x, "missingKind") <- value > x > } > > `[.MultiMissing` <- function(x, i, ...) { > missings <- missingKind(x) > x <- NextMethod() > missings <- missings[i] > missingKind(x) <- missings > x > } > > print.MultiMissing <- function(x, ...) { > vals <- as.character(x) > if (!is.character(x) || inherits(x, "noquote")) > print(noquote(vals)) > else > print(vals) > } > > `[<-.MultiMissing` <- function(x, i, value, ...) { > missings <- missingKind(x) > class(x) <- setdiff(class(x), "MultiMissing") > x[i] <- value > missings[i] <- missingKind(value) > missingKind(x) <- missings > x > } > > as.character.MultiMissing <- function(x, ...) { > missings <- missingKind(x) > result <- NextMethod() > ifelse(missings != 0, > paste0("NA.", missings), result) > > } > > This is incomplete. It doesn't do printing very well, and it doesn't > handle the case of assigning a MultiMissing value to a regular vector at > all. (I think you'd need an S4 implementation if you want to support > that.) But it does the basics: > > > x <- 1:10 > > missingKind(x)[4] <- 23 > > x > [1] 1 2 3 NA.23 5 6 7 8 9 > [10] 10 > > is.na(x) > [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE > [10] FALSE > > missingKind(x) > [1] 0 0 0 23 0 0 0 0 0 0 > > > > Duncan Murdoch > >> >> Base R doesn't need anything else. >> >> You complained that users shouldn't need to know about attributes, and >> they won't:? you, as the author of the package that does this, will >> handle all those details.? Working in your subject area you know all >> the >> different kinds of NAs that people care about, and how they code >> them in >> input data, so you can make it all totally transparent.? If you do it >> well, someone in some other subject area with a completely different >> set >> of kinds of missingness will be able to adapt your code to their use. >> >> >> But that is the whole point: the package author does not define possible >> NAs (the possibilities are infinite), users do that. >> The package should only provide a simple method to achieve that. >> >> >> I imagine this has all been done in one of the thousands of packages on >> CRAN, but if it hasn't been done well enough for you, do it better. >> >> >> If it were, I would have found it by now... >> >> Best wishes, >> Adrian >
Yes, that is even better. Best, Adrian On Wed, May 26, 2021 at 7:05 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:> After 5 minutes more thought: > > - code non-missing as missingKind = NA, not 0, so that missingKind could > be a character vector, or missingKind = 0 could be supported. > > - print methods should return the main argument, so mine should be > > print.MultiMissing <- function(x, ...) { > vals <- as.character(x) > if (!is.character(x) || inherits(x, "noquote")) > print(noquote(vals)) > else > print(vals) > invisible(x) > } > > This still needs a lot of improvement to be a good print method, but > I'll leave that to you. > > Duncan Murdoch > > On 26/05/2021 11:43 a.m., Duncan Murdoch wrote: > > On 26/05/2021 10:22 a.m., Adrian Du?a wrote: > >> Dear Duncan, > >> > >> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch < > murdoch.duncan at gmail.com > >> <mailto:murdoch.duncan at gmail.com>> wrote: > >> > >> You've already been told how to solve this: just add attributes > to the > >> objects. Use the standard NA to indicate that there is some kind of > >> missingness, and the attribute to describe exactly what it is. > Stick a > >> class on those objects and define methods so that subsetting and > >> arithmetic preserves the extra info you've added. If you do some > >> operation that turns those NAs into NaNs, big deal: the attribute > will > >> still be there, and is.na <is.na>(NaN) still returns TRUE. > >> > >> > >> I've already tried the attributes way, it is not so easy. > > > > If you have specific operations that are needed but that you can't get > > to work, post the issue here. > > > >> In the best case scenario, it unnecessarily triples the size of the > >> data, but perhaps this is the only way forward. > > > > I don't see how it could triple the size. Surely an integer has enough > > values to cover all possible kinds of missingness. So on integer or > > factor data you'd double the size, on real or character data you'd > > increase it by 50%. (This is assuming you're on a 64 bit platform with > > 32 bit integers and 64 bit reals and pointers.) > > > > Here's a tiny implementation to show what I'm talking about: > > > > asMultiMissing <- function(x) { > > if (isMultiMissing(x)) > > return(x) > > missingKind <- ifelse(is.na(x), 1, 0) > > structure(x, > > missingKind = missingKind, > > class = c("MultiMissing", class(x))) > > } > > > > isMultiMissing <- function(x) > > inherits(x, "MultiMissing") > > > > missingKind <- function(x) { > > if (isMultiMissing(x)) > > attr(x, "missingKind") > > else > > ifelse(is.na(x), 1, 0) > > } > > > > `missingKind<-` <- function(x, value) { > > class(x) <- setdiff(class(x), "MultiMissing") > > x[value != 0] <- NA > > x <- asMultiMissing(x) > > attr(x, "missingKind") <- value > > x > > } > > > > `[.MultiMissing` <- function(x, i, ...) { > > missings <- missingKind(x) > > x <- NextMethod() > > missings <- missings[i] > > missingKind(x) <- missings > > x > > } > > > > print.MultiMissing <- function(x, ...) { > > vals <- as.character(x) > > if (!is.character(x) || inherits(x, "noquote")) > > print(noquote(vals)) > > else > > print(vals) > > } > > > > `[<-.MultiMissing` <- function(x, i, value, ...) { > > missings <- missingKind(x) > > class(x) <- setdiff(class(x), "MultiMissing") > > x[i] <- value > > missings[i] <- missingKind(value) > > missingKind(x) <- missings > > x > > } > > > > as.character.MultiMissing <- function(x, ...) { > > missings <- missingKind(x) > > result <- NextMethod() > > ifelse(missings != 0, > > paste0("NA.", missings), result) > > > > } > > > > This is incomplete. It doesn't do printing very well, and it doesn't > > handle the case of assigning a MultiMissing value to a regular vector at > > all. (I think you'd need an S4 implementation if you want to support > > that.) But it does the basics: > > > > > x <- 1:10 > > > missingKind(x)[4] <- 23 > > > x > > [1] 1 2 3 NA.23 5 6 7 8 9 > > [10] 10 > > > is.na(x) > > [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE > > [10] FALSE > > > missingKind(x) > > [1] 0 0 0 23 0 0 0 0 0 0 > > > > > > > Duncan Murdoch > > > >> > >> Base R doesn't need anything else. > >> > >> You complained that users shouldn't need to know about attributes, > and > >> they won't: you, as the author of the package that does this, will > >> handle all those details. Working in your subject area you know > all > >> the > >> different kinds of NAs that people care about, and how they code > >> them in > >> input data, so you can make it all totally transparent. If you do > it > >> well, someone in some other subject area with a completely > different > >> set > >> of kinds of missingness will be able to adapt your code to their > use. > >> > >> > >> But that is the whole point: the package author does not define possible > >> NAs (the possibilities are infinite), users do that. > >> The package should only provide a simple method to achieve that. > >> > >> > >> I imagine this has all been done in one of the thousands of > packages on > >> CRAN, but if it hasn't been done well enough for you, do it better. > >> > >> > >> If it were, I would have found it by now... > >> > >> Best wishes, > >> Adrian > > > >[[alternative HTML version deleted]]
I would be glad to add this to one of my R packages, probably `gdata`.. -G Gregory R. Warnes, Ph.D. greg at warnes.net Eternity is a long time, take a friend!> On May 26, 2021, at 1:09 PM, Adrian Du?a <dusa.adrian at gmail.com> wrote: > > Yes, that is even better. > Best, > Adrian > > On Wed, May 26, 2021 at 7:05 PM Duncan Murdoch <murdoch.duncan at gmail.com <mailto:murdoch.duncan at gmail.com>> > wrote: > >> After 5 minutes more thought: >> >> - code non-missing as missingKind = NA, not 0, so that missingKind could >> be a character vector, or missingKind = 0 could be supported. >> >> - print methods should return the main argument, so mine should be >> >> print.MultiMissing <- function(x, ...) { >> vals <- as.character(x) >> if (!is.character(x) || inherits(x, "noquote")) >> print(noquote(vals)) >> else >> print(vals) >> invisible(x) >> } >> >> This still needs a lot of improvement to be a good print method, but >> I'll leave that to you. >> >> Duncan Murdoch >> >> On 26/05/2021 11:43 a.m., Duncan Murdoch wrote: >>> On 26/05/2021 10:22 a.m., Adrian Du?a wrote: >>>> Dear Duncan, >>>> >>>> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch < >> murdoch.duncan at gmail.com >>>> <mailto:murdoch.duncan at gmail.com>> wrote: >>>> >>>> You've already been told how to solve this: just add attributes >> to the >>>> objects. Use the standard NA to indicate that there is some kind of >>>> missingness, and the attribute to describe exactly what it is. >> Stick a >>>> class on those objects and define methods so that subsetting and >>>> arithmetic preserves the extra info you've added. If you do some >>>> operation that turns those NAs into NaNs, big deal: the attribute >> will >>>> still be there, and is.na <is.na>(NaN) still returns TRUE. >>>> >>>> >>>> I've already tried the attributes way, it is not so easy. >>> >>> If you have specific operations that are needed but that you can't get >>> to work, post the issue here. >>> >>>> In the best case scenario, it unnecessarily triples the size of the >>>> data, but perhaps this is the only way forward. >>> >>> I don't see how it could triple the size. Surely an integer has enough >>> values to cover all possible kinds of missingness. So on integer or >>> factor data you'd double the size, on real or character data you'd >>> increase it by 50%. (This is assuming you're on a 64 bit platform with >>> 32 bit integers and 64 bit reals and pointers.) >>> >>> Here's a tiny implementation to show what I'm talking about: >>> >>> asMultiMissing <- function(x) { >>> if (isMultiMissing(x)) >>> return(x) >>> missingKind <- ifelse(is.na(x), 1, 0) >>> structure(x, >>> missingKind = missingKind, >>> class = c("MultiMissing", class(x))) >>> } >>> >>> isMultiMissing <- function(x) >>> inherits(x, "MultiMissing") >>> >>> missingKind <- function(x) { >>> if (isMultiMissing(x)) >>> attr(x, "missingKind") >>> else >>> ifelse(is.na(x), 1, 0) >>> } >>> >>> `missingKind<-` <- function(x, value) { >>> class(x) <- setdiff(class(x), "MultiMissing") >>> x[value != 0] <- NA >>> x <- asMultiMissing(x) >>> attr(x, "missingKind") <- value >>> x >>> } >>> >>> `[.MultiMissing` <- function(x, i, ...) { >>> missings <- missingKind(x) >>> x <- NextMethod() >>> missings <- missings[i] >>> missingKind(x) <- missings >>> x >>> } >>> >>> print.MultiMissing <- function(x, ...) { >>> vals <- as.character(x) >>> if (!is.character(x) || inherits(x, "noquote")) >>> print(noquote(vals)) >>> else >>> print(vals) >>> } >>> >>> `[<-.MultiMissing` <- function(x, i, value, ...) { >>> missings <- missingKind(x) >>> class(x) <- setdiff(class(x), "MultiMissing") >>> x[i] <- value >>> missings[i] <- missingKind(value) >>> missingKind(x) <- missings >>> x >>> } >>> >>> as.character.MultiMissing <- function(x, ...) { >>> missings <- missingKind(x) >>> result <- NextMethod() >>> ifelse(missings != 0, >>> paste0("NA.", missings), result) >>> >>> } >>> >>> This is incomplete. It doesn't do printing very well, and it doesn't >>> handle the case of assigning a MultiMissing value to a regular vector at >>> all. (I think you'd need an S4 implementation if you want to support >>> that.) But it does the basics: >>> >>>> x <- 1:10 >>>> missingKind(x)[4] <- 23 >>>> x >>> [1] 1 2 3 NA.23 5 6 7 8 9 >>> [10] 10 >>>> is.na(x) >>> [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE >>> [10] FALSE >>>> missingKind(x) >>> [1] 0 0 0 23 0 0 0 0 0 0 >>>> >>> >>> Duncan Murdoch >>> >>>> >>>> Base R doesn't need anything else. >>>> >>>> You complained that users shouldn't need to know about attributes, >> and >>>> they won't: you, as the author of the package that does this, will >>>> handle all those details. Working in your subject area you know >> all >>>> the >>>> different kinds of NAs that people care about, and how they code >>>> them in >>>> input data, so you can make it all totally transparent. If you do >> it >>>> well, someone in some other subject area with a completely >> different >>>> set >>>> of kinds of missingness will be able to adapt your code to their >> use. >>>> >>>> >>>> But that is the whole point: the package author does not define possible >>>> NAs (the possibilities are infinite), users do that. >>>> The package should only provide a simple method to achieve that. >>>> >>>> >>>> I imagine this has all been done in one of the thousands of >> packages on >>>> CRAN, but if it hasn't been done well enough for you, do it better. >>>> >>>> >>>> If it were, I would have found it by now... >>>> >>>> Best wishes, >>>> Adrian >>> >> >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list > stat.ethz.ch/mailman/listinfo/r-devel <stat.ethz.ch/mailman/listinfo/r-devel>[[alternative HTML version deleted]]