On Tue, May 24, 2016 at 9:30 AM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
> On Tue, May 24, 2016 at 5:59 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
> > Shouldn't Rf_mkString(NULL) return (the c-level equivalent of) character()
> > rather than the NA_character_?
>
> No. It should still be safe to assume that mkString() always returns a
> character vector of exactly length one. Anything else could lead to
> type errors.
>
Well, the thing is you're passing an invalid pointer, one that doesn't point to
a C string, to a constructor expecting a valid const char *. I'm fine with
the contract being that mkString always returns a character vector of
length one, but that doesn't necessarily mean that the function needs to
accept NULL pointers. The contract as I understand it is that if you give
it a C string, it will create a length-one character vector (STRSXP) for
that string. In this light, Bill's suggestion that it throw an error seems
the most principled response. I would think you would need to, at the very
least, emit a warning.
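
To make the two candidate contracts concrete, here is a minimal standalone
sketch in plain C. It does not use R's headers: str1, mk_string_na and
mk_string_strict are hypothetical stand-ins I made up for illustration, not
R's SEXP or Rf_mkString, and an error code stands in for raising an R error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a length-one character vector; NOT R's SEXP. */
typedef enum { STR_VALUE, STR_NA } str_kind;

typedef struct {
    str_kind kind;
    const char *value; /* meaningful only when kind == STR_VALUE */
} str1;

typedef enum { MK_OK = 0, MK_ERR_NULL = 1 } mk_status;

/* Lenient policy: a NULL pointer maps to NA, anything else is wrapped. */
mk_status mk_string_na(const char *s, str1 *out)
{
    assert(out != NULL);
    out->kind = (s != NULL) ? STR_VALUE : STR_NA;
    out->value = s;
    return MK_OK;
}

/* Strict policy: a NULL pointer is a contract violation and is reported. */
mk_status mk_string_strict(const char *s, str1 *out)
{
    assert(out != NULL);
    if (s == NULL)
        return MK_ERR_NULL; /* stands in for raising an R-level error */
    out->kind = STR_VALUE;
    out->value = s;
    return MK_OK;
}
```

Either way, a successful call yields exactly one element, so the length-one
guarantee holds under both policies. They differ only in whether NULL counts
as valid input.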
>
> > An empty string and NULL aren't the same.
>
> Exactly! So if you pass in an empty C string, you get an empty R
> string, and if you pass in a null pointer you get NA.
>
> Rf_mkString(NULL) <--> NA
> Rf_mkString("") <--> ""
>
> There is no ambiguity, and much better than segfaulting.
>
Well, better than segfaulting is not really relevant here. No one is
arguing that it should segfault. The question is what behavior it should
have when it doesn't segfault.
It's true that a C empty string is not the same as NULL, but NULL isn't the
same as NA either. Semantically, for your use-case (which I gather arose
from interactions we had :) ) the NULL means there is no version, while NA
indicates there is a version but we don't know what it is. Imagine an
object class that represents a person's name (first, middle, last). Now take
two people: one has no middle name (and we know that when creating the
object), and another for whom we don't have any information about the middle
name because only first and last were reported. I would expect the first one
to have middle name either NULL or (in a data.frame context) "", while the
second would have NA_character_. In this light, mkString(NULL) should
arguably generate "". I don't think the fact that there is another way to
get "" is a particularly large problem.
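
As a rough illustration of the name example, here is a small standalone C
sketch. Everything in it is hypothetical (person_name, middle_state,
NA_SENTINEL, middle_for_column); NA_SENTINEL merely stands in for
NA_character_ and is compared by address. None of this is R API.

```c
#include <assert.h>
#include <stddef.h>

/* Three distinct states: known to be absent, not reported, and known. */
typedef enum { MIDDLE_NONE, MIDDLE_UNKNOWN, MIDDLE_KNOWN } middle_state;

typedef struct {
    const char *first;
    const char *last;
    middle_state state;
    const char *middle; /* used only when state == MIDDLE_KNOWN */
} person_name;

/* Sentinel standing in for NA_character_; compared by address, not content. */
static const char NA_SENTINEL[] = "NA";

/* Value that would go into a data.frame-like middle-name column. */
const char *middle_for_column(const person_name *p)
{
    assert(p != NULL);
    switch (p->state) {
    case MIDDLE_NONE:    return "";          /* no middle name exists */
    case MIDDLE_UNKNOWN: return NA_SENTINEL; /* exists or not, we can't say */
    case MIDDLE_KNOWN:   return p->middle;
    }
    return NA_SENTINEL; /* not reached for valid states */
}
```

The point is only that "known to be absent" and "unknown" are different
states, so mapping both to the same value loses information.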
On the other hand, and in support of your position, it came up while Michael
Lawrence and I were talking about this that asChar from utils.c will give
you NA_STRING when you give it R_NilValue. That is a coercion, though,
whereas arguably mkString is not. That said, consistency would probably be
good.
~G
--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research