On Tue, May 24, 2016 at 9:30 AM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:> On Tue, May 24, 2016 at 5:59 PM, Gabriel Becker <gmbecker at ucdavis.edu> > wrote: > > Shouldn't Rf_mkString(NULL) return (the c-level equivalent of) > character() > > rather than the NA_character_? > > No. It should still be safe to assume that mkString() always returns a > character vector of exactly length one. Anything else could lead to > type errors. >Well the thing is you're passing an invalid pointer, that doesn't point to a C string, to a constructor expecting a valid const char *. I'm fine with the contract being that mkString always returns a character vector of length one, but that doesn't necessarily mean that the function needs to accept NULL pointers. The contract as I understand it is that if you give it a C string, it will create a CHARSXP for that string. In this light, Bill's suggestion that it throw an error seems the most principled response. I would think you would need to at the very least emit a warning.> > > An empty string and NULL aren't the same. > > Exactly! So if you pass in an empty C string, you get an empty R > string, and if you pass in a null pointer you get NA. > > Rf_mkString(NULL) <--> NA > Rf_mkString("") <--> "" > > There is no ambiguity, and much better than segfaulting. >Well, better than segfaulting is not really relevant here. No one is arguing that it should segfault. The question is what behavior it should have when it doesn't segfault. It's true that a C empty string is not the same as NULL, but NULL isn't the same as NA either. Semantically, for your use-case (which I gather arose from interactions we had :) ) the NULL means there is no version, while NA indicates there is a version but we don't know what it is. Imagine an object class that represents a persons name (first, middle, last). Now take two people, One has no middle name (and we know that when creating the object) and another for whom we don't have any information about the middle name, only first and last were reported. I would expect the first one to have middle name either NULL or (in a data.frame context) "", while the second would have NA_character_. In this light, mkString should arguably generate "". i don't think the fact that there is another way to get "" is a particularly large problem. On the other hand, and in support of your position it came up as Michael Lawrence and I were talking about this that asChar from utils.c will give you NA_STRING when you give it R_NilValue. That is a coercion though, whereas arguably mkString is not. That said, consistency would probably be good. ~G -- Gabriel Becker, PhD Associate Scientist (Bioinformatics) Genentech Research [[alternative HTML version deleted]]
>>>>> Gabriel Becker <gmbecker at ucdavis.edu> >>>>> on Tue, 24 May 2016 10:30:48 -0700 writes:> On Tue, May 24, 2016 at 9:30 AM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> > wrote: >> On Tue, May 24, 2016 at 5:59 PM, Gabriel Becker <gmbecker at ucdavis.edu> >> wrote: >> > Shouldn't Rf_mkString(NULL) return (the c-level equivalent of) >> character() >> > rather than the NA_character_? >> >> No. It should still be safe to assume that mkString() always returns a >> character vector of exactly length one. Anything else could lead to >> type errors. >> > Well the thing is you're passing an invalid pointer, that doesn't point to > a C string, to a constructor expecting a valid const char *. I'm fine with > the contract being that mkString always returns a character vector of > length one, but that doesn't necessarily mean that the function needs to > accept NULL pointers. The contract as I understand it is that if you give > it a C string, it will create a CHARSXP for that string. In this light, > Bill's suggestion that it throw an error seems the most principled > response. I would think you would need to at the very least emit a warning. I agree with Jerooen that mkChar() and mkString() may be used in contexts where they can end up with a NULL and hence should not segfault... and hence am willing the extra (very small) penalty of checking for NULL. >> >> > An empty string and NULL aren't the same. >> >> Exactly! So if you pass in an empty C string, you get an empty R >> string, and if you pass in a null pointer you get NA. >> >> Rf_mkString(NULL) <--> NA >> Rf_mkString("") <--> "" >> >> There is no ambiguity, and much better than segfaulting. Better than segfaulting, yes, but really agree with Bill (and Gabe), also for Rf_mkChar(NULL): I think both functions should give an error in such a case rather than returning NA_character_ It is an accident of some kind if they got NULL, no? -- Martin Maechler, ETH Zurich > Well, better than segfaulting is not really relevant here. No one is > arguing that it should segfault. The question is what behavior it should > have when it doesn't segfault. > It's true that a C empty string is not the same as NULL, but NULL isn't the > same as NA either. Semantically, for your use-case (which I gather arose > from interactions we had :) ) the NULL means there is no version, while NA > indicates there is a version but we don't know what it is. Imagine an > object class that represents a persons name (first, middle, last). Now take > two people, One has no middle name (and we know that when creating the > object) and another for whom we don't have any information about the middle > name, only first and last were reported. I would expect the first one to > have middle name either NULL or (in a data.frame context) "", while the > second would have NA_character_. In this light, mkString should arguably > generate "". i don't think the fact that there is another way to get "" is > a particularly large problem. > On the other hand, and in support of your position it came up as Michael > Lawrence and I were talking about this that asChar from utils.c will give > you NA_STRING when you give it R_NilValue. That is a coercion though, > whereas arguably mkString is not. That said, consistency would probably be > good. > ~G > -- > Gabriel Becker, PhD > Associate Scientist (Bioinformatics) > Genentech Research > [[alternative HTML version deleted]] > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, May 25, 2016 at 12:31 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:> Better than segfaulting, yes, but really agree with Bill (and > Gabe), also for Rf_mkChar(NULL): > I think both functions should give an error in such a case > rather than returning NA_character_ > > It is an accident of some kind if they got NULL, no?Not necessarily. A char* of NULL can be a string which is not initiated or simply unavailable due to configuration. The example from my original email was in curl package which exposes the version string of libz that was used to build libcurl: mkString(data->libz_version) This worked on all platforms that I tested. However a user found that if libcurl was configured --without-libz (which is uncommon) the libz_version string does not get set by libcurl and is always NULL. I had not foreseen this and it would lead to a segfault. I think making mkString() return NA for null strings lead to the most robust behavior. Raising an exception seems a little harsh to me, as there is no way the user would be able to recover from this, and there might not be an actual problem at all.