Mike Miller
2014-Dec-24 20:39 UTC
[R] ave(x, y, FUN=length) produces character output when x is character
On Wed, 24 Dec 2014, Bert Gunter wrote:

> You said:
> "The elements of the first vector are irrelevant because they are only
> counted, so we should get the same result if it were a character
> vector, but we don't: "
>
> You don't get to invent your own rules! ?ave -- always nice to read the
> Help docs **before posting** -- clearly states that the x argument must
> be __numeric__. So if you choose to ignore what you are told, you do so
> at your own risk. Who knows what you'll get? -- it's a user error, not a
> bug.

I guess the goal is to humiliate the person who posted the question. I've
had trouble convincing doctoral students in biostat to post questions here
because they are afraid of being treated like dirt. It doesn't bother me
personally, but I see it as counterproductive. The code I was working with
was written by such a student and it has been in CRAN for a couple of
years. I'm just trying to fix it. Your comment is helpful, but it would
have been even better without the hostile tone.

Regarding the way ave() works -- why doesn't it check that the input
vector is numeric? Apparently, integer input is acceptable. Does numeric
sometimes mean "numeric" and sometimes "either 'integer' or 'numeric'"?
Either way, if character is unacceptable, it could throw an error instead
of pumping out an almost-correct answer. That made it much harder to track
down the bug in the code base I was working on.

Also, regarding the sacred text, "x A numeric." is a bit terse. The same
text later refers to length(x), so I suspect that "A numeric" is short for
"A numeric vector", but that might not mean "a vector of 'numeric' type."

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ave.html

> And if (my understanding of) what you say is the case, this whole post
> is silly. See ?table to do exactly what you claim is wanted without
> trying to invent square wheels.

table() counts elements but it has to repeat them in the proper pattern.
For every element of a vector we want to know how many times it occurs in
that vector. So if the vector is c("A","A","B","C","C","C") the output
should be c(2,2,1,3,3,3). I'm sure we all know that table() will count the
elements, but it doesn't place them in a vector as desired. I can do this
with a character vector:

> charvec <- c("A","A","B","C","C","C")
> as.vector(( table( charvec )[charvec] ))
[1] 2 2 1 3 3 3

It's slightly trickier with an integer vector:

> intvec <- c(4,4,5,6,6,6)
> table( intvec )[intvec]
intvec
<NA> <NA> <NA> <NA> <NA> <NA>
  NA   NA   NA   NA   NA   NA
> as.vector(table( intvec )[as.character(intvec)])
[1] 2 2 1 3 3 3

So I think this will always work for vectors of either type:

as.vector(table( as.character(vec) )[as.character(vec)])

To me that looks like the right way to do it. Think so?

Best,
Mike

> On Wed, Dec 24, 2014 at 11:30 AM, Mike Miller <mbmiller+l at gmail.com> wrote:
>> R 3.0.1 on Linux 64...
>>
>> I was working with someone else's code. They were using ave() in a way that
>> I guess is nonstandard: Isn't FUN always supposed to be a variant of
>> mean()? The idea was to count for every element of a factor vector how many
>> times the level of that element occurs in the factor vector.
>>
>> gl() makes a factor:
>>
>>> gl(2,2,5)
>>
>> [1] 1 1 2 2 1
>> Levels: 1 2
>>
>> ave() applies FUN to produce the desired count, and it works:
>>
>>> ave( 1:5, gl(2,2,5), FUN=length )
>>
>> [1] 3 3 2 2 3
>>
>> The elements of the first vector are irrelevant because they are only
>> counted, so we should get the same result if it were a character vector, but
>> we don't:
>>
>>> ave( as.character(1:5), gl(2,2,5), FUN=length )
>>
>> [1] "3" "3" "2" "2" "3"
>>
>> The output has character type, but it is supposed to be a collection of
>> vector lengths.
>>
>> Two questions:
>>
>> (1) Is that a bug in ave()? It certainly is unexpected.
>>
>> (2) What is the best way to do this sort of thing?
>>
>> The truth is that we start with a character vector and we want to create an
>> integer vector that tells us for every element of the character vector how
>> many times that string occurs. Here are two vectors of length 6 that should
>> give the same result:
>>
>>> intvec <- c(4,5,6,5,6,6)
>>> charvec <- c("A","B","C","B","C","C")
>>
>> The code was used like this with integer vectors and it seemed to work:
>>
>>> ave( intvec, intvec, FUN=length )
>>
>> [1] 1 2 3 2 3 3
>>
>> When a character vector came along, it would fail by producing a character
>> vector as output:
>>
>>> ave( charvec, charvec, FUN=length )
>>
>> [1] "1" "2" "3" "2" "3" "3"
>>
>> This seems more appropriate, and it might always work, but is it OK?:
>>
>>> ave( rep(1, length(charvec)), as.factor(charvec), FUN=sum )
>>
>> [1] 1 2 3 2 3 3
>>
>> I suspect that ave() isn't the best choice, but what is the best way to do
>> this?
>>
>> Thanks in advance.
>>
>> Mike
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
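
The behaviour reported in the message above can be reproduced with a short base-R sketch. It rests on one assumption worth stating: that ave() writes the per-group results of FUN back into x, so the result keeps the type of x. The helper name count_occurrences is hypothetical and is not part of base R or any package; it simply wraps the table() idiom proposed in the message.

```r
# Example vectors taken from the thread.
charvec <- c("A", "B", "C", "B", "C", "C")
intvec  <- c(4, 5, 6, 5, 6, 6)

# With a character x, the integer lengths returned by FUN are stored back
# into a character vector, so the counts come out as strings.
res_chr <- ave(charvec, charvec, FUN = length)
typeof(res_chr)

# Passing a numeric x (here just the positions) keeps the counts numeric,
# while the character vector is still used for grouping.
res_int <- ave(seq_along(charvec), charvec, FUN = length)
typeof(res_int)

# Hypothetical helper: for each element, how often does its value occur?
# Converting to character first lets table() be indexed by name for
# both character and numeric input.
count_occurrences <- function(x) {
  key <- as.character(x)
  as.vector(table(key)[key])
}

count_occurrences(charvec)
count_occurrences(intvec)
```

Both the ave() call on positions and the table()-based helper avoid the coercion; which one reads better is a matter of taste, and neither has been checked here against missing values or factor input.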
Mike Miller
2014-Dec-25 02:49 UTC
[R] ave(x, y, FUN=length) produces character output when x is character
On Wed, 24 Dec 2014, Mike Miller wrote:

> Also, regarding the sacred text, "x A numeric." is a bit terse. The
> same text later refers to length(x), so I suspect that "A numeric" is
> short for "A numeric vector", but that might not mean "a vector of
> 'numeric' type."

I just realized that numeric type includes integer so that anything of
type integer also is type numeric. I'm working on another message.

Mike
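
The point about integer being a kind of numeric can be checked directly. The sketch below also shows one way a caller could get the early error asked for earlier in the thread; ave_checked is a hypothetical wrapper written for illustration, not something ave() or the stats package provides.

```r
x_int <- 1:5          # stored as integer
x_dbl <- c(4, 5, 6)   # stored as double
x_chr <- c("4", "5")  # stored as character

# is.numeric() is TRUE for both integer and double storage.
is.numeric(x_int)
is.numeric(x_dbl)
is.numeric(x_chr)

# typeof() distinguishes the two numeric storage types.
typeof(x_int)
typeof(x_dbl)

# Hypothetical wrapper: refuse non-numeric x instead of silently
# returning a coerced result.
ave_checked <- function(x, ..., FUN = mean) {
  if (!is.numeric(x)) {
    stop("'x' must be numeric, got type '", typeof(x), "'")
  }
  ave(x, ..., FUN = FUN)
}

ave_checked(1:5, gl(2, 2, 5), FUN = length)
```

Calling ave_checked(as.character(1:5), gl(2, 2, 5), FUN = length) stops with an error rather than returning character counts.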
Jeff Newmiller
2014-Dec-25 04:19 UTC
[R] ave(x, y, FUN=length) produces character output when x is character
But all numeric types in R are vectors. So although it might be a good
idea to be redundant to aid beginners, the phrase "a numeric" is accurate.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On December 24, 2014 6:49:47 PM PST, Mike Miller <mbmiller+l at gmail.com> wrote:
>On Wed, 24 Dec 2014, Mike Miller wrote:
>
>> Also, regarding the sacred text, "x A numeric." is a bit terse. The
>> same text later refers to length(x), so I suspect that "A numeric" is
>> short for "A numeric vector", but that might not mean "a vector of
>> 'numeric' type."
>
>I just realized that numeric type includes integer so that anything of
>type integer also is type numeric. I'm working on another message.
>
>Mike
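
A small sketch of the claim that numeric values in R are always vectors: a "single number" is a numeric vector of length one. The variable names are only for illustration.

```r
x <- 3.14   # a single number
n <- 1:3    # several numbers

# A lone value is already a vector of length 1; R has no separate scalar type.
is.numeric(x)
is.vector(x)
length(x)
identical(x, c(3.14))

# The same predicates hold for a longer numeric vector.
is.numeric(n)
is.vector(n)
length(n)
```

This is why the help page can say "A numeric" and still refer to length(x) later on.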