Mike Miller
2014-Dec-24 19:30 UTC
[R] ave(x, y, FUN=length) produces character output when x is character
R 3.0.1 on Linux 64...

I was working with someone else's code. They were using ave() in a way that I guess is nonstandard: isn't FUN always supposed to be a variant of mean()? The idea was to count, for every element of a factor vector, how many times the level of that element occurs in the factor vector.

gl() makes a factor:

> gl(2,2,5)
[1] 1 1 2 2 1
Levels: 1 2

ave() applies FUN to produce the desired count, and it works:

> ave( 1:5, gl(2,2,5), FUN=length )
[1] 3 3 2 2 3

The elements of the first vector are irrelevant because they are only counted, so we should get the same result if it were a character vector, but we don't:

> ave( as.character(1:5), gl(2,2,5), FUN=length )
[1] "3" "3" "2" "2" "3"

The output has character type, but it is supposed to be a collection of vector lengths.

Two questions:

(1) Is that a bug in ave()? It certainly is unexpected.

(2) What is the best way to do this sort of thing?

The truth is that we start with a character vector and we want to create an integer vector that tells us, for every element of the character vector, how many times that string occurs. Here are two vectors of length 6 that should give the same result:

> intvec <- c(4,5,6,5,6,6)
> charvec <- c("A","B","C","B","C","C")

The code was used like this with integer vectors and it seemed to work:

> ave( intvec, intvec, FUN=length )
[1] 1 2 3 2 3 3

When a character vector came along, it would fail by producing a character vector as output:

> ave( charvec, charvec, FUN=length )
[1] "1" "2" "3" "2" "3" "3"

This seems more appropriate, and it might always work, but is it OK?

> ave( rep(1, length(charvec)), as.factor(charvec), FUN=sum )
[1] 1 2 3 2 3 3

I suspect that ave() isn't the best choice, but what is the best way to do this?

Thanks in advance.

Mike
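A minimal sketch, assuming the goal is an integer count per element: since the counts depend only on group sizes, any x of the right length will do, and ave() takes the type of its result from x. Passing an integer vector such as seq_along(charvec) therefore keeps the output integer, whereas rep(1, length(charvec)) is double. The use of seq_along() here is a substitution for illustration, not part of the original code.

```r
charvec <- c("A","B","C","B","C","C")

# x supplies the output type, so an integer x gives an integer result.
# FUN=length ignores the values of x and only counts group members.
counts_int <- ave(seq_along(charvec), charvec, FUN = length)
typeof(counts_int)   # "integer"

# The variant from the post: a double x summed per group gives a double result.
counts_dbl <- ave(rep(1, length(charvec)), as.factor(charvec), FUN = sum)
typeof(counts_dbl)   # "double"
```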
Bert Gunter
2014-Dec-24 19:49 UTC
You said:

"The elements of the first vector are irrelevant because they are only counted, so we should get the same result if it were a character vector, but we don't:"

You don't get to invent your own rules! ?ave -- always nice to read the Help docs **before posting** -- clearly states that the x argument must be __numeric__. So if you choose to ignore what you are told, you do so at your own risk. Who knows what you'll get? -- it's a user error, not a bug.

And if (my understanding of) what you say is the case, this whole post is silly. See ?table to do exactly what you claim is wanted without trying to invent square wheels.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll
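The help page's requirement can be checked directly. is.numeric() returns TRUE for both integer and double vectors, so "numeric" covers both storage types, while a character vector fails the test. ave() itself performs no such check, which is why the character input in the original post ran without an error. The wrapper below, ave_strict(), is a hypothetical helper for illustration, not part of the stats package.

```r
is.numeric(1:5)               # TRUE: integer vectors count as numeric
is.numeric(c(1.5, 2))         # TRUE: double vectors
is.numeric(as.character(1:5)) # FALSE: character vectors do not

# Hypothetical wrapper: refuse non-numeric x instead of silently
# returning a result whose type was inherited from x.
ave_strict <- function(x, ..., FUN = mean) {
  if (!is.numeric(x))
    stop("'x' must be numeric (integer or double), not ", typeof(x))
  ave(x, ..., FUN = FUN)
}

ave_strict(1:5, gl(2, 2, 5), FUN = length)
```

With this wrapper, the character call from the original post stops with an error instead of returning strings.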
Nordlund, Dan (DSHS/RDA)
2014-Dec-24 20:06 UTC
For your character vector example, this will get you the counts.

table(charvec)[charvec]

Hope this is helpful,

Dan

Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services
Mike Miller
2014-Dec-24 20:39 UTC
On Wed, 24 Dec 2014, Bert Gunter wrote:

> You said:
> "The elements of the first vector are irrelevant because they are only
> counted, so we should get the same result if it were a character
> vector, but we don't: "
>
> You don't get to invent your own rules! ?ave -- always nice to read the
> Help docs **before posting** -- clearly states that the x argument must
> be __numeric__. So if you choose to ignore what you are told, you do so
> at your own risk. Who knows what you'll get? -- it's a user error, not a
> bug.

I guess the goal is to humiliate the person who posted the question. I've had trouble convincing doctoral students in biostat to post questions here because they are afraid of being treated like dirt. It doesn't bother me personally, but I see it as counterproductive. The code I was working with was written by such a student and it has been on CRAN for a couple of years. I'm just trying to fix it. Your comment is helpful, but it would have been even better without the hostile tone.

Regarding the way ave() works -- why doesn't it check that the input vector is numeric? Apparently, integer input is acceptable. Does numeric sometimes mean "numeric" and sometimes "either 'integer' or 'numeric'"? Either way, if character is unacceptable, it could throw an error instead of pumping out an almost-correct answer. That made it much harder to track down the bug in the code base I was working on.

Also, regarding the sacred text, "x A numeric." is a bit terse. The same text later refers to length(x), so I suspect that "A numeric" is short for "A numeric vector", but that might not mean "a vector of 'numeric' type."

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ave.html

> And if (my understanding of) what you say is the case, this whole post
> is silly. See ?table to do exactly what you claim is wanted without
> trying to invent square wheels.

table() counts elements but it has to repeat them in the proper pattern.
For every element of a vector we want to know how many times it occurs in that vector. So if the vector is c("A","A","B","C","C","C") the output should be c(2,2,1,3,3,3). I'm sure we all know that table() will count the elements, but it doesn't place them in a vector as desired. I can do this with a character vector:

> charvec <- c("A","A","B","C","C","C")
> as.vector(( table( charvec )[charvec] ))
[1] 2 2 1 3 3 3

It's slightly trickier with an integer vector:

> intvec <- c(4,4,5,6,6,6)
> table( intvec )[intvec]
intvec
<NA> <NA> <NA> <NA> <NA> <NA>
  NA   NA   NA   NA   NA   NA
> as.vector(table( intvec )[as.character(intvec)])
[1] 2 2 1 3 3 3

So I think this will always work for vectors of either type:

as.vector(table( as.character(vec) )[as.character(vec)])

To me that looks like the right way to do it. Think so?

Best,

Mike
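The NA values in the integer case arise because table(intvec)[intvec] indexes the table by position rather than by name, and the table has only three entries. An alternative sketch, swapping in match() and tabulate() for table(): each element is mapped to an integer code for its distinct value, the codes are counted, and the counts are indexed back by code. It works for character, integer, and double vectors without converting to character and always returns an integer vector. count_occurrences() and table_counts() are hypothetical names. One difference worth knowing: table() drops NA by default, so the table-based expression returns NA for missing elements, whereas this version counts NA like any other value.

```r
# Hypothetical helper: for each element of vec, how many times does its
# value occur in vec? Returns an integer vector of length(vec).
count_occurrences <- function(vec) {
  u <- unique(vec)                       # distinct values, in order of first appearance
  ids <- match(vec, u)                   # integer code of each element's value
  tabulate(ids, nbins = length(u))[ids]  # count each code, then expand to elements
}

count_occurrences(c("A","A","B","C","C","C"))  # [1] 2 2 1 3 3 3
count_occurrences(c(4,4,5,6,6,6))              # [1] 2 2 1 3 3 3

# The table-based expression from the message above, wrapped for comparison
table_counts <- function(vec)
  as.vector(table(as.character(vec))[as.character(vec)])
```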
Mike Miller
2014-Dec-24 20:44 UTC
On Wed, 24 Dec 2014, Nordlund, Dan (DSHS/RDA) wrote:

> For your character vector example, this will get you the counts.
>
> table(charvec)[charvec]
>
> Hope this is helpful,

It does help, Dan! I came up with the same idea and expanded on it a bit to work properly with other kinds of vectors:

as.vector(table( as.character(vec) )[as.character(vec)])

If there are, say, 10,000 different elements in vec, each repeated an average of 5-10 times, will this still work correctly? In other words, the length of the table output array is unlimited, right?

Mike
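One way to check the scaling question empirically, under made-up test data: build a vector with 10,000 distinct strings, each repeated 5 to 10 times, so the correct counts are known by construction, and compare them with the table-based expression. The id format and repetition pattern are arbitrary choices for the test, not anything from the thread.

```r
n_distinct <- 10000
reps <- rep(5:10, length.out = n_distinct)     # known repeat count per value
ids <- sprintf("id%05d", seq_len(n_distinct))  # 10,000 distinct strings
vec <- rep(ids, times = reps)                  # each id repeated reps[i] times

counts <- as.vector(table(as.character(vec))[as.character(vec)])
expected <- rep(reps, times = reps)            # the true count for every element

identical(counts, expected)                    # TRUE
```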
William Dunlap
2014-Dec-24 21:34 UTC
> > ave( as.character(1:5), gl(2,2,5), FUN=length )
> [1] "3" "3" "2" "2" "3"
>
> The output has character type, but it is supposed to be a collection of
> vector lengths.

ave() uses its first argument, 'x', to set the length of its output and to make an initial guess at the type of its output. The return value of FUN can alter the type, but only in an 'upward' direction where logical<integer<numeric<complex<character<list. (This is the same rule that x[i]<-newvalue uses.)

As currently written, ave also lets FUN(xi) return a vector the length of xi, not just a single value. E.g.,

> ave(105:101, c("A","A","B","A","B"), FUN=sort)
[1] 102 104 101 105 103
> ave(105:101, c("A","A","B","A","B"), FUN=function(xi)xi-mean(xi))
[1]  1.3333333  0.3333333  1.0000000 -1.6666667 -1.0000000

I don't know what the docs say about that, but I often find that more useful than having it repeat the output of mean(xi).

Bill Dunlap
TIBCO Software
wdunlap tibco.com
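The 'upward only' rule Bill Dunlap describes can be seen with plain subassignment, which follows the same rule. A minimal illustration; the vectors are arbitrary, and the final line shows one way to recover integers from ave()'s character result after the fact.

```r
x <- as.character(1:5)
x[1:2] <- 3L          # integer value into a character vector
typeof(x)             # "character": the vector cannot move down to integer
x                     # "3" "3" "3" "4" "5"

y <- 1:5
y[1:2] <- 2.5         # double value into an integer vector
typeof(y)             # "double": the vector is promoted upward

# The same effect in ave(): FUN=length returns integers, but x is character,
# so the counts come back as strings; as.integer() converts them afterwards.
as.integer(ave(as.character(1:5), gl(2, 2, 5), FUN = length))
```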