brenbarn at brenbarn.net
2009-Apr-19 21:05 UTC
[Rd] ave returns wrong data type (PR#13664)
Full_Name: Brendan Barnwell
Version: 2.9.0
OS: Windows XP Pro
Submission from: (NULL) (71.102.131.29)

The ave() function returns an incorrect datatype. Specifically, ave(x, g, f) always returns a vector with the same mode as x, rather than using the mode of the vector returned by f. Observe:

> x
 [1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
> g
 [1] "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y" "X" "Y"
> ave(x, g, FUN=length)
 [1] "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15" "15"
[25] "15" "15" "15" "15" "15" "15"

Even though the length() function returns a vector of integers, ave() inappropriately converts this to a character vector. The bug is due to this line in the definition of ave():

    split(x, g) <- lapply(split(x, g), FUN)

By sticking the result of the lapply back into the original argument x, it coerces that result to the type of that argument. This contradicts the documentation, which says that the value of ave() is "a numeric vector". I would suggest that this documentation itself doesn't describe the desired behavior. The result vector should be of the type returned by FUN (just as it is for tapply). Otherwise it is impossible to use ave() to compute summary statistics whose type differs from that of the argument.
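For readers who want to reproduce the report, the following is a minimal sketch in R. The vectors x and g are rebuilt from the printed output above. The helper ave_typed() is hypothetical (it is not part of base R and is not a proposed patch); it only illustrates one way to let the result take its type from FUN, by assembling the per-group results with unsplit() instead of assigning them back into x.

```r
# Rebuild the data shown in the report (30 elements each).
x <- rep(c("A", "B", "C"), times = 10)
g <- rep(c("X", "Y"), times = 15)

# The behavior described in the report: the integer results of length()
# are assigned back into the character vector x, so they become strings.
res <- ave(x, g, FUN = length)
typeof(res)  # "character"

# Hypothetical helper, for illustration only: recycle each group's result
# to the group's size, then reassemble in the original order with unsplit().
# The result type comes from what FUN returns, not from x.
ave_typed <- function(x, g, FUN, ...) {
  pieces <- lapply(split(x, g), function(v) rep_len(FUN(v, ...), length(v)))
  unsplit(pieces, g)
}

typed <- ave_typed(x, g, length)
typeof(typed)  # "integer"
```

This sketch handles a single grouping vector only, whereas ave() accepts several grouping factors through its ... argument.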
Note that according to ?ave, the first argument of ave(), 'x' should be a *numeric* vector. In your case 'x' is not numeric, it is a character vector. So I think that ave() works as documented, i.e., if you supply as first argument a numeric vector, then you do get as an output a numeric vector.

Best,
Dimitris

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
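The documented case mentioned in the reply can be checked with a short sketch. The vectors x_num and g_small below are made-up illustration data, not taken from the thread. With a numeric (double) first argument the result is numeric for both the default FUN = mean and for FUN = length, although in the second case the integer counts are still stored as doubles because they are assigned back into x_num.

```r
# Made-up numeric input for illustration.
x_num <- c(1, 2, 3, 4, 5, 6)
g_small <- c("X", "Y", "X", "Y", "X", "Y")

# Default FUN = mean: group X holds 1, 3, 5 and group Y holds 2, 4, 6.
means <- ave(x_num, g_small)
means  # 3 4 3 4 3 4

# length() returns integers, but the result keeps the double type of x_num.
counts <- ave(x_num, g_small, FUN = length)
typeof(counts)  # "double"
```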