bsaville at bios.unc.edu
2006-Apr-19 20:25 UTC
[Rd] gsummary function (nlme library) (PR#8782)
Full_Name: Ben Saville
Version: 2.1
OS: Windows XP
Submission from: (NULL) (152.2.94.145)


I'm using the gsummary function to calculate a sum of V1 (column one) from my
data 'mytest' by group (V2, or column 2). If V1 (the variable of interest) is
all the same value (in this case all 2's), I do not get back the correct
summation. If there is at least one difference in V1 (all 2's except for one
1), it gives me correct values. So either I am doing something wrong or there
is a bug in the gsummary function.

# Incorrect sums
mytest <- as.data.frame(matrix(c(2,rep(2,8),1,1,2,2,2,3,3,3,3),ncol=2))
mytest
gsummary(mytest,form=V1~1|V2, FUN=sum)[,1]

# Correct sums
mytest <- as.data.frame(matrix(c(1,rep(2,8),1,1,2,2,2,3,3,3,3),ncol=2))
mytest
gsummary(mytest,form=V1~1|V2, FUN=sum)[,1]

The documentation for gsummary describes the argument FUN as
FUN: an optional summary function or a list of summary functions
to be applied to each variable in the frame. The function or
functions are applied only to variables in 'object' that vary
within the groups defined by 'groups'. Invariant variables
are always summarized by group using the unique value that
they assume within that group. If 'FUN' is a single function
it will be applied to each non-invariant variable by group to
produce the summary for that variable. If 'FUN' is a list of
functions, the names in the list should designate classes of
variables in the frame such as 'ordered', 'factor', or
'numeric'. The indicated function will be applied to any
non-invariant variables of that class. The default functions
to be used are 'mean' for numeric factors, and 'Mode' for
both 'factor' and 'ordered'. The 'Mode' function, defined
internally in 'gsummary', returns the modal or most popular
value of the variable. It is different from the 'mode'
function that returns the S-language mode of the variable.
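so the behavior you noticed is documented. In your first example V1 takes
the single value 2 within every group, so it is treated as invariant and
'sum' is never applied; the result for each group is the common value 2.
In your second example V1 varies within the first group, so 'sum' is
applied by group.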
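
The rule quoted above can be illustrated with a small base R sketch. This is not the nlme implementation: 'summarize_by_group' is a hypothetical helper written only for this illustration, and it assumes a variable counts as invariant only when it is constant within every group, which is consistent with the results reported in the submission.

```r
# Hypothetical helper (not part of nlme): mimics the documented rule
# for a single numeric variable.
summarize_by_group <- function(x, groups, FUN) {
  pieces <- split(x, groups)
  # Assumption: "invariant" means one unique value within every group.
  invariant <- all(vapply(pieces,
                          function(p) length(unique(p)) == 1L,
                          logical(1)))
  if (invariant) {
    # FUN is skipped; the common value of each group is returned.
    vapply(pieces, function(p) p[1], numeric(1))
  } else {
    # The variable varies somewhere, so FUN is applied to every group.
    vapply(pieces, FUN, numeric(1))
  }
}

V2 <- c(1, 1, 2, 2, 2, 3, 3, 3, 3)
all_twos <- rep(2, 9)           # V1 from the "incorrect sums" example
one_differs <- c(1, rep(2, 8))  # V1 from the "correct sums" example

res_invariant <- summarize_by_group(all_twos, V2, sum)    # 2 2 2
res_varying <- summarize_by_group(one_differs, V2, sum)   # 3 6 8
res_invariant
res_varying
```

With the all-2 data the helper returns the common value 2 for each group rather than the totals 4, 6 and 8, which is the pattern described in the report.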
The "summary" in "gsummary" is used in the sense of a representative
value, not in the more general sense of a numerical summary of any
sort. If the values do not vary within a group then the common value
within the group is, according to our definition, the representative
value.
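
If a plain numerical summary by group is what is wanted, base R may be a better fit than gsummary. The following is only a sketch under that assumption and is not something the reply above prescribes; it uses 'tapply' and 'aggregate' from base R on the all-2 data from the submission.

```r
# Data from the "incorrect sums" example in the submission.
mytest <- as.data.frame(
  matrix(c(2, rep(2, 8), 1, 1, 2, 2, 2, 3, 3, 3, 3), ncol = 2)
)

# tapply applies sum to V1 within each level of V2, whether or not
# V1 varies within the groups.
group_sums <- tapply(mytest$V1, mytest$V2, sum)
group_sums  # 4 6 8 for groups 1, 2, 3

# aggregate gives the same totals as a data frame with columns V2 and V1.
group_sums_df <- aggregate(V1 ~ V2, data = mytest, FUN = sum)
group_sums_df
```

Neither function special-cases invariant variables, so the totals do not depend on whether V1 happens to be constant.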
On 4/19/06, bsaville at bios.unc.edu <bsaville at bios.unc.edu> wrote:
> Full_Name: Ben Saville
> Version: 2.1
> OS: Windows XP
> Submission from: (NULL) (152.2.94.145)
>
>
> I'm using the gsummary function to calculate a sum of V1 (column one) from my
> data 'mytest' by group (V2,or column 2). If V1 (the variable of interest) is
> all the same value (in this case all 2's), I do not get back the correct
> summation. If there is at least one difference in V1 (all 2's except for one
> 1), it gives me correct values. So either I am doing something wrong or there
> is a bug in the gsummary function.
>
> # Incorrect sums
> mytest <- as.data.frame(matrix(c(2,rep(2,8),1,1,2,2,2,3,3,3,3),ncol=2))
> mytest
> gsummary(mytest,form=V1~1|V2, FUN=sum)[,1]
>
> # Correct sums
> mytest <- as.data.frame(matrix(c(1,rep(2,8),1,1,2,2,2,3,3,3,3),ncol=2))
> mytest
> gsummary(mytest,form=V1~1|V2, FUN=sum)[,1]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>