thr3ads.net - R help - [R] Sparseby Problems [Apr 2010]

Ben Stewart

2010-Apr-21 17:16 UTC

[R] Sparseby Problems

I've got a problem with the sparseby command (reshape library), and I have
reached the peak of my R knowledge (it isn't really that high).

I have a small data frame of 23 rows and 15 columns. Here is a subset; the
first four columns are factors and the rest are numeric (only one, line54, is
provided).

   bearID YEAR Season SEX      line54
5    1900    8      3   0  16.3923519
11   2270    5      1   0 233.7414014
12   2271    5      1   0 290.8207652
13   2271    5      2   0 244.7820844
15   2291    5      1   0   0.0000000
16   2291    5      2   0  14.5037795
17   2291    6      1   0   0.0000000
18   2293    5      2   0 144.7440752
19   2293    5      3   0   0.0000000
20   2293    6      1   0  16.0592270
21   2293    6      2   0  30.1383426
28   2298    5      1   0   0.9741067
29   2298    5      2   0   9.6641018
30   2298    6      2   0   8.6533828
31   2309    5      2   0  85.9781303
32   2325    6      1   0 110.8892153
35   2331    6      1   0  26.7335562
44   2390    7      2   0   7.1690620
45   2390    8      2   0  44.1109897
46   2390    8      3   0 503.9074898
47   2390    9      2   0   8.4393660
54   2416    7      3   0  48.6910907
58   2418    8      2   0   5.7951139

Sparseby works fine when I try to calculate the mean:
> sparseby(mF[1:5], mF$Season, mean)
  mF$Season bearID YEAR Season SEX    line54
1         1     NA   NA     NA   0  84.90228
2         2     NA   NA     NA   0  54.90713
3         3     NA   NA     NA   0 142.24773

But it goes nuts when looking for max or min:
> sparseby(mF[5:6], mF$Season, max)
  mF$Season structure(c(2169.49621795108, 1885.22677689026, 2492.17544685464
1         1                                                         2169.496
2         2                                                         1885.227
3         3                                                         2492.175

Any ideas? All I want is to create three data.frames: mean, min
and max.

Thanks,

Ben Stewart

William Dunlap

2010-Apr-21 20:48 UTC

sparseby's FUN argument needs to be a function
that accepts a data.frame.

mean() has a data.frame method that returns a vector
of column means but most of the other summary functions
(e.g., median, min, max, quantile) do not work on data.frames.
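
A minimal sketch of that point, using base R's by() rather than sparseby() so
it runs without the reshape package. The toy data frame, the line55 column and
the helper name col_max are made up for illustration. The wrapper takes each
group's data.frame and returns one maximum per column. As far as I can tell,
calling max() directly on a numeric data.frame collapses it to a single
overall value, which would be consistent with the one-column output in the
question.

```r
# Toy stand-in for the poster's mF; values and the line55 column are invented.
mF <- data.frame(
  Season = factor(c(1, 1, 2, 2, 3, 3)),
  line54 = c(233.7, 0.0, 244.8, 14.5, 16.4, 503.9),
  line55 = c(1, 2, 3, 4, 5, 6)
)
num_cols <- c("line54", "line55")

# Hypothetical helper: accepts a data.frame and returns a named vector
# holding the maximum of each column.
col_max <- function(d) sapply(d, max)

# by() hands each Season's rows to col_max as a data.frame.
res <- by(mF[num_cols], mF$Season, col_max)

# Stack the per-group vectors into one matrix (one row per Season).
max_by_season <- do.call(rbind, res)
max_by_season
```

Swapping max for min or mean inside col_max gives the other two summaries in
the same layout.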

You can use aggregate() instead of by() or sparseby(), as
its function is applied to each column of each data.frame,
not to a whole data.frame.
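
A sketch of the aggregate() route, assuming a toy data frame in place of mF
(the values and the line55 column are invented). Because aggregate() applies
FUN to each column within each group, min, mean and max can be passed as they
are, and each call returns one of the three data frames asked for.

```r
# Toy stand-in for the poster's mF; values and the line55 column are invented.
mF <- data.frame(
  Season = factor(c(1, 1, 2, 2, 3, 3)),
  line54 = c(233.7, 0.0, 244.8, 14.5, 16.4, 503.9),
  line55 = c(1, 2, 3, 4, 5, 6)
)
num_cols <- c("line54", "line55")
groups   <- list(Season = mF$Season)

# One data frame per statistic; FUN sees one column of one group at a time.
mF_min  <- aggregate(mF[num_cols], by = groups, FUN = min)
mF_mean <- aggregate(mF[num_cols], by = groups, FUN = mean)
mF_max  <- aggregate(mF[num_cols], by = groups, FUN = max)

mF_max
```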

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

Dennis Murphy

2010-Apr-21 23:53 UTC

Hi:

I'm a big fan of the reshape package, but this time I think that the doBy and
plyr packages may better suit your needs. Since you mentioned wanting to get
the min/mean/max of several variables simultaneously, I took out line54 and
added some vectors of Gaussian(0, 1) random numbers for testing:

test <- data.frame(mF[, -5], x1 = rnorm(23), x2 = rnorm(23), x3 = rnorm(23))

### doBy approach:
# Create a function for doBy to use on a specific variable:

f <- function(x) {
   c(min = min(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE),
     max = max(x, na.rm = TRUE))
  }

library(doBy)
> summaryBy(x1 + x2 + x3 ~ Season, data = test, FUN = f)
  Season    x1.min    x1.mean   x1.max     x2.min     x2.mean    x2.max
1      1 -1.108496 -0.2590727 1.692468 -0.8958644 -0.00485722 0.6525678
2      2 -1.686261  0.4655741 2.097220 -0.9484292  0.37197098 2.6325965
3      3 -1.093520 -0.2049273 0.390061 -0.6886613  0.49534667 2.4263802
       x3.min     x3.mean    x3.max
1 -2.07369239 -0.05164301 1.6199843
2 -0.43556155  0.31221804 1.1939009
3 -0.04847558  0.15200570 0.4355102

The LHS of the formula consists of the variables you want summarized, the RHS
contains the grouping variable(s), the data supplied MUST be a data frame, and
FUN is the function you want applied to each variable. In this case, the
function returns a vector of the min, mean and max of the input variable.
Notice that the names given in the function are appended to the variable name,
separated by a dot. (A nice touch by the package author...)

If you have a number of variables to summarize in this fashion, doBy is well
designed for this type of task in the sense that the syntax is pretty
straightforward.
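
For anyone without doBy installed, a rough base R sketch that yields the same
wide layout. This is a swapped-in technique, not what summaryBy() does
internally, and the toy data are invented. aggregate() with a cbind() formula
stores each variable's three statistics as a matrix column; do.call(data.frame,
...) then flattens those into separate columns named variable.stat.

```r
# Toy data in place of the rnorm() columns, so the results are predictable.
test <- data.frame(
  Season = factor(c(1, 1, 2, 2, 3, 3)),
  x1 = c(1, 3, 2, 6, 5, 9),
  x2 = c(10, 20, 30, 50, 40, 40)
)

# Same summary function as in the doBy example.
f <- function(x) {
  c(min = min(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

# Each of x1 and x2 comes back as a 3-column matrix inside the data frame.
agg <- aggregate(cbind(x1, x2) ~ Season, data = test, FUN = f)

# Flatten the matrix columns into x1.min, x1.mean, x1.max, x2.min, ...
flat <- do.call(data.frame, agg)
flat
```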

#### plyr approach
To accomplish the same task in plyr with ddply(), you've got to be a little
more clever: use numcolwise() in combination with each(). numcolwise() applies
the same function to each numeric variable in the input data frame; each()
applies the list of functions supplied as its arguments to a single input
variable. The call below is a composition of the two functions:
> library(plyr)
> ddply(test, .(Season), numcolwise(each(min, mean, max)))
  Season         x1          x2          x3
1      1 -1.1084957 -0.89586438 -2.07369239
2      1 -0.2590727 -0.00485722 -0.05164301
3      1  1.6924681  0.65256782  1.61998433
4      2 -1.6862610 -0.94842919 -0.43556155
5      2  0.4655741  0.37197098  0.31221804
6      2  2.0972202  2.63259653  1.19390094
7      3 -1.0935199 -0.68866127 -0.04847558
8      3 -0.2049273  0.49534667  0.15200570
9      3  0.3900610  2.42638021  0.43551022
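
To see why each group produces three rows, here is a base R sketch of the same
composition idea. The names each_stat, num_colwise and summarise_group are
hypothetical stand-ins written for this illustration; they are not plyr's
implementation. One function returns several statistics for a single vector,
and a wrapper applies it to every numeric column of a group's data frame.

```r
# Toy data; Season is a factor, so it is skipped by the numeric filter.
test <- data.frame(
  Season = factor(c(1, 1, 2, 2)),
  x1 = c(1, 3, 2, 6),
  x2 = c(10, 20, 30, 50)
)

# Stand-in for each(min, mean, max): several statistics of one vector.
each_stat <- function(x) c(min = min(x), mean = mean(x), max = max(x))

# Stand-in for numcolwise(): wrap f so it runs on every numeric column.
num_colwise <- function(f) {
  function(d) {
    is_num <- vapply(d, is.numeric, logical(1))
    sapply(d[is_num], f)
  }
}

# The composition: a function of a data frame giving a stats-by-column matrix.
summarise_group <- num_colwise(each_stat)
out <- summarise_group(test[test$Season == 1, ])
out
```

The result has one row per statistic and one column per numeric variable,
which is the block that gets stacked once per Season in the output above.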

To distinguish the measures in each row, create a factor of stat names
and then rearrange the order of columns to get something a little more
presentable:
> summ <- ddply(test, .(Season), numcolwise(each(min, mean, max)))
> summ$stat <- rep(c('Min', 'Mean', 'Max'), 3)   # add vector of names
> summ <- summ[, c(1, 5, 2:4)]   # column rearrangement
> summ
  Season stat         x1          x2          x3
1      1  Min -1.1084957 -0.89586438 -2.07369239
2      1 Mean -0.2590727 -0.00485722 -0.05164301
3      1  Max  1.6924681  0.65256782  1.61998433
4      2  Min -1.6862610 -0.94842919 -0.43556155
5      2 Mean  0.4655741  0.37197098  0.31221804
6      2  Max  2.0972202  2.63259653  1.19390094
7      3  Min -1.0935199 -0.68866127 -0.04847558
8      3 Mean -0.2049273  0.49534667  0.15200570
9      3  Max  0.3900610  2.42638021  0.43551022
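
If three separate data frames are wanted, as in the original question, a
labelled table of this shape can be split on the stat column. A small sketch
with an invented two-season table standing in for summ:

```r
# Invented table with the same layout as summ (Season, stat, one variable).
summ <- data.frame(
  Season = rep(1:2, each = 3),
  stat = rep(c("Min", "Mean", "Max"), 2),
  x1 = c(1, 2, 3, 2, 4, 6)
)

# Drop the stat column and split the remaining rows by statistic.
by_stat <- split(summ[, names(summ) != "stat"], summ$stat)

by_stat$Max    # rows holding the per-Season maxima
```

by_stat is a list with one data frame each for Max, Mean and Min.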

The two functions give you two different ways to present the summaries; take
your pick.

HTH,
Dennis


