thr3ads.net - R help - [R] Summary information by groups programming assistance [Dec 2008]

Ranney, Steven

2008-Dec-22 21:51 UTC

[R] Summary information by groups programming assistance

All - 

I have data that looks like

          psd Species  Lake Length Weight St.weight        Wr      Wr.1    vol
432  substock     SMB Clear    150  41.00      0.01  95.12438  95.10118 0.0105
433  substock     SMB Clear    152  39.00      0.01  86.72916  86.70692 0.0105
434  substock     SMB Clear    152  40.00      3.11  88.95298  82.03689 3.2655
435  substock     SMB Clear    159  48.00      0.04  92.42095  92.34393 0.0420
436  substock     SMB Clear    159  48.00      0.01  92.42095  92.40170 0.0105
437  substock     SMB Clear    165  47.00      0.03  80.38023  80.32892 0.0315
438  substock     SMB Clear    171  62.00      0.21  94.58105  94.26070 0.2205
439  substock     SMB Clear    178  70.00      0.01  93.91912  93.90571 0.0105
440  substock     SMB Clear    179  76.00      1.38 100.15760  98.33895 1.4490
441       S-Q     SMB Clear    180  75.00      0.01  97.09330  97.08035 0.0105
442       S-Q     SMB Clear    180  92.00      0.02 119.10111 119.07522 0.0210
...
[truncated] 

where psd and lake are categorical variables, with five and four
categories, respectively.  I'd like to find the maximum vol and the
lengths associated with each maximum vol by each category by each lake.
In other words, I'd like to have a data frame that looks something like 

Lake      Category  Length  vol
Clear     substock  152     3.2655
Clear     S-Q       266     11.73
Clear     Q-P       330     14.89
...
Pickerel  substock  170     3.4965
Pickerel  S-Q       248     10.69
Pickerel  Q-P       335     25.62
Pickerel  P-M       415     32.62
Pickerel  M-T       442     17.25


In order to originally get this, I used 

with(smb[Lake=="Clear",], tapply(vol, list(Length, psd),max))
with(smb[Lake=="Enemy.Swim",], tapply(vol, list(Length, psd),max))
with(smb[Lake=="Pickerel",], tapply(vol, list(Length, psd),max))
with(smb[Lake=="Roy",], tapply(vol, list(Length, psd),max))

and pulled the values I needed out by hand and put them into a .csv.
Unfortunately, I've got a number of other data sets upon which I'll need
to do the same analysis.  Finding a programmable alternative would
provide a much easier (and likely less error prone) method to achieve
the same results.  Ideally, the "Length" and "vol" data would be in a
data frame such that I could then analyze with nls.

Does anyone have any thoughts as to how I might accomplish this?  

Thanks in advance, 

Steven Ranney

hadley wickham

2008-Dec-22 21:59 UTC

You might want to have a look at the plyr package,
http://had.co.nz/plyr, which provides a set of tools to make tasks
like this easy.  There are a number of similar examples in the
introductory pdf that should get you started.
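
As a rough sketch of what such a plyr call might look like (not taken from the plyr documentation): the data frame name DF and the result name res_plyr are assumptions, the data are only rows 432-442 shown in the question, and the code is skipped if plyr is not installed. For each Lake/psd group it keeps the row with the largest vol, so Length comes from that same row.

```r
# Rows 432-442 from the original post, restricted to the columns needed here.
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Assumption: plyr is installed; otherwise this block does nothing.
if (requireNamespace("plyr", quietly = TRUE)) {
  # Split by Lake and psd, then keep the row whose vol is largest in each piece.
  res_plyr <- plyr::ddply(DF, c("Lake", "psd"), function(d) {
    d[which.max(d$vol), c("Length", "vol")]
  })
  print(res_plyr)
}
```

Because the truncated data contain a single lake and two psd categories, the result has two rows; the substock row agrees with the first line of the desired table (Length 152, vol 3.2655).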

Regards,

Hadley

-- 
http://had.co.nz/

Søren Højsgaard

2008-Dec-22 22:25 UTC

Maybe summaryBy (or lapplyBy/splitBy) in the doBy package might help you.
Regards
Søren
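
The split-by-group idea behind these functions can be sketched in base R. This is an illustration with split() and lapply(), not doBy's own API; the names DF, groups and res_split are assumptions, and the data are only rows 432-442 shown in the question.

```r
# Rows 432-442 from the original post, restricted to the columns needed here.
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# One data frame per Lake x psd combination; drop = TRUE omits empty combinations.
groups <- split(DF, list(DF$Lake, DF$psd), drop = TRUE)

# From each group keep the single row with the largest vol.
top_rows <- lapply(groups, function(d) {
  d[which.max(d$vol), c("Lake", "psd", "Length", "vol")]
})

# Stack the per-group rows into one data frame.
res_split <- do.call(rbind, top_rows)
rownames(res_split) <- NULL
print(res_split)
```

which.max() returns the first maximum, so a group with tied maxima contributes only one row.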


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Gabor Grothendieck

2008-Dec-23 00:15 UTC

Here are two solutions assuming DF is your data frame:

# 1. aggregate is in the base of R

aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)

or the following which is the same except it labels psd as Category:

aggregate(DF[c("Length", "vol")], with(DF, list(Lake = Lake, Category = psd)), max)


# 2. sqldf.  The sqldf package allows specification using SQL notation:

library(sqldf)
sqldf("select Lake, psd as Category, max(Length), max(vol) from DF
group by Lake, psd")

There are many other good solutions too using various packages which
have already been mentioned on this thread.
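
One caveat worth checking: taking max of Length and of vol separately returns the largest Length and the largest vol in each group, and these need not come from the same row. If the Length belonging to the maximum vol is wanted, as in the question, one possible base R sketch is to compute the group maxima and merge them back onto the data. The names DF, indep, mx and res_merge are assumptions, and the data are only rows 432-442 shown in the question.

```r
# Rows 432-442 from the original post, restricted to the columns needed here.
DF <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Independent maxima: substock gets Length 179 (largest Length) next to
# vol 3.2655 (largest vol), although those values are from different fish.
indep <- aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)
print(indep)

# Maximum vol per Lake x psd group.
mx <- aggregate(vol ~ Lake + psd, data = DF, FUN = max)

# Merging on Lake, psd and vol keeps only the rows that attain the group maximum.
res_merge <- merge(DF, mx, by = c("Lake", "psd", "vol"))
res_merge <- res_merge[, c("Lake", "psd", "Length", "vol")]
print(res_merge)
```

Unlike a which.max() approach, the merge keeps every row tied for the maximum, so a group can appear more than once. The Length and vol columns of the result can then be passed on to nls.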

