Ranney, Steven
2008-Dec-22 21:51 UTC
[R] Summary information by groups programming assistance
All -

I have data that looks like

    psd      Species Lake  Length Weight St.weight Wr        Wr.1      vol
432 substock SMB     Clear 150    41.00  0.01      95.12438  95.10118  0.0105
433 substock SMB     Clear 152    39.00  0.01      86.72916  86.70692  0.0105
434 substock SMB     Clear 152    40.00  3.11      88.95298  82.03689  3.2655
435 substock SMB     Clear 159    48.00  0.04      92.42095  92.34393  0.0420
436 substock SMB     Clear 159    48.00  0.01      92.42095  92.40170  0.0105
437 substock SMB     Clear 165    47.00  0.03      80.38023  80.32892  0.0315
438 substock SMB     Clear 171    62.00  0.21      94.58105  94.26070  0.2205
439 substock SMB     Clear 178    70.00  0.01      93.91912  93.90571  0.0105
440 substock SMB     Clear 179    76.00  1.38      100.15760 98.33895  1.4490
441 S-Q      SMB     Clear 180    75.00  0.01      97.09330  97.08035  0.0105
442 S-Q      SMB     Clear 180    92.00  0.02      119.10111 119.07522 0.0210
...
[truncated]

where psd and Lake are categorical variables with five and four categories, respectively. I'd like to find the maximum vol, and the Length associated with each maximum vol, for each category within each lake. In other words, I'd like a data frame that looks something like

Lake     Category Length vol
Clear    substock 152    3.2655
Clear    S-Q      266    11.73
Clear    Q-P      330    14.89
...
Pickerel substock 170    3.4965
Pickerel S-Q      248    10.69
Pickerel Q-P      335    25.62
Pickerel P-M      415    32.62
Pickerel M-T      442    17.25

To get this originally, I used

    with(smb[smb$Lake == "Clear", ], tapply(vol, list(Length, psd), max))
    with(smb[smb$Lake == "Enemy.Swim", ], tapply(vol, list(Length, psd), max))
    with(smb[smb$Lake == "Pickerel", ], tapply(vol, list(Length, psd), max))
    with(smb[smb$Lake == "Roy", ], tapply(vol, list(Length, psd), max))

and pulled the values I needed out by hand and put them into a .csv. Unfortunately, I have a number of other data sets on which I'll need to do the same analysis. A programmable alternative would be much easier (and likely less error-prone). Ideally, the Length and vol data would end up in a data frame that I could then analyze with nls.

Does anyone have any thoughts on how I might accomplish this?

Thanks in advance,
Steven Ranney
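A minimal base-R sketch of the request, assuming the data frame is called smb as in the calls above and using only the eleven rows that were posted (so only the Clear substock and S-Q groups appear, and the S-Q row will not match the 266 / 11.73 shown above, which comes from the full data). It sorts so that the largest vol comes first within each Lake/psd group, then keeps the first row of each group with duplicated():

```r
# Rows 432-442 as posted, keeping only the columns needed here
smb <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210),
  stringsAsFactors = TRUE
)

# Sort so that, within each Lake/psd group, the largest vol comes first
o <- smb[order(smb$Lake, smb$psd, -smb$vol), ]

# Keep the first (i.e. maximum-vol) row of each Lake/psd combination
best <- o[!duplicated(o[c("Lake", "psd")]), c("Lake", "psd", "Length", "vol")]

# Match the column names of the desired output
names(best)[2] <- "Category"
rownames(best) <- NULL
print(best)
```

The result is an ordinary data frame, so it can be passed as the data argument of nls(). If two rows in a group tie for the maximum vol, this keeps only one of them.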
hadley wickham
2008-Dec-22 21:59 UTC
On Mon, Dec 22, 2008 at 3:51 PM, Ranney, Steven <steven.ranney at montana.edu> wrote:
> Does anyone have any thoughts as to how I might accomplish this?

You might want to have a look at the plyr package, http://had.co.nz/plyr, which provides a set of tools to make tasks like this easy. There are a number of similar examples in the introductory PDF that should get you started.

Regards,

Hadley

-- 
http://had.co.nz/
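plyr is built around the split-apply-combine pattern. The same idea can be sketched in base R without assuming plyr is installed: split() the rows by Lake and psd, pick the row with the largest vol in each piece with which.max(), and rbind() the pieces back together. This assumes a data frame named smb and uses only the rows posted in the question:

```r
# Rows 432-442 as posted, keeping only the columns needed here
smb <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210),
  stringsAsFactors = TRUE
)

# Split: one data frame per Lake/psd combination (empty combinations dropped)
pieces <- split(smb, list(smb$Lake, smb$psd), drop = TRUE)

# Apply: keep the row with the largest vol (which.max returns the first on ties)
top <- lapply(pieces, function(d) d[which.max(d$vol), c("Lake", "psd", "Length", "vol")])

# Combine: stack the one-row data frames into a single data frame
res <- do.call(rbind, top)
rownames(res) <- NULL
print(res)
```

With plyr installed, the equivalent should be roughly ddply(smb, c("Lake", "psd"), function(d) d[which.max(d$vol), c("Length", "vol")]).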
Søren Højsgaard
2008-Dec-22 22:25 UTC
Maybe summaryBy (or lapplyBy/splitBy) in the doBy package might help you.

Regards
Søren

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
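summaryBy() takes a formula, e.g. summaryBy(vol ~ Lake + psd, data = smb, FUN = max), which summarizes vol per group but does not by itself report the Length at which the maximum occurs. A base-R sketch of the same formula-style summary uses aggregate() and then merge()s the maxima back onto the data to recover Length. It assumes a data frame named smb and uses only the rows posted in the question:

```r
# Rows 432-442 as posted, keeping only the columns needed here
smb <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210),
  stringsAsFactors = TRUE
)

# Formula interface: maximum vol for every Lake/psd combination in the data
mx <- aggregate(vol ~ Lake + psd, data = smb, FUN = max)

# Merge back on Lake, psd and vol to recover the Length(s) where that maximum occurs
res <- merge(mx, smb[c("Lake", "psd", "Length", "vol")], by = c("Lake", "psd", "vol"))
res <- res[c("Lake", "psd", "Length", "vol")]
print(res)
```

If two rows in a group share the maximum vol, merge() returns both of them rather than silently picking one.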
Gabor Grothendieck
2008-Dec-23 00:15 UTC
Here are two solutions, assuming DF is your data frame:

# 1. aggregate is in the base of R
aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)

or the following, which is the same except that it labels psd as Category:

aggregate(DF[c("Length", "vol")], with(DF, list(Lake = Lake, Category = psd)), max)

# 2. sqldf. The sqldf package allows specification using SQL notation:
library(sqldf)
sqldf("select Lake, psd as Category, max(Length), max(vol) from DF group by Lake, psd")

There are many other good solutions too, using various packages that have already been mentioned on this thread.
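Both of Gabor's calls take max(Length) and max(vol) separately within each Lake/psd group, so the Length returned is the longest fish in the group rather than the Length at which vol peaks, which is what the original question asks for. This sketch, assuming the data frame is called smb and using only the eleven posted rows, contrasts that aggregate() call with a base-R ave() filter that keeps the row(s) holding each group's maximum vol:

```r
# Rows 432-442 as posted, keeping only the columns needed here
smb <- data.frame(
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Lake   = "Clear",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210),
  stringsAsFactors = TRUE
)

# aggregate() as in the reply: each column is maximised independently per group
naive <- aggregate(smb[c("Length", "vol")], smb[c("Lake", "psd")], max)

# ave() gives, for every row, the maximum vol of that row's Lake/psd group
grp_max <- ave(smb$vol, smb$Lake, smb$psd, FUN = max)

# Keep rows whose vol equals their group's maximum (ties would keep several rows)
at_max <- smb[smb$vol == grp_max, c("Lake", "psd", "Length", "vol")]

print(naive)
print(at_max)
```

With only the posted rows, the substock group gets Length 179 from the aggregate() call but 152 from the ave() filter; 152 is the Length that goes with the maximum vol of 3.2655.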