Any insight into the behavior of "by" in the following case would be appreciated. There is a note in the help details for "by" about documenting behavior since v2.7, but I don't entirely understand what it is saying. I'm using R 2.7.2 on Windows. I'm interested in whether the following behavior was a change or whether it has always worked this way. I searched with RSiteSearch and read through the version changes but found nothing.

Take a data frame as follows:

> samples
   Region.Label  Area Sample.Label Effort Label
1             1 10000            1    100    11
2             1 10000            2    100    12
3             1 10000            3    100    13
4             1 10000            4    100    14
5             1 10000            5    100    15
6             1 10000            6    100    16
7             1 10000            7    100    17
8             1 10000            8    100    18
9             1 10000            9    100    19
10            1 10000           10    100   110

Use "by" to tally the number of entries with particular values of Region.Label (in this case there is only 1 value of Region.Label):

> by(samples$Effort,samples$Region.Label,length)
INDICES: 1
[1] 1

I expected to get 10 instead of 1. I debugged into by.data.frame and I can see that it used drop=FALSE, so length returned the number of columns, which is 1.

But if I do any of the following, I get the 10 I expect.

> by(rep(1,10),samples$Region.Label,length)
samples$Region.Label: 1
[1] 10

> by(samples$Label,samples$Region.Label,length)
samples$Region.Label: 1
[1] 10

Also, if I use "tapply" with samples$Effort instead of "by", I get the 10 I expect.

> tapply(samples$Effort,samples$Region.Label,length)
 1
10

I do not understand why I'm getting these differences, but I can see that I'm going to use tapply from now on.
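The drop=FALSE mechanism described above can be sketched in isolation. This is only an illustration under assumptions: the data frame is rebuilt by hand from the printed output (the column types are guessed), and effort_df is a hypothetical one-column data frame used to show what length() sees when the subset is not dropped to a vector. It does not establish why samples$Effort behaved this way in the original session.

```r
# Hypothetical reconstruction of the posted data; column types are assumed.
samples <- data.frame(
  Region.Label = rep(1, 10),
  Area         = rep(10000, 10),
  Sample.Label = 1:10,
  Effort       = rep(100, 10),
  Label        = as.numeric(paste0("1", 1:10))
)

effort_vec <- samples$Effort                      # plain numeric vector
effort_df  <- samples[, "Effort", drop = FALSE]   # one-column data frame

n_vec <- length(effort_vec)   # length() of a vector counts elements
n_df  <- length(effort_df)    # length() of a data frame counts columns
n_row <- nrow(effort_df)      # nrow() counts rows in either layout

# by() on a data frame passes each subset to FUN as a data frame,
# so length() reports the column count and nrow() the row count.
by_len  <- by(effort_df, samples$Region.Label, length)
by_nrow <- by(effort_df, samples$Region.Label, nrow)
```

If the data passed to "by" is (or is treated as) a data frame, nrow is the safer tally function, since it does not depend on whether the subset has been dropped to a vector.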
On 29/10/2008, at 2:04 PM, Jeff Laake wrote:

> Any insight into the behavior of "by" in the following case would be
> appreciated. There is a note in the help details for "by" about
> documenting behavior since v2.7 but I don't entirely understand what
> it is saying. I'm using R2.7.2 Windows. I'm interested if the
> following behavior was a change or whether it has always worked this
> way. I looked at RSiteSearch and read through version changes but
> found nothing.
>
> Take a dataframe as follows:
>> samples
>    Region.Label  Area Sample.Label Effort Label
> 1             1 10000            1    100    11
> 2             1 10000            2    100    12
> 3             1 10000            3    100    13
> 4             1 10000            4    100    14
> 5             1 10000            5    100    15
> 6             1 10000            6    100    16
> 7             1 10000            7    100    17
> 8             1 10000            8    100    18
> 9             1 10000            9    100    19
> 10            1 10000           10    100   110
>
> Use "by" to tally number of entries with particular values of
> Region.Label (in this case there is only 1 value of Region.Label)
>
> by(samples$Effort,samples$Region.Label,length)
> INDICES: 1
> [1] 1
>
> I expected to get 10 instead of 1.

<snip>

Cannot reproduce the problem:

> samples
   Region.Label  Area Sample.Label Effort Label
1             1 10000            1    100    11
2             1 10000            2    100    12
3             1 10000            3    100    13
4             1 10000            4    100    14
5             1 10000            5    100    15
6             1 10000            6    100    16
7             1 10000            7    100    17
8             1 10000            8    100    18
9             1 10000            9    100    19
10            1 10000           10    100   110

> by(samples$Effort,samples$Region.Label,length)
samples$Region.Label: 1
[1] 10

I.e. I get 10 as you expected.

cheers,

Rolf Turner
On Tue, 28 Oct 2008 18:04:57 -0700, Jeff Laake <Jeff.Laake at noaa.gov> wrote:

> Any insight into the behavior of "by" in the following case would be
> appreciated. There is a note in the help details for "by" about
> documenting behavior since v2.7 but I don't entirely understand what
> it is saying. I'm using R2.7.2 Windows. I'm interested if the
> following behavior was a change or whether it has always worked this
> way. I looked at RSiteSearch and read through version changes but
> found nothing.

> Take a dataframe as follows:
>> samples
>    Region.Label  Area Sample.Label Effort Label
> 1             1 10000            1    100    11
> 2             1 10000            2    100    12
> 3             1 10000            3    100    13
> 4             1 10000            4    100    14
> 5             1 10000            5    100    15
> 6             1 10000            6    100    16
> 7             1 10000            7    100    17
> 8             1 10000            8    100    18
> 9             1 10000            9    100    19
> 10            1 10000           10    100   110

I cannot reproduce your results (please provide reproducible code), but:

table(samples$Region.Label)

is simpler for this purpose.

--
Seb
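For anyone trying to reproduce the report, a self-contained version might look like the sketch below. The data frame is rebuilt by hand from the printed output, so its column types are an assumption; the original object may have differed in ways that print() does not show, which is why str() is included. On a data frame built this way, all three tallies agree with the replies in this thread.

```r
# Hypothetical reconstruction of the posted data; the original object's
# column classes are unknown and may differ from these.
samples <- data.frame(
  Region.Label = rep(1, 10),
  Area         = rep(10000, 10),
  Sample.Label = 1:10,
  Effort       = rep(100, 10),
  Label        = as.numeric(paste0("1", 1:10))
)

# str() shows the class and dimensions of each column, which the
# printed table cannot reveal.
str(samples)

# Three ways to count rows per value of Region.Label.
counts_table  <- table(samples$Region.Label)
counts_tapply <- tapply(samples$Effort, samples$Region.Label, length)
counts_by     <- by(samples$Effort, samples$Region.Label, length)

print(counts_table)
print(counts_tapply)
print(counts_by)
```

Posting the output of dput(samples) from the affected session, rather than the printed table, would let others recreate the object exactly, including any attributes on the Effort column.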