thr3ads.net - R help - [R] behavior of "by" [Oct 2008]

Jeff Laake

2008-Oct-29 01:04 UTC

[R] behavior of "by"

Any insight into the behavior of "by" in the following case would be 
appreciated.  There is a note in the help details for "by" about 
behavior that has only been documented since v2.7, but I don't entirely 
understand what it is saying.  I'm using R 2.7.2 on Windows.  I'd like to 
know whether the following behavior is a change or whether it has always 
worked this way.  I searched with RSiteSearch and read through the 
version changes but found nothing.

Take a dataframe as follows:
 > samples
   Region.Label  Area Sample.Label Effort Label
1             1 10000            1    100    11
2             1 10000            2    100    12
3             1 10000            3    100    13
4             1 10000            4    100    14
5             1 10000            5    100    15
6             1 10000            6    100    16
7             1 10000            7    100    17
8             1 10000            8    100    18
9             1 10000            9    100    19
10            1 10000           10    100   110

Use "by" to tally number of entries with particular values of 
Region.Label (in this case there is only 1 value of Region.Label)

by(samples$Effort,samples$Region.Label,length)
INDICES: 1
[1] 1

I expected to get 10 instead of 1.  I debugged into by.data.frame and I 
can see that it used drop=FALSE, so length returned the number of 
columns which is 1.  But if I do any of the following, I get the 10 I 
expect.

 > by(rep(1,10),samples$Region.Label,length)
samples$Region.Label: 1
[1] 10
 > by(samples$Label,samples$Region.Label,length)
samples$Region.Label: 1
[1] 10
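
The drop=FALSE effect described above can be sketched with a small made-up data frame. The name `toy` and its values are assumptions for illustration only, not the original `samples` object. The sketch shows only that length() counts columns on a one-column data frame and counts elements on the extracted vector. It does not establish why the original call returned 1.

```r
# Hypothetical toy data, not the original 'samples' object
toy <- data.frame(Region.Label = rep(1, 10), Effort = rep(100, 10))

one_col_df <- toy[, "Effort", drop = FALSE]  # stays a one-column data frame
as_vector  <- toy[, "Effort"]                # dropped to a plain vector

length(one_col_df)  # counts columns
length(as_vector)   # counts elements
nrow(one_col_df)    # counts rows, whether or not the result was dropped
```

If a function handed a data frame subset is meant to count rows, nrow() is a safer choice than length().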

Also if I use "tapply" with samples$Effort instead of "by" I get the 10 
I expect.

tapply(samples$Effort,samples$Region.Label,length)
 1
10

I do not understand why I'm getting these differences but I can see that 
I'm going to use tapply from now on.

Rolf Turner

2008-Oct-29 01:35 UTC

On 29/10/2008, at 2:04 PM, Jeff Laake wrote:
> Any insight into the behavior of "by" in the following case would be
> appreciated.  There is a note in the help details for "by" about
> documenting behavior since v2.7 but I don't entirely understand what it
> is saying.  I'm using R2.7.2 Windows.  I'm interested if the following
> behavior was a change or whether it has always worked this way.  I
> looked at RSiteSearch and read through version changes but found
> nothing.
>
> Take a dataframe as follows:
>> samples
>    Region.Label  Area Sample.Label Effort Label
> 1             1 10000            1    100    11
> 2             1 10000            2    100    12
> 3             1 10000            3    100    13
> 4             1 10000            4    100    14
> 5             1 10000            5    100    15
> 6             1 10000            6    100    16
> 7             1 10000            7    100    17
> 8             1 10000            8    100    18
> 9             1 10000            9    100    19
> 10            1 10000           10    100   110
>
> Use "by" to tally number of entries with particular values of
> Region.Label (in this case there is only 1 value of Region.Label)
>
> by(samples$Effort,samples$Region.Label,length)
> INDICES: 1
> [1] 1
>
> I expected to get 10 instead of 1.

	<snip>

Cannot reproduce the problem:

  > samples
    Region.Label  Area Sample.Label Effort Label
1             1 10000            1    100    11
2             1 10000            2    100    12
3             1 10000            3    100    13
4             1 10000            4    100    14
5             1 10000            5    100    15
6             1 10000            6    100    16
7             1 10000            7    100    17
8             1 10000            8    100    18
9             1 10000            9    100    19
10            1 10000           10    100   110
  > by(samples$Effort,samples$Region.Label,length)
samples$Region.Label: 1
[1] 10

I.e. I get 10 as you expected.

	cheers,

		Rolf Turner

Sebastian P. Luque

2008-Oct-29 01:39 UTC

On Tue, 28 Oct 2008 18:04:57 -0700,
Jeff Laake <Jeff.Laake at noaa.gov> wrote:
> Any insight into the behavior of "by" in the following case would be
> appreciated.  There is a note in the help details for "by" about
> documenting behavior since v2.7 but I don't entirely understand what
> it is saying.  I'm using R2.7.2 Windows.  I'm interested if the
> following behavior was a change or whether it has always worked this
> way.  I looked at RSiteSearch and read through version changes but
> found nothing.
>
> Take a dataframe as follows:
>> samples
>    Region.Label  Area Sample.Label Effort Label
> 1             1 10000            1    100    11
> 2             1 10000            2    100    12
> 3             1 10000            3    100    13
> 4             1 10000            4    100    14
> 5             1 10000            5    100    15
> 6             1 10000            6    100    16
> 7             1 10000            7    100    17
> 8             1 10000            8    100    18
> 9             1 10000            9    100    19
> 10            1 10000           10    100   110
I cannot reproduce your results (please provide reproducible code), but:

table(samples$Region.Label)

is simpler for this purpose.
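
Since the thread asks for reproducible code, here is one way the printed data frame could be rebuilt and the counts compared. This is only a sketch: the column types are assumed to be plain numeric vectors, which may not match the original object, and the names `counts_table` and `counts_tapply` are introduced here for illustration.

```r
# Rebuild the printed data frame; column types are an assumption
samples <- data.frame(
  Region.Label = rep(1, 10),
  Area         = rep(10000, 10),
  Sample.Label = 1:10,
  Effort       = rep(100, 10),
  Label        = c(11:19, 110)
)

# Two ways to count rows per value of Region.Label
counts_table  <- table(samples$Region.Label)
counts_tapply <- tapply(samples$Effort, samples$Region.Label, length)

print(counts_table)
print(counts_tapply)

# str() shows whether a column is a plain vector or something else,
# and dput() prints the object in a form that can be pasted into a post
str(samples$Effort)
dput(samples)
```

Posting the dput() output of the real object would let others see its exact structure, not just its printed form.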


-- 
Seb
