thr3ads.net - R help - [R] counting duplicate items that occur in multiple groups [Nov 2020]

If this information is useful, please help other people find it:
Share via:

Tom Woolman

2020-Nov-17 23:34 UTC

[R] counting duplicate items that occur in multiple groups

Yes, good catch. Thanks


Quoting Bert Gunter <bgunter.4567 at gmail.com>:
> Why 0's in the data frame? Shouldn't that be 1 (vendor with that
account)?
>
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
>
> On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman at
ontargettek.com>
> wrote:
>
>> Hi Bill. Sorry to be so obtuse with the example data, I was trying
>> (too hard) not to share any actual values so I just created randomized
>> values for my example; of course I should have specified that the
>> random values would not provide the expected problem pattern. I should
>> have just used simple dummy codes as Bill Dunlap did.
>>
>> So per Bill's example data for Data1, the expected (hoped for)
output
>> should be:
>>
>>   Vendor Account Num_Vendors_Sharing_Bank_Acct
>> 1     V1      A1      0
>> 2     V2      A2      3
>> 3     V3      A2      3
>> 4     V4      A2      3
>>
>>
>> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
>> The value is 3 for V2, V3 and V4 because they each share bank account
>> A2.
>>
>>
>> Likewise, in the Data2 frame, the same logic applies:
>>
>>   Vendor Account Num_Vendors_Sharing_Bank_Acct
>> 1     V1      A1     0
>> 2     V2      A2     3
>> 3     V3      A2     3
>> 4     V1      A2     3
>> 5     V4      A3     0
>> 6     V2      A4     0
>>
>>
>>
>>
>>
>>
>> Thanks!
>>
>>
>> Quoting Bill Dunlap <williamwdunlap at gmail.com>:
>>
>> > What should the result be for
>> >   Data1 <-
data.frame(Vendor=c("V1","V2","V3","V4"),
>> >
Account=c("A1","A2","A2","A2"))
>> > ?
>> >
>> > Must each vendor have only one account?  If not, what should the
result
>> be
>> > for
>> >    Data2 <-
data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
>> >
Account=c("A1","A2","A2","A2","A3","A4"))
>> > ?
>> >
>> > -Bill
>> >
>> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at
ontargettek.com>
>> > wrote:
>> >
>> >> Hi everyone.  I have a dataframe that is a collection of
Vendor IDs
>> >> plus a bank account number for each vendor. I'm trying to
find a way
>> >> to count the number of duplicate bank accounts that occur in
more than
>> >> one unique Vendor_ID, and then assign the count value for each
row in
>> >> the dataframe in a new variable.
>> >>
>> >> I can do a count of bank accounts that occur within the same
vendor
>> >> using dplyr and group_by and count, but I can't figure out
a way to
>> >> count duplicates among multiple Vendor_IDs.
>> >>
>> >>
>> >> Dataframe example code:
>> >>
>> >>
>> >> #Create a sample data frame:
>> >>
>> >> set.seed(1)
>> >>
>> >> Data <- data.frame(Vendor_ID = sample(1:10000),
Bank_Account_ID >> >> sample(1:10000))
>> >>
>> >>
>> >>
>> >>
>> >> Thanks in advance for any help.
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible
code.
>> >>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

Bert Gunter

2020-Nov-18 00:06 UTC

head link

[R] counting duplicate items that occur in multiple groups

z <- with(Data2, tapply(Vendor,Account, I))
n <- vapply(z,length,1)
data.frame (Vendor = unlist(z),
   Account = rep(names(z),n),
   NumVen = rep(n,n)
)

## which gives:

   Vendor Account NumVen
A1      V1      A1      1
A21     V2      A2      3
A22     V3      A2      3
A23     V1      A2      3
A3      V4      A3      1
A4      V2      A4      1

Of course this also works for Data1

Bill may be able to come up with a slicker version, however.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman <twoolman at ontargettek.com>
wrote:
> Yes, good catch. Thanks
>
>
> Quoting Bert Gunter <bgunter.4567 at gmail.com>:
>
> > Why 0's in the data frame? Shouldn't that be 1 (vendor with
that
> account)?
> >
> > Bert
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming
along
> and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic
strip )
> >
> >
> > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman at
ontargettek.com>
> > wrote:
> >
> >> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> >> (too hard) not to share any actual values so I just created
randomized
> >> values for my example; of course I should have specified that the
> >> random values would not provide the expected problem pattern. I
should
> >> have just used simple dummy codes as Bill Dunlap did.
> >>
> >> So per Bill's example data for Data1, the expected (hoped for)
output
> >> should be:
> >>
> >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> >> 1     V1      A1      0
> >> 2     V2      A2      3
> >> 3     V3      A2      3
> >> 4     V4      A2      3
> >>
> >>
> >> Where the new calculated variable is
Num_Vendors_Sharing_Bank_Acct.
> >> The value is 3 for V2, V3 and V4 because they each share bank
account
> >> A2.
> >>
> >>
> >> Likewise, in the Data2 frame, the same logic applies:
> >>
> >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> >> 1     V1      A1     0
> >> 2     V2      A2     3
> >> 3     V3      A2     3
> >> 4     V1      A2     3
> >> 5     V4      A3     0
> >> 6     V2      A4     0
> >>
> >>
> >>
> >>
> >>
> >>
> >> Thanks!
> >>
> >>
> >> Quoting Bill Dunlap <williamwdunlap at gmail.com>:
> >>
> >> > What should the result be for
> >> >   Data1 <-
data.frame(Vendor=c("V1","V2","V3","V4"),
> >> >
Account=c("A1","A2","A2","A2"))
> >> > ?
> >> >
> >> > Must each vendor have only one account?  If not, what should
the
> result
> >> be
> >> > for
> >> >    Data2 <-
data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> >> >
Account=c("A1","A2","A2","A2","A3","A4"))
> >> > ?
> >> >
> >> > -Bill
> >> >
> >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at
ontargettek.com
> >
> >> > wrote:
> >> >
> >> >> Hi everyone.  I have a dataframe that is a collection of
Vendor IDs
> >> >> plus a bank account number for each vendor. I'm
trying to find a way
> >> >> to count the number of duplicate bank accounts that occur
in more
> than
> >> >> one unique Vendor_ID, and then assign the count value for
each row in
> >> >> the dataframe in a new variable.
> >> >>
> >> >> I can do a count of bank accounts that occur within the
same vendor
> >> >> using dplyr and group_by and count, but I can't
figure out a way to
> >> >> count duplicates among multiple Vendor_IDs.
> >> >>
> >> >>
> >> >> Dataframe example code:
> >> >>
> >> >>
> >> >> #Create a sample data frame:
> >> >>
> >> >> set.seed(1)
> >> >>
> >> >> Data <- data.frame(Vendor_ID = sample(1:10000),
Bank_Account_ID > >> >> sample(1:10000))
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Thanks in advance for any help.
> >> >>
> >> >> ______________________________________________
> >> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained,
reproducible code.
> >> >>
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
>
>
>
	[[alternative HTML version deleted]]

Deepayan Sarkar

2020-Nov-18 09:40 UTC

head link

[R] counting duplicate items that occur in multiple groups

On Wed, Nov 18, 2020 at 5:40 AM Bert Gunter <bgunter.4567 at gmail.com>
wrote:>
> z <- with(Data2, tapply(Vendor,Account, I))
> n <- vapply(z,length,1)
> data.frame (Vendor = unlist(z),
>    Account = rep(names(z),n),
>    NumVen = rep(n,n)
> )
>
> ## which gives:
>
>    Vendor Account NumVen
> A1      V1      A1      1
> A21     V2      A2      3
> A22     V3      A2      3
> A23     V1      A2      3
> A3      V4      A3      1
> A4      V2      A4      1
>
> Of course this also works for Data1
>
> Bill may be able to come up with a slicker version, however.
Perhaps

transform(Data2, nshare = as.vector(table(Account)[Account]))

(or dplyr::mutate() instead of transform(), if you prefer.)

-Deepayan
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
>
> On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman <twoolman at
ontargettek.com>
> wrote:
>
> > Yes, good catch. Thanks
> >
> >
> > Quoting Bert Gunter <bgunter.4567 at gmail.com>:
> >
> > > Why 0's in the data frame? Shouldn't that be 1 (vendor
with that
> > account)?
> > >
> > > Bert
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep
coming along
> > and
> > > sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
> > >
> > >
> > > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman at
ontargettek.com>
> > > wrote:
> > >
> > >> Hi Bill. Sorry to be so obtuse with the example data, I was
trying
> > >> (too hard) not to share any actual values so I just created
randomized
> > >> values for my example; of course I should have specified that
the
> > >> random values would not provide the expected problem pattern.
I should
> > >> have just used simple dummy codes as Bill Dunlap did.
> > >>
> > >> So per Bill's example data for Data1, the expected (hoped
for) output
> > >> should be:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1     V1      A1      0
> > >> 2     V2      A2      3
> > >> 3     V3      A2      3
> > >> 4     V4      A2      3
> > >>
> > >>
> > >> Where the new calculated variable is
Num_Vendors_Sharing_Bank_Acct.
> > >> The value is 3 for V2, V3 and V4 because they each share bank
account
> > >> A2.
> > >>
> > >>
> > >> Likewise, in the Data2 frame, the same logic applies:
> > >>
> > >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> > >> 1     V1      A1     0
> > >> 2     V2      A2     3
> > >> 3     V3      A2     3
> > >> 4     V1      A2     3
> > >> 5     V4      A3     0
> > >> 6     V2      A4     0
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Thanks!
> > >>
> > >>
> > >> Quoting Bill Dunlap <williamwdunlap at gmail.com>:
> > >>
> > >> > What should the result be for
> > >> >   Data1 <-
data.frame(Vendor=c("V1","V2","V3","V4"),
> > >> >
Account=c("A1","A2","A2","A2"))
> > >> > ?
> > >> >
> > >> > Must each vendor have only one account?  If not, what
should the
> > result
> > >> be
> > >> > for
> > >> >    Data2 <-
data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> > >> >
Account=c("A1","A2","A2","A2","A3","A4"))
> > >> > ?
> > >> >
> > >> > -Bill
> > >> >
> > >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman
at ontargettek.com
> > >
> > >> > wrote:
> > >> >
> > >> >> Hi everyone.  I have a dataframe that is a
collection of Vendor IDs
> > >> >> plus a bank account number for each vendor. I'm
trying to find a way
> > >> >> to count the number of duplicate bank accounts that
occur in more
> > than
> > >> >> one unique Vendor_ID, and then assign the count
value for each row in
> > >> >> the dataframe in a new variable.
> > >> >>
> > >> >> I can do a count of bank accounts that occur within
the same vendor
> > >> >> using dplyr and group_by and count, but I can't
figure out a way to
> > >> >> count duplicates among multiple Vendor_IDs.
> > >> >>
> > >> >>
> > >> >> Dataframe example code:
> > >> >>
> > >> >>
> > >> >> #Create a sample data frame:
> > >> >>
> > >> >> set.seed(1)
> > >> >>
> > >> >> Data <- data.frame(Vendor_ID = sample(1:10000),
Bank_Account_ID > > >> >> sample(1:10000))
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> Thanks in advance for any help.
> > >> >>
> > >> >> ______________________________________________
> > >> >> R-help at r-project.org mailing list -- To
UNSUBSCRIBE and more, see
> > >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> >> PLEASE do read the posting guide
> > >> >> http://www.R-project.org/posting-guide.html
> > >> >> and provide commented, minimal, self-contained,
reproducible code.
> > >> >>
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible
code.
> > >>
> >
> >
> >
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - Nov 2020 - counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups