thr3ads.net - R help - [R] counting duplicate items that occur in multiple groups [Nov 2020]

If this information is useful, please help other people find it:
Share via:

Tom Woolman

2020-Nov-17 21:10 UTC

[R] counting duplicate items that occur in multiple groups

Hi everyone.  I have a dataframe that is a collection of Vendor IDs  
plus a bank account number for each vendor. I'm trying to find a way  
to count the number of duplicate bank accounts that occur in more than  
one unique Vendor_ID, and then assign the count value for each row in  
the dataframe in a new variable.

I can do a count of bank accounts that occur within the same vendor  
using dplyr and group_by and count, but I can't figure out a way to  
count duplicates among multiple Vendor_IDs.


Dataframe example code:


#Create a sample data frame:

set.seed(1)

Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID =  
sample(1:10000))




Thanks in advance for any help.

Bert Gunter

2020-Nov-17 22:22 UTC

head link

[R] counting duplicate items that occur in multiple groups

Inline.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at ontargettek.com>
wrote:
> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor.

I interpret this as: "all vendors are unique and each vendor has a single
bank account." Is that correct?

> I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID,

The following makes no sense to me, as each row is a unique vendor and has
only one bank account.
> and then assign the count value for each row in
> the dataframe in a new variable.
>
> I can do a count of bank accounts that occur within the same vendor
>using dplyr and group_by and count, but I can't figure out a way
to> count duplicates among multiple Vendor_IDs.
>I interpret this to mean that you want to count vendor ID's by account .
With only one account per vendor
this is trivial; e.g.

set.seed(22)
d1 <- data.frame(id = sample(1:30),
      account = sample(1:20,30, replace = TRUE))

table(d1$account)

## gives
 1  2  3  6  7  8  9 10 11 13 15 16 17 18 19 20
 3  1  2  1  1  1  1  1  4  3  1  2  1  3  2  3

Note that AFAICS your example is useless, as it gives the same number of
different account numbers as ID's, so no duplication can occur.

As my interpretations are likely incorrect and this is not what you mean
nor want, either clarify your meaning and provide a useful **minimal**
example; or wait for a reply from someone with a better understanding than
I.

Cheers,
Bert




>
> Dataframe example code:
>
>
> #Create a sample data frame:
>
> set.seed(1)
>
> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID >
sample(1:10000))
>
>
>
>
> Thanks in advance for any help.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Bill Dunlap

2020-Nov-17 23:01 UTC

head link

[R] counting duplicate items that occur in multiple groups

What should the result be for
  Data1 <-
data.frame(Vendor=c("V1","V2","V3","V4"),
Account=c("A1","A2","A2","A2"))
?

Must each vendor have only one account?  If not, what should the result be
for
   Data2 <-
data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
Account=c("A1","A2","A2","A2","A3","A4"))
?

-Bill

On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at ontargettek.com>
wrote:
> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor. I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID, and then assign the count value for each row in
> the dataframe in a new variable.
>
> I can do a count of bank accounts that occur within the same vendor
> using dplyr and group_by and count, but I can't figure out a way to
> count duplicates among multiple Vendor_IDs.
>
>
> Dataframe example code:
>
>
> #Create a sample data frame:
>
> set.seed(1)
>
> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID >
sample(1:10000))
>
>
>
>
> Thanks in advance for any help.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Tom Woolman

2020-Nov-17 23:29 UTC

head link

[R] counting duplicate items that occur in multiple groups

Hi Bill. Sorry to be so obtuse with the example data, I was trying  
(too hard) not to share any actual values so I just created randomized  
values for my example; of course I should have specified that the  
random values would not provide the expected problem pattern. I should  
have just used simple dummy codes as Bill Dunlap did.

So per Bill's example data for Data1, the expected (hoped for) output  
should be:

  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1      0
2     V2      A2      3
3     V3      A2      3
4     V4      A2      3


Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.  
The value is 3 for V2, V3 and V4 because they each share bank account  
A2.


Likewise, in the Data2 frame, the same logic applies:

  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1     0
2     V2      A2     3
3     V3      A2     3
4     V1      A2     3
5     V4      A3     0
6     V2      A4     0






Thanks!


Quoting Bill Dunlap <williamwdunlap at gmail.com>:
> What should the result be for
>   Data1 <-
data.frame(Vendor=c("V1","V2","V3","V4"),
> Account=c("A1","A2","A2","A2"))
> ?
>
> Must each vendor have only one account?  If not, what should the result be
> for
>    Data2 <-
data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
>
Account=c("A1","A2","A2","A2","A3","A4"))
> ?
>
> -Bill
>
> On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at
ontargettek.com>
> wrote:
>
>> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
>> plus a bank account number for each vendor. I'm trying to find a
way
>> to count the number of duplicate bank accounts that occur in more than
>> one unique Vendor_ID, and then assign the count value for each row in
>> the dataframe in a new variable.
>>
>> I can do a count of bank accounts that occur within the same vendor
>> using dplyr and group_by and count, but I can't figure out a way to
>> count duplicates among multiple Vendor_IDs.
>>
>>
>> Dataframe example code:
>>
>>
>> #Create a sample data frame:
>>
>> set.seed(1)
>>
>> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID
>> sample(1:10000))
>>
>>
>>
>>
>> Thanks in advance for any help.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

Jim Lemon

2020-Nov-18 10:22 UTC

head link

[R] counting duplicate items that occur in multiple groups

Oops, I sent this to Tom earlier today and forgot to copy to the list:

VendorID=rep(paste0("V",1:10),each=5)
AcctID=paste0("A",sample(1:5,50,TRUE))
Data<-data.frame(VendorID,AcctID)
table(Data)
# get multiple vendors for each account
dupAcctID<-colSums(table(Data)>0)
Data$dupAcct<-NA
# fill in the new column
for(i in 1:length(dupAcctID))
 Data$dupAcct[Data$AcctID == names(dupAcctID[i])]<-dupAcctID[i]

Jim

On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman <twoolman at ontargettek.com>
wrote:
> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor. I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID, and then assign the count value for each row in
> the dataframe in a new variable.
>
> I can do a count of bank accounts that occur within the same vendor
> using dplyr and group_by and count, but I can't figure out a way to
> count duplicates among multiple Vendor_IDs.
>
>
> Dataframe example code:
>
>
> #Create a sample data frame:
>
> set.seed(1)
>
> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID >
sample(1:10000))
>
>
>
>
> Thanks in advance for any help.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Tom Woolman

2020-Nov-18 10:25 UTC

head link

[R] counting duplicate items that occur in multiple groups

Thanks, everyone!



Quoting Jim Lemon <drjimlemon at gmail.com>:
> Oops, I sent this to Tom earlier today and forgot to copy to the list:
>
> VendorID=rep(paste0("V",1:10),each=5)
> AcctID=paste0("A",sample(1:5,50,TRUE))
> Data<-data.frame(VendorID,AcctID)
> table(Data)
> # get multiple vendors for each account
> dupAcctID<-colSums(table(Data)>0)
> Data$dupAcct<-NA
> # fill in the new column
> for(i in 1:length(dupAcctID))
>  Data$dupAcct[Data$AcctID == names(dupAcctID[i])]<-dupAcctID[i]
>
> Jim
>
> On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman <twoolman at
ontargettek.com>
> wrote:
>
>> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
>> plus a bank account number for each vendor. I'm trying to find a
way
>> to count the number of duplicate bank accounts that occur in more than
>> one unique Vendor_ID, and then assign the count value for each row in
>> the dataframe in a new variable.
>>
>> I can do a count of bank accounts that occur within the same vendor
>> using dplyr and group_by and count, but I can't figure out a way to
>> count duplicates among multiple Vendor_IDs.
>>
>>
>> Dataframe example code:
>>
>>
>> #Create a sample data frame:
>>
>> set.seed(1)
>>
>> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID
>> sample(1:10000))
>>
>>
>>
>>
>> Thanks in advance for any help.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - Nov 2020 - counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups

[R] counting duplicate items that occur in multiple groups