Tom Woolman
2020-Nov-17 23:34 UTC
[R] counting duplicate items that occur in multiple groups
Yes, good catch. Thanks Quoting Bert Gunter <bgunter.4567 at gmail.com>:> Why 0's in the data frame? Shouldn't that be 1 (vendor with that account)? > > Bert > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman at ontargettek.com> > wrote: > >> Hi Bill. Sorry to be so obtuse with the example data, I was trying >> (too hard) not to share any actual values so I just created randomized >> values for my example; of course I should have specified that the >> random values would not provide the expected problem pattern. I should >> have just used simple dummy codes as Bill Dunlap did. >> >> So per Bill's example data for Data1, the expected (hoped for) output >> should be: >> >> Vendor Account Num_Vendors_Sharing_Bank_Acct >> 1 V1 A1 0 >> 2 V2 A2 3 >> 3 V3 A2 3 >> 4 V4 A2 3 >> >> >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct. >> The value is 3 for V2, V3 and V4 because they each share bank account >> A2. >> >> >> Likewise, in the Data2 frame, the same logic applies: >> >> Vendor Account Num_Vendors_Sharing_Bank_Acct >> 1 V1 A1 0 >> 2 V2 A2 3 >> 3 V3 A2 3 >> 4 V1 A2 3 >> 5 V4 A3 0 >> 6 V2 A4 0 >> >> >> >> >> >> >> Thanks! >> >> >> Quoting Bill Dunlap <williamwdunlap at gmail.com>: >> >> > What should the result be for >> > Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"), >> > Account=c("A1","A2","A2","A2")) >> > ? >> > >> > Must each vendor have only one account? If not, what should the result >> be >> > for >> > Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"), >> > Account=c("A1","A2","A2","A2","A3","A4")) >> > ? >> > >> > -Bill >> > >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at ontargettek.com> >> > wrote: >> > >> >> Hi everyone. I have a dataframe that is a collection of Vendor IDs >> >> plus a bank account number for each vendor. I'm trying to find a way >> >> to count the number of duplicate bank accounts that occur in more than >> >> one unique Vendor_ID, and then assign the count value for each row in >> >> the dataframe in a new variable. >> >> >> >> I can do a count of bank accounts that occur within the same vendor >> >> using dplyr and group_by and count, but I can't figure out a way to >> >> count duplicates among multiple Vendor_IDs. >> >> >> >> >> >> Dataframe example code: >> >> >> >> >> >> #Create a sample data frame: >> >> >> >> set.seed(1) >> >> >> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID >> >> sample(1:10000)) >> >> >> >> >> >> >> >> >> >> Thanks in advance for any help. >> >> >> >> ______________________________________________ >> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. >> >> >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >>
Bert Gunter
2020-Nov-18 00:06 UTC
[R] counting duplicate items that occur in multiple groups
z <- with(Data2, tapply(Vendor,Account, I)) n <- vapply(z,length,1) data.frame (Vendor = unlist(z), Account = rep(names(z),n), NumVen = rep(n,n) ) ## which gives: Vendor Account NumVen A1 V1 A1 1 A21 V2 A2 3 A22 V3 A2 3 A23 V1 A2 3 A3 V4 A3 1 A4 V2 A4 1 Of course this also works for Data1 Bill may be able to come up with a slicker version, however. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman <twoolman at ontargettek.com> wrote:> Yes, good catch. Thanks > > > Quoting Bert Gunter <bgunter.4567 at gmail.com>: > > > Why 0's in the data frame? Shouldn't that be 1 (vendor with that > account)? > > > > Bert > > Bert Gunter > > > > "The trouble with having an open mind is that people keep coming along > and > > sticking things into it." > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > > > > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman at ontargettek.com> > > wrote: > > > >> Hi Bill. Sorry to be so obtuse with the example data, I was trying > >> (too hard) not to share any actual values so I just created randomized > >> values for my example; of course I should have specified that the > >> random values would not provide the expected problem pattern. I should > >> have just used simple dummy codes as Bill Dunlap did. > >> > >> So per Bill's example data for Data1, the expected (hoped for) output > >> should be: > >> > >> Vendor Account Num_Vendors_Sharing_Bank_Acct > >> 1 V1 A1 0 > >> 2 V2 A2 3 > >> 3 V3 A2 3 > >> 4 V4 A2 3 > >> > >> > >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct. > >> The value is 3 for V2, V3 and V4 because they each share bank account > >> A2. > >> > >> > >> Likewise, in the Data2 frame, the same logic applies: > >> > >> Vendor Account Num_Vendors_Sharing_Bank_Acct > >> 1 V1 A1 0 > >> 2 V2 A2 3 > >> 3 V3 A2 3 > >> 4 V1 A2 3 > >> 5 V4 A3 0 > >> 6 V2 A4 0 > >> > >> > >> > >> > >> > >> > >> Thanks! > >> > >> > >> Quoting Bill Dunlap <williamwdunlap at gmail.com>: > >> > >> > What should the result be for > >> > Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"), > >> > Account=c("A1","A2","A2","A2")) > >> > ? > >> > > >> > Must each vendor have only one account? If not, what should the > result > >> be > >> > for > >> > Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"), > >> > Account=c("A1","A2","A2","A2","A3","A4")) > >> > ? > >> > > >> > -Bill > >> > > >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at ontargettek.com > > > >> > wrote: > >> > > >> >> Hi everyone. I have a dataframe that is a collection of Vendor IDs > >> >> plus a bank account number for each vendor. I'm trying to find a way > >> >> to count the number of duplicate bank accounts that occur in more > than > >> >> one unique Vendor_ID, and then assign the count value for each row in > >> >> the dataframe in a new variable. > >> >> > >> >> I can do a count of bank accounts that occur within the same vendor > >> >> using dplyr and group_by and count, but I can't figure out a way to > >> >> count duplicates among multiple Vendor_IDs. > >> >> > >> >> > >> >> Dataframe example code: > >> >> > >> >> > >> >> #Create a sample data frame: > >> >> > >> >> set.seed(1) > >> >> > >> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID > >> >> sample(1:10000)) > >> >> > >> >> > >> >> > >> >> > >> >> Thanks in advance for any help. > >> >> > >> >> ______________________________________________ > >> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> >> https://stat.ethz.ch/mailman/listinfo/r-help > >> >> PLEASE do read the posting guide > >> >> http://www.R-project.org/posting-guide.html > >> >> and provide commented, minimal, self-contained, reproducible code. > >> >> > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > >[[alternative HTML version deleted]]
Deepayan Sarkar
2020-Nov-18 09:40 UTC
[R] counting duplicate items that occur in multiple groups
On Wed, Nov 18, 2020 at 5:40 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:> > z <- with(Data2, tapply(Vendor,Account, I)) > n <- vapply(z,length,1) > data.frame (Vendor = unlist(z), > Account = rep(names(z),n), > NumVen = rep(n,n) > ) > > ## which gives: > > Vendor Account NumVen > A1 V1 A1 1 > A21 V2 A2 3 > A22 V3 A2 3 > A23 V1 A2 3 > A3 V4 A3 1 > A4 V2 A4 1 > > Of course this also works for Data1 > > Bill may be able to come up with a slicker version, however.Perhaps transform(Data2, nshare = as.vector(table(Account)[Account])) (or dplyr::mutate() instead of transform(), if you prefer.) -Deepayan> > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman <twoolman at ontargettek.com> > wrote: > > > Yes, good catch. Thanks > > > > > > Quoting Bert Gunter <bgunter.4567 at gmail.com>: > > > > > Why 0's in the data frame? Shouldn't that be 1 (vendor with that > > account)? > > > > > > Bert > > > Bert Gunter > > > > > > "The trouble with having an open mind is that people keep coming along > > and > > > sticking things into it." > > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > > > > > > > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman at ontargettek.com> > > > wrote: > > > > > >> Hi Bill. Sorry to be so obtuse with the example data, I was trying > > >> (too hard) not to share any actual values so I just created randomized > > >> values for my example; of course I should have specified that the > > >> random values would not provide the expected problem pattern. I should > > >> have just used simple dummy codes as Bill Dunlap did. > > >> > > >> So per Bill's example data for Data1, the expected (hoped for) output > > >> should be: > > >> > > >> Vendor Account Num_Vendors_Sharing_Bank_Acct > > >> 1 V1 A1 0 > > >> 2 V2 A2 3 > > >> 3 V3 A2 3 > > >> 4 V4 A2 3 > > >> > > >> > > >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct. > > >> The value is 3 for V2, V3 and V4 because they each share bank account > > >> A2. > > >> > > >> > > >> Likewise, in the Data2 frame, the same logic applies: > > >> > > >> Vendor Account Num_Vendors_Sharing_Bank_Acct > > >> 1 V1 A1 0 > > >> 2 V2 A2 3 > > >> 3 V3 A2 3 > > >> 4 V1 A2 3 > > >> 5 V4 A3 0 > > >> 6 V2 A4 0 > > >> > > >> > > >> > > >> > > >> > > >> > > >> Thanks! > > >> > > >> > > >> Quoting Bill Dunlap <williamwdunlap at gmail.com>: > > >> > > >> > What should the result be for > > >> > Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"), > > >> > Account=c("A1","A2","A2","A2")) > > >> > ? > > >> > > > >> > Must each vendor have only one account? If not, what should the > > result > > >> be > > >> > for > > >> > Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"), > > >> > Account=c("A1","A2","A2","A2","A3","A4")) > > >> > ? > > >> > > > >> > -Bill > > >> > > > >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman at ontargettek.com > > > > > >> > wrote: > > >> > > > >> >> Hi everyone. I have a dataframe that is a collection of Vendor IDs > > >> >> plus a bank account number for each vendor. I'm trying to find a way > > >> >> to count the number of duplicate bank accounts that occur in more > > than > > >> >> one unique Vendor_ID, and then assign the count value for each row in > > >> >> the dataframe in a new variable. > > >> >> > > >> >> I can do a count of bank accounts that occur within the same vendor > > >> >> using dplyr and group_by and count, but I can't figure out a way to > > >> >> count duplicates among multiple Vendor_IDs. > > >> >> > > >> >> > > >> >> Dataframe example code: > > >> >> > > >> >> > > >> >> #Create a sample data frame: > > >> >> > > >> >> set.seed(1) > > >> >> > > >> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID > > >> >> sample(1:10000)) > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> Thanks in advance for any help. > > >> >> > > >> >> ______________________________________________ > > >> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > >> >> https://stat.ethz.ch/mailman/listinfo/r-help > > >> >> PLEASE do read the posting guide > > >> >> http://www.R-project.org/posting-guide.html > > >> >> and provide commented, minimal, self-contained, reproducible code. > > >> >> > > >> > > >> ______________________________________________ > > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > >> https://stat.ethz.ch/mailman/listinfo/r-help > > >> PLEASE do read the posting guide > > >> http://www.R-project.org/posting-guide.html > > >> and provide commented, minimal, self-contained, reproducible code. > > >> > > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.