Kevin Wamae
2019-Jan-17 00:29 UTC
[R] create groups from data with duplicates, such that each group has a duplicate represented once
Hi,

I have a sequencing run with ~3000 samples (attached dataset). The samples were initially tagged and amplified by PCR in duplicate. The tags used range from MID01 to MID26.

MID01-MID13 were used for pair 1 while MID14-MID26 were used for pair 2. The tags are re-used to allow samples to be pooled.

The pooling process will involve mixing samples with MID01-26 into the first group, the next set of samples with MID01-26 into the second group, and so on.

I'm hoping to get an R script that can create these groups such that, within each group, any tag appears only once. An example is shown below.

ID   TagA   TagB   group
180  MID03  MID10  group1
181  MID04  MID06  group1
182  MID05  MID07  group1
183  MID03  MID09  group2
184  MID04  MID10  group2
185  MID05  MID06  group2
186  MID01  MID06  group3
187  MID02  MID07  group3
188  MID03  MID08  group3

______________________________________________________________________
This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system. Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________
PIKAL Petr
2019-Jan-17 08:55 UTC
[R] create groups from data with duplicates, such that each group has a duplicate represented once
Hi,

Instead of an attachment, which is usually removed, you should use dput. Something like the output from

dput(head(yourdata, 30))

To remove duplicate values, see unique or duplicated.

Cheers
Petr

Personal data: Information about the processing and protection of business partners' personal data is available at: https://www.precheza.cz/en/personal-data-protection-principles/ | Confidentiality: This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
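As a rough illustration of the advice above, the following sketch shows what dput, duplicated and unique do on a small data frame. The object name `samples` and its contents are assumptions made up for this example (the rows mirror the first six rows of the example table in the original post); the real attached dataset is not available in the thread.

```r
# Hypothetical stand-in for the first rows of the poster's data
samples <- data.frame(
  ID   = 180:185,
  TagA = c("MID03", "MID04", "MID05", "MID03", "MID04", "MID05"),
  TagB = c("MID10", "MID06", "MID07", "MID09", "MID10", "MID06"),
  stringsAsFactors = FALSE
)

# Print a plain-text representation that can be pasted into an email
# and read back into R by whoever answers
dput(head(samples, 30))

# duplicated() flags the second and later occurrences of each value
dup_a <- duplicated(samples$TagA)

# unique() returns each distinct value once, in order of first appearance
tags_a <- unique(samples$TagA)
```

Pasting the dput output into the message body means the data survive even when the list strips attachments.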
Kevin Wamae
2019-Jan-17 11:53 UTC
[R] create groups from data with duplicates, such that each group has a duplicate represented once
Dear Petr,

Thank you for the guidance. A colleague managed to solve it. I'll definitely use "dput" for future postings.

Regards
------------------
Kevin Wamae
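The thread does not record the colleague's solution. For readers with the same problem, here is one possible sketch, not the method actually used. It assumes the samples should be processed in the order they appear, that a tag counts as "used" whether it occurs in TagA or TagB, and that a new group should start as soon as a sample's tag is already present in the current group. The function name `assign_groups` is hypothetical, and the data are the nine rows from the example table in the original post.

```r
# Hypothetical helper, not from the thread: walks the rows in order and
# opens a new group whenever a tag is already present in the current one.
assign_groups <- function(df) {
  group   <- integer(nrow(df))
  current <- 1L
  used    <- character(0)        # tags already in the current group
  for (i in seq_len(nrow(df))) {
    tags <- c(as.character(df$TagA[i]), as.character(df$TagB[i]))
    if (any(tags %in% used)) {   # clash: close this group, start the next
      current <- current + 1L
      used    <- character(0)
    }
    used     <- c(used, tags)
    group[i] <- current
  }
  df$group <- paste0("group", group)
  df
}

# The nine example rows from the original post
samples <- data.frame(
  ID   = 180:188,
  TagA = c("MID03", "MID04", "MID05", "MID03", "MID04", "MID05",
           "MID01", "MID02", "MID03"),
  TagB = c("MID10", "MID06", "MID07", "MID09", "MID10", "MID06",
           "MID06", "MID07", "MID08"),
  stringsAsFactors = FALSE
)

result <- assign_groups(samples)
print(result)
```

On these nine rows the sketch reproduces the group column from the example table. Because it only ever looks at the current group, it is not guaranteed to use the smallest possible number of groups on the full dataset.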