Hey,

I'm trying to match patient identifiers from two separate input files, and then add information from one of the input files to the corresponding output file. I'd greatly appreciate any help!

More specifically,
Input_File_1 has a column header "bcr_patient_barcode"
Input_File_2 has a column header "Barcode" and a column header "Batch"

I want my script to match the appropriate patient identifiers, since "bcr_patient_barcode" and "Barcode" are not in the same order. Then I want to add the information from "Batch" to the corresponding patient.

My (incorrect) code is below:

    #batch
    tmp <- Input_File_2$Barcode
    tmp1 <- Input_File_1$bcr_patient_barcode

    for i in tmp
      for item in tmp1
        if (tmp == tmp1) {
          curated$batch <- Input_File_2$Batch
        }

Thanks!
Looks like a job for merge().

On Fri, Oct 28, 2011 at 10:49 AM, Ben Ganzfried <ben.ganzfried at gmail.com> wrote:
> I'm trying to match patient identifiers from two separate input files, and
> then add information from one of the input files to the corresponding output
> file.
> [...]

--
Sarah Goslee
http://www.functionaldiversity.org
On Oct 28, 2011, at 9:49 AM, Ben Ganzfried wrote:
> I want my script to match the appropriate patient identifiers since
> "bcr_patient_barcode" and "Barcode" are not in the same order. Then I want
> to add the information from "Batch" to the corresponding patient.
> [...]

See ?merge and then use something like:

    newDF <- merge(Input_File_2, Input_File_1, by.x = "Barcode", by.y = "bcr_patient_barcode")

Also, pay attention to the 'all', 'all.x' and 'all.y' arguments. They control whether only matching records are retained, or whether non-matching records from one or both data sets are kept as well.

merge() performs an "SQL-like" join operation.

HTH,

Marc Schwartz
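
To make the effect of 'all.x' concrete, here is a minimal sketch on toy data. The barcodes and batch values are made up for illustration, and 'curated' is assumed here to be a copy of Input_File_1 (the original post does not say how it is built). Note that merge() sorts the result by the key column by default, so the last part shows match() as one possible alternative when the row order of Input_File_1 has to be preserved.

```r
# Toy data; barcodes and batch numbers are invented for illustration.
Input_File_1 <- data.frame(
  bcr_patient_barcode = c("P-03", "P-01", "P-04", "P-02"),
  stringsAsFactors = FALSE
)
Input_File_2 <- data.frame(
  Barcode = c("P-01", "P-02", "P-03"),
  Batch   = c(10L, 11L, 10L),
  stringsAsFactors = FALSE
)

# Inner join (the default): only barcodes present in both inputs.
inner <- merge(Input_File_1, Input_File_2,
               by.x = "bcr_patient_barcode", by.y = "Barcode")

# Left join: keep every row of Input_File_1; Batch is NA where
# Input_File_2 has no matching Barcode.
merged <- merge(Input_File_1, Input_File_2,
                by.x = "bcr_patient_barcode", by.y = "Barcode",
                all.x = TRUE)

# Alternative that keeps the row order of Input_File_1: match() gives,
# for each barcode in Input_File_1, its row position in Input_File_2
# (NA if it is absent).
idx <- match(Input_File_1$bcr_patient_barcode, Input_File_2$Barcode)
curated <- Input_File_1          # assumption: curated starts as a copy
curated$batch <- Input_File_2$Batch[idx]
```

With the toy data, 'inner' has three rows and 'merged' has four, with NA in Batch for the patient that has no entry in Input_File_2.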