Hey,
I'm trying to match patient identifiers from two separate input files, and
then add information from one of the input files to the corresponding output
file. I'd greatly appreciate any help!
More specifically,
Input_File_1 has a column header "bcr_patient_barcode"
Input_File_2 has a column header "Barcode" and a column header "Batch"
I want my script to match the appropriate patient identifiers since
"bcr_patient_barcode" and "Barcode" are not in the same order. Then I want
to add the information from "Batch" to the corresponding patient.
My (incorrect) code is below:
#batch
tmp <- Input_File_2$Barcode
tmp1 <- Input_File_1$bcr_patient_barcode
for i in tmp
for item in tmp1
if (tmp == tmp1) {
curated$batch <- Input_File_2$Batch
}
Thanks!
Looks like a job for merge().

On Fri, Oct 28, 2011 at 10:49 AM, Ben Ganzfried <ben.ganzfried at gmail.com> wrote:
> I'm trying to match patient identifiers from two separate input files, and
> then add information from one of the input files to the corresponding output
> file.

-- 
Sarah Goslee
http://www.functionaldiversity.org
On Oct 28, 2011, at 9:49 AM, Ben Ganzfried wrote:
> I want my script to match the appropriate patient identifiers since
> "bcr_patient_barcode" and "Barcode" are not in the same order. Then I want
> to add the information from "Batch" to the corresponding patient.

See ?merge and then use something like:

    newDF <- merge(Input_File_2, Input_File_1, by.x = "Barcode", by.y = "bcr_patient_barcode")

Also, pay attention to the 'all', 'all.x' and 'all.y' arguments, which control
whether only matching records are retained, or whether non-matching records
from one or both datasets are retained as well. merge() performs an
"SQL-like" join operation.

HTH,

Marc Schwartz
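
To make the effect of the 'all.x' argument concrete, here is a small sketch on toy data. The barcodes and batch values are invented for illustration, and the data frames are assumed to have been read in already under the names used in the original post. It also shows a match()-based alternative, which is not what was suggested above but may be closer to what the original loop was attempting, since it adds a batch column without reordering the rows.

```r
# Toy data: barcodes and batch values are made up for illustration only.
Input_File_1 <- data.frame(
  bcr_patient_barcode = c("P-03", "P-01", "P-04", "P-02"),
  stringsAsFactors = FALSE
)
Input_File_2 <- data.frame(
  Barcode = c("P-01", "P-02", "P-03"),
  Batch   = c(10, 11, 10),
  stringsAsFactors = FALSE
)

# Default (inner join): only barcodes present in both data frames are kept.
inner <- merge(Input_File_1, Input_File_2,
               by.x = "bcr_patient_barcode", by.y = "Barcode")

# all.x = TRUE (left join): every row of Input_File_1 is kept; patients with
# no match in Input_File_2 get NA for Batch.
left <- merge(Input_File_1, Input_File_2,
              by.x = "bcr_patient_barcode", by.y = "Barcode",
              all.x = TRUE)

# Alternative that keeps the original row order of Input_File_1:
# match() gives, for each patient barcode, its row position in Input_File_2
# (NA where there is no match), which is then used to index Batch.
curated <- Input_File_1
idx <- match(curated$bcr_patient_barcode, Input_File_2$Barcode)
curated$batch <- Input_File_2$Batch[idx]
```

Note that merge() sorts the result by the key column by default, so use the match() approach (or sort = FALSE, which still does not guarantee the original order) if row order matters.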