On Apr 29, 2010, at 10:21 AM, Alex Jameson wrote:
> Hi,
>
> I have two files (file1.txt and file2.txt) which I would like to
> merge based on certain criteria, i.e. combining rows with matching
> GeneID and Exons. I have used the merge option,
Huh? What is the merge option? (There is a merge _function_.)
> but it
"It"? Please provide the code you used. Have you yet read the Posting
Guide as I urged you earlier?
> does not give me the desired outcome.
> merged.txt shows the result I would like.
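If I am reading merged.txt correctly, you want all the Affy probes that share a GeneSymbol/GeneID/Exons combination collapsed into one comma-separated cell, with the Agilent probe for that combination alongside. Under that assumption, aggregate() followed by an outer merge() is one way to get there. This is a minimal sketch: the rows are copied from your desired output rather than from file1.txt, and the object names (affy, agilent, collapsed, merged) are hypothetical.

```r
# Hypothetical excerpt: three Affy probes copied from the desired merged.txt
# (two for RFC2 exon 10, one for UBA7 exon 24).
affy <- data.frame(
  GeneSymbol = c("RFC2", "RFC2", "UBA7"),
  GeneID     = c(5982L, 5982L, 7318L),
  Exons      = c(10L, 10L, 24L),
  AffyProbe  = c("1053_at:120:925", "1053_at:504:41", "1294_at:1079:379"),
  AffyStart  = c(73287845L, 73287869L, 49818386L),
  AffyEnd    = c(73287821L, 73287845L, 49818362L),
  stringsAsFactors = FALSE
)

# Hypothetical Agilent table holding the one RFC2 exon 10 probe from merged.txt.
agilent <- data.frame(
  GeneSymbol   = "RFC2",
  GeneID       = 5982L,
  Exons        = 10L,
  AgilentProbe = "A_23_P93823",
  AgilentStart = 73287861L,
  AgilentEnd   = 73287802L,
  stringsAsFactors = FALSE
)

keys <- c("GeneSymbol", "GeneID", "Exons")

# One row per key combination; each non-key column is pasted together
# with commas, e.g. "1053_at:120:925,1053_at:504:41".
collapsed <- aggregate(affy[setdiff(names(affy), keys)],
                       by = affy[keys],
                       FUN = paste, collapse = ",")

# Full outer join: combinations present in only one table are kept,
# with NA in the other table's columns (UBA7 has no Agilent probe here).
merged <- merge(collapsed, agilent, by = keys, all = TRUE)
merged
```

Whether empty cells should be NA or blank, and whether Chrom belongs among the keys, are choices only you can make from the real files.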
>
Given that those two files have no GeneID/Exons pairs in common (after I
took your mangled HTML posting and fixed each one to create readable
files), I would expect that this call, which would implement the merge
you requested above, would produce 0 rows:
merge(dtd, File2, by=c("GeneID", "Exons"))  # which would be an inner join
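For anyone following along, here is a minimal sketch of the difference between that inner join and a full outer join (all = TRUE). The two-row excerpts are typed in from the posted files (key columns plus the probe ID only); affy and agilent are illustrative stand-ins for dtd and File2 above.

```r
# Two-row excerpts from the posted File1.txt and File2.txt.
affy <- data.frame(
  AffyProbe  = c("1007_s_at:1105:483", "1007_s_at:1119:177"),
  GeneSymbol = c("DDR1", "DDR1"),
  GeneID     = c(780L, 780L),
  Exons      = c(21L, 21L),
  stringsAsFactors = FALSE
)
agilent <- data.frame(
  AgilentProbe = c("A_23_P100001", "A_23_P100022"),
  GeneSymbol   = c("FAM174B", "SV2B"),
  GeneID       = c(400451L, 9899L),
  Exons        = c(5L, 14L),
  stringsAsFactors = FALSE
)

# Inner join (merge's default, all = FALSE): only GeneID/Exons pairs found
# in both tables are kept. None are shared, so no rows come back.
inner <- merge(affy, agilent, by = c("GeneID", "Exons"))
nrow(inner)  # 0

# Full outer join: every row from both sides is kept, with NA filling the
# columns of the table that has no matching GeneID/Exons pair.
full <- merge(affy, agilent, by = c("GeneID", "Exons"), all = TRUE)
nrow(full)   # 4
```

Because GeneSymbol is not one of the keys but exists in both tables, merge() keeps both copies, as GeneSymbol.x and GeneSymbol.y.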
Many (most?) of the numbers in the third desired file that we are
seeing in mangled form do not appear in either of those two input
files, so you appear to be requesting that we hack into your system to
get them. Now what was it that you really wanted? (And no more HTML
postings ... and use the dput function. That would be equivalent to
the dump method in the Posting Guide, which (again) I urge you to read.)
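To show what I mean by dput: a minimal sketch on a three-row excerpt of File1.txt (the object name file1 is only for illustration). You paste the text that dput() prints into a plain-text message, and whoever reads it can paste it straight back into R; capture.output() plus parse() stands in for that copy-and-paste here.

```r
# Three rows of the posted File1.txt, key columns only.
file1 <- data.frame(
  AffyProbe = c("1007_s_at:1105:483", "1007_s_at:1119:177", "1007_s_at:1136:469"),
  GeneID    = c(780L, 780L, 780L),
  Exons     = c(21L, 21L, 21L),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object exactly; this is what
# belongs in the email instead of a printed (and easily mangled) table.
dput(file1)

# Round trip: turn the printed text back into an object and compare.
txt  <- capture.output(dput(file1))
copy <- eval(parse(text = txt))
all.equal(copy, file1)
```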
--
David
>
>
>
> *File1.txt*
>
>    AffyProbe           ProbeType  Flag  GeneSymbol  GeneID  Exons  Chrom  Strand  AffyStart  AffyEnd
> 1  1007_s_at:1105:483  0          0     DDR1        780     21     6      +       30975403   30975427
> 2  1007_s_at:1119:177  0          0     DDR1        780     21     6      +       30975549   30975573
> 3  1007_s_at:1136:469  0          0     DDR1        780     21     6      +       30975766   30975790
> 4  1007_s_at:192:205   0          0     DDR1        780     21     6      +       30975523   30975547
> 5  1007_s_at:474:1161  0          0     DDR1        780     21     6      +       30975745   30975769
> 6  1007_s_at:504:983   0          0     DDR1        780     21     6      +       30975575   30975599
> 7  1007_s_at:50:779    0          0     DDR1        780     21     6      +       30975758   30975782
>
> *File2.txt*
>
>     AgilentProbe  ProbeType  Flag  GeneSymbol  GeneID  Exons  Chrom  Strand  AgilentStart  AgilentEnd
>  1  A_23_P100001  0          0     FAM174B     400451  5      15     -       90961852      90961793
>  2  A_23_P100022  0          0     SV2B        9899    14     15     +       89639333      89639392
>  3  A_23_P100056  0          0     RBPMS2      348093  8      15     -       62819428      62819369
>  4  A_23_P100074  0          0     AVEN        57099   6      15     -       31946031      31945972
>  5  A_23_P100092  0          0     ZSCAN29     146050  5      15     -       41440680      41440621
>  6  A_23_P100103  0          0     VPS39       23339   24     15     -       40240319      40240260
>  7  A_23_P100111  0          0     CHP         11261   7      15     +       39358845      39358904
>  8  A_23_P100127  0          0     CASC5       57082   11     15     +       38704817      38704876
>  9  A_23_P100133  0          0     ATMIN       23300   4      16     +       79636596      79636655
> 10  A_23_P100141  0          0     UNKL        64718   12     16     -       1355346       1355287
>
>
> *merged.txt (should look like this)*
>
> GeneSymbol  GeneID  Exons  Chrome  AffyMatrixProbeID      AffyStart  AffyEnd    AgilentProbeID  AgilentStart  AgilentEnd
> DDR1        780     21     6                                                    A_24_P123601    30975848      30975907
> RFC2        5982    10     7       1053_at:120:925,       73287845,  73287821,  A_23_P93823     73287861      73287802
>                                    1053_at:504:41,        73287869,  73287845,
>                                    1053_at:522:871,       73287863,  73287839,
>                                    1053_at:828:1025,      73287881,  73287857,
>                                    203696_s_at:291:651    73287850   73287826
> RFC2        5982    11     7
> HSPA6       3310    1      1                                                    A_23_P114903    159762782     159762841
> PAX8        7849    12     2                                                    A_23_P210001    113691555     113691496
> GUCA1A      2978    6      6
> UBA7        7318    24     3       1294_at:1079:379,      49818386,  49818362,
>                                    1294_at:361:881,       49818398,  49818374,
>                                    203281_s_at:524:889,   49818378,  49818354,
>                                    203281_s_at:678:1017,  49818434,  49818420,
>                                    203281_s_at:68:1153    49818422   49818398
>
>
> Sorry for the long tables.
>
> Thanks,
>
> Alex
>
> Student
> University of Colorado
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT