Hi Guys,

I am working on a "merge" of two data frames, using the command:

    x <- merge(annotatedData, UCSCgenes, by.x="names", by.y="Ensembl.Gene.ID", all.x=TRUE)

names and Ensembl.Gene.ID are columns with similar elements from the x and y data frames.

annotatedData has 8909 entries, and so does x (as expected). x has the columns from UCSCgenes, but there is no data in them; they are all NA, as if no match exists. This is not true, as I can manually see and find many similarities between the names and UCSCgenes columns.

I am wondering whether there is a syntax error or a logical one.

Comments appreciated.

Thanks,
Dan
What you "see" and what the data really is may be two different things. You should have at least included the output of 'str' for the two data frames; even better would be a subset of the data produced with 'dput'. Most likely your problem is that your data is not what you 'expect' it to be.

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
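For readers following the thread, the output Jim asks for could be produced roughly as follows. The two small data frames are made-up stand-ins (the real annotatedData and UCSCgenes were never posted, and the IDs, scores and symbols here are invented), so only the calls to str, head and dput carry over.

```r
# Hypothetical stand-ins for the poster's data frames; all values are invented.
annotatedData <- data.frame(names = c("ENSG00000000003 ", "ENSG00000000005"),
                            score = c(1.2, 3.4),
                            stringsAsFactors = FALSE)
UCSCgenes <- data.frame(Ensembl.Gene.ID = c("ENSG00000000003", "ENSG00000000005"),
                        symbol = c("geneA", "geneB"),
                        stringsAsFactors = FALSE)

# Structure of each data frame: column names, column types, first few values.
str(annotatedData)
str(UCSCgenes)

# A small subset in a form that others can paste straight into their own session.
dput(head(annotatedData, 2))
dput(head(UCSCgenes, 2))
```

Because dput writes the values literally, details that are invisible in a printed data frame, such as the trailing space in the first toy key, show up in its output.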
Dan,

If the variables you are merging by are character variables, there may be subtle differences that you haven't noticed, e.g., capitalization or spacing. You can look for differences by listing off the unique values:

    table(c(as.character(annotatedData$names), as.character(UCSCgenes$Ensembl.Gene.ID)))

Jean

`·.,, ><(((º> `·.,, ><(((º> `·.,, ><(((º>

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409 USA

From: world peace <buysellrentoffer@gmail.com>
To: r-help@r-project.org
Date: 08/01/2011 11:24 AM
Subject: [R] possible reason for merge not working
Sent by: r-help-bounces@r-project.org
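A minimal sketch of the check Jean describes, under stated assumptions: the key vectors below are invented toy values with a trailing space and a lower-case ID planted in them, and trimws requires R 3.2.0 or later. The idea is to count exact matches, list the keys that fail, and see whether normalising whitespace and case changes the count.

```r
# Hypothetical toy keys; the trailing space and lower-case ID are planted for illustration.
namesX <- c("ENSG00000000003 ", "ensg00000000005", "ENSG00000000419")
idsY   <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")

# as.character() guards against factor columns, where c() may combine integer codes.
namesX <- as.character(namesX)
idsY   <- as.character(idsY)

# Number of keys in x that have an exact partner in y.
exactMatches <- sum(namesX %in% idsY)

# Keys in x with no exact partner in y; nchar() exposes hidden whitespace.
unmatched <- setdiff(namesX, idsY)
print(unmatched)
print(nchar(unmatched))

# Normalise surrounding whitespace and case, then count matches again.
cleanX <- toupper(trimws(namesX))
cleanY <- toupper(trimws(idsY))
cleanMatches <- sum(cleanX %in% cleanY)

print(c(exact = exactMatches, cleaned = cleanMatches))
```

If the cleaned count is much higher than the exact count, the keys differ only in spacing or capitalization; if both counts stay near zero, the two columns probably hold different kinds of identifier.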
On Aug 1, 2011, at 12:17 PM, world peace wrote:

> This is not true as I can manually see and find many similarities
> between the names and UCSCgenes columns.

The merge function does not work on "similarities". Matches need to be exact.

> I am wondering if there is any syntax error, or logical.

Probably logical.

-- 
David Winsemius, MD
West Hartford, CT
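David's point about exact matching can be illustrated with a small sketch. The data frames x and y are hypothetical (all IDs, scores and symbols are invented, with one trailing space planted in a key), and trimming whitespace is only one possible fix; the actual cause in Dan's data was not reported in the thread.

```r
# Hypothetical data: one key in x carries a trailing space, planted for illustration.
x <- data.frame(names = c("ENSG00000000003 ", "ENSG00000000005"),
                score = c(1.2, 3.4),
                stringsAsFactors = FALSE)
y <- data.frame(Ensembl.Gene.ID = c("ENSG00000000003", "ENSG00000000005"),
                symbol = c("geneA", "geneB"),
                stringsAsFactors = FALSE)

# Left join on the raw keys: the key with the trailing space finds no partner,
# so its symbol is NA even though the IDs look the same when printed.
raw <- merge(x, y, by.x = "names", by.y = "Ensembl.Gene.ID", all.x = TRUE)

# Strip surrounding whitespace from both key columns, then merge again.
x$names <- trimws(x$names)
y$Ensembl.Gene.ID <- trimws(y$Ensembl.Gene.ID)
fixed <- merge(x, y, by.x = "names", by.y = "Ensembl.Gene.ID", all.x = TRUE)

# Count the rows left without a match before and after cleaning.
print(sum(is.na(raw$symbol)))
print(sum(is.na(fixed$symbol)))
```

With all.x=TRUE every row of x is kept whether or not it matches, which is why the result in the original post had the expected 8909 rows while every UCSCgenes column was NA.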