Hello,

I want to compare all of the columns of one data frame to those of another to see if any of the columns are equivalent to one another. The first column in both of my data frames holds the sample IDs and does not need to be compared. Below is an example of the loop I am using to compare the two data frames; it counts the number of equivalent values between two columns. So in this example a value of 3 means that all three observations for the two columns being compared were equivalent. The loop works fine, but I do not understand why it tests the first column of sample IDs, giving "NA" for the sum of matches, when my loop specifies that only columns 2-3 should be tested.

Thank you!

#create dataframe A
A = matrix(c("a",3,4,"b",5,7,"c",3,7),nrow=3, ncol=3,byrow = TRUE)
A <- as.data.frame(A)
A$V2 <- as.numeric(A$V2)
A$V3 <- as.numeric(A$V3)
str(A)

#create dataframe B
B = matrix(c("a",1,1,"b",6,2,"c",2,2),nrow=3, ncol=3,byrow = TRUE)
B <- as.data.frame(B)
B$V2 <- as.numeric(B$V2)
B$V3 <- as.numeric(B$V3)
str(B)

results.2 <- numeric()
results.3 <- numeric()

#compare columns to identify those that are identical in the two dataframes
for(i in 2:3){
  results.2[i] <- sum(A[,2]==B[,i])
  results.3[i] <- sum(A[,3]==B[,i])
  results.pc.all <- rbind(results.2,results.3)
}
results.pc.all
It does not test the first column, but a vector must have consecutive indices. Since you did not assign a value to the first element, R inserts a missing value there. If you don't want to see it, use

> results.pc.all[, -1]
          [,1] [,2]
results.2    1    2
results.3    2    3

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Brittany Demmitt
Sent: Monday, June 20, 2016 12:15 PM
To: r-help at r-project.org
Subject: [R] loop testing unidentified columns
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
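The point about consecutive indices can be checked with a small sketch. The first part shows R padding the unassigned first element with NA. The second part is one possible way to build the match counts without the padding, by looping over the column numbers with sapply() instead of indexing the result vectors by column number. The data frames here are illustrative numeric values chosen to give the same counts as in the reply, not the data from the original post, and the names cols and matches are made up for this example.

```r
# Assigning to positions 2 and 3 of an empty vector leaves position 1
# unassigned, so R fills it with NA.
x <- numeric()
x[2] <- 10
x[3] <- 20
print(x)

# Illustrative numeric data frames (hypothetical values, ID column first).
A <- data.frame(id = c("a", "b", "c"), V2 = c(1, 2, 1), V3 = c(1, 2, 2))
B <- data.frame(id = c("a", "b", "c"), V2 = c(1, 3, 2), V3 = c(1, 2, 2))

# Columns to compare; the ID column is skipped.
cols <- 2:3

# Rows correspond to columns of A, columns to columns of B.
# Each cell counts the positions where the two columns agree.
matches <- sapply(cols, function(j) {
  sapply(cols, function(i) sum(A[[i]] == B[[j]]))
})
dimnames(matches) <- list(paste0("A.", names(A)[cols]),
                          paste0("B.", names(B)[cols]))
print(matches)
```

Because the result is built from the list of compared columns rather than indexed by column number, there is no empty first position to drop afterwards with [, -1].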
Thank you!

> On Jun 20, 2016, at 12:41 PM, David L Carlson <dcarlson at tamu.edu> wrote: