kayj
2008-Aug-21 14:59 UTC
[R] problem merging two data sets ( one with a header and one without)
I have two sets of data, Data1 and Data2. Data1 has a header and Data2 does not. I would like to merge the two data sets after removing some columns from Data2.

I am having a problem merging, so I had to write out and re-read the final data and specify "header=F" so the merge can be done by "V1". Is there a way to avoid this step? The problem is that when I do cbind, FinalData has different column names.

    Data1<-read.table("data1.txt", sep='\t', header=F, stringsAsFactors=F)
    Data2<-read.table("data2.txt", sep='\t', header=T, stringsAsFactors=F)

    P1<-cbind(Data2[,2])
    P2<-cbind(Data2[,5:30])
    FinalData<-cbind(P1,P2)
    write.table(FinalData, file="FinalData.txt", sep='\t', quote=F, col.names=F, row.names=F)

    Data3<-read.table("FinalData.txt", sep='\t', header=F, stringsAsFactors=F)
    m<-merge(Data1,Data3, by="V1")
Don MacQueen
2008-Aug-21 20:35 UTC
[R] problem merging two data sets ( one with a header and one without)
merge() has by.x and by.y arguments. If you use them, you can merge data frames that have different column names. You can specify columns by name or by number. This is mentioned in the help for merge.

Try

    merge(Data1, Data2, by.x=1, by.y=2)

which will keep all of the columns in Data2, or

    merge(Data1, Data2[, c(2,5:30)], by=1)

if you must remove columns 1, 3, and 4 from Data2.

Alternatively, since merge() works on common variable names, all you have to do is make sure that the single column you want to use for the merge has the same name in both data frames, and that it is the only column with the same name in both. Thus, another way to do the merge would be

    names(Data2)[2] <- 'V1'
    merge(Data1, Data2)

or

    names(Data2)[2] <- 'V1'
    merge(Data1, Data2[, c(2,5:30)])

While I'm at it, using cbind() is unnecessary. You can replace

    P1<-cbind(Data2[,2])
    P2<-cbind(Data2[,5:30])
    FinalData<-cbind(P1,P2)

with

    FinalData <- Data2[, c(2,5:30)]

But even more unnecessary is the cbind() in

    P1<-cbind(Data2[,2])

All that is needed is

    P1 <- Data2[,2]

I personally think you're better off if you do not change the names of FinalData, but if you do, it's easier this way:

    names(FinalData) <- paste('V', 1:27, sep='')

Or, more generally,

    names(FinalData) <- paste('V', seq(ncol(FinalData)), sep='')

By the way, although your text file data2.txt does not have a header, your data frame Data2 does have a "header". That is, it has column names V1, V2, and so on.

-Don

--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov
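
As a minimal sketch of the by.x/by.y suggestion in this reply, the following uses two small made-up data frames in place of Data1 and Data2. The column names id, key, x, and y and all of the values are assumptions for illustration only; they are not the poster's data. It shows the merge being done directly on the data frames in memory, with no intermediate file.

```r
# Hypothetical stand-ins for Data1 and Data2; all values are made up.
Data1 <- data.frame(V1 = c("a", "b", "c"),
                    V2 = c(10, 20, 30),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(id  = 1:3,
                    key = c("c", "a", "d"),
                    x   = c(1.5, 2.5, 3.5),
                    y   = c(7, 8, 9),
                    stringsAsFactors = FALSE)

# Merge on column 1 of Data1 and column 2 of Data2, keeping every column.
m_all <- merge(Data1, Data2, by.x = 1, by.y = 2)

# Keep only the key column and column 4 of Data2. The key is now column 1
# of the subset, so by = 1 refers to the first column of both data frames.
m_sub <- merge(Data1, Data2[, c(2, 4)], by = 1)

print(m_all)
print(m_sub)
```

Only rows whose key appears in both data frames are kept, which is the default behaviour of merge(); the all, all.x, and all.y arguments change that if unmatched rows should be retained.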