Dear all,

I have a data frame 144 x 20000 values.
I need to take every value in the first row and compare to the second row,
and the same for rows 3-4 and 5-6 and so on.
The output should be one line for each two-row comparison.
The comparison is:

if row1==1 and row2==1 <-'HT'
if row1==1 and row2==0 <-'A'
if row1==0 and row2==1 <-'B'
if row1==1 and row2=='-' <-'Aht'
if row1=='-' and row2==1 <-'Bht'

For example, if the data is:

CloneID    genotype 2001  genotype 2002  genotype 2003
2471250    1              1              1
2471250    0              0              0
2433062    0              0              0
2433062    1              1              1
100021605  1              1              0
100021605  1              0              1
100005599  1              1              0
100005599  1              1              1
100002798  1              1              0
100002798  1              1              1

then the output should be:

CloneID    genotype 2001  genotype 2002  genotype 2003
2471250    A              A              A
2433062    B              B              B
100021605  HT             A              B
100005599  HT             HT             B
100002798  HT             HT             B

I tried this for the whole data, but it's so slow:

    AX <- data.frame(lapply(AX, as.character), stringsAsFactors=FALSE)

    for (i in seq(1,nrow(AX),by=2)){
      for (j in 6:144){
        if (AX[i,j]==1 & AX[i+1,j]==0){
          AX[i,j]<-'A'
        }
        if (AX[i,j]==0 & AX[i+1,j]==1){
          AX[i,j]<-'B'
        }
        if (AX[i,j]==1 & AX[i+1,j]==1){
          AX[i,j]<-'HT'
        }
        if (AX[i,j]==1 & AX[i+1,j]=="-"){
          AX[i,j]<-'Aht'
        }
        if (AX[i,j]=="-" & AX[i+1,j]==1){
          AX[i,j]<-'Bht'
        }
      }
    }

    AX1<-AX[!duplicated(AX[,3]),]
    AX2<-AX[duplicated(AX[,3]),]

Thanks for any help,

Raz

--
\m/
Gerrit Eichner
2014-Aug-04 10:47 UTC
[R] Compare data in two rows and replace objects in data frame
Hello, Raz,

if X is the data frame that contains your data, then using sort of an
"indexing trick" to circumvent your numerous if-statements as in

    aggregate( X[ c( "genotype 2001", "genotype 2002", "genotype 2003")],
               X[ "CloneID"],
               FUN = function( x)
                 c( "11" = "HT", "10" = "A", "01" = "B",
                    "1-" = "Aht", "-1" = "Bht")[ paste( x, collapse = "")])

presumably does what you want (and can certainly be improved).

Hth -- Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
---------------------------------------------------------------------
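To see the lookup trick in action, here is a minimal sketch that runs the same idea on the sample data from the original post. It assumes the column names have been made syntactic (genotype.2001 and so on, as read.table would do by default), and the unname() call is an addition here, used only to drop the lookup names from the result. Note that aggregate() returns the groups sorted by CloneID, and that any pair not listed in the lookup vector (for example "00") comes back as NA.

```r
# Sample data from the original post; column names are assumed to be
# syntactic (genotype.2001 etc.), which is an assumption of this sketch.
X <- data.frame(
  CloneID = rep(c(2471250, 2433062, 100021605, 100005599, 100002798), each = 2),
  genotype.2001 = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  genotype.2002 = c(1, 0, 0, 1, 1, 0, 1, 1, 1, 1),
  genotype.2003 = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 1)
)

# Named vector used as a lookup table: "<row1><row2>" -> code
code <- c("11" = "HT", "10" = "A", "01" = "B", "1-" = "Aht", "-1" = "Bht")

# For each CloneID and each genotype column, paste the two values together
# and look the resulting key up in 'code'
res <- aggregate(X[-1], X["CloneID"],
                 FUN = function(x) unname(code[paste(x, collapse = "")]))
print(res)
```

This relies on every CloneID occurring exactly twice, with the "row1" record before the "row2" record.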
You could try data.table

    # dat is the dataset
    library(data.table)
    v1 <- setNames(c("HT", "A", "B", "Aht", "Bht"), c("11", "10", "01", "1-", "-1"))
    # setDT() converts dat to a data.table by reference
    dat2 <- setDT(dat)[, lapply(.SD, function(x) v1[paste(x, collapse="")]), by=CloneID]

A.K.
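If calling a function once per CloneID and column still turns out to be slow on the full data, the same lookup vector can be applied to whole columns at once. The following is only a sketch in base R, not something proposed in the thread: the function name pair_codes and its id_col argument are made up for illustration, and it assumes the rows come in consecutive pairs (rows 1-2, 3-4, ...) with matching CloneID values.

```r
# Hypothetical helper (name and arguments are illustrative assumptions)
pair_codes <- function(dat, id_col = "CloneID") {
  code <- c("11" = "HT", "10" = "A", "01" = "B", "1-" = "Aht", "-1" = "Bht")
  odd  <- seq_len(nrow(dat) %/% 2) * 2 - 1   # rows 1, 3, 5, ...
  even <- odd + 1                            # rows 2, 4, 6, ...
  # Rows must come in pairs that share the same ID
  stopifnot(nrow(dat) %% 2 == 0,
            all(dat[[id_col]][odd] == dat[[id_col]][even]))
  out <- dat[odd, id_col, drop = FALSE]
  for (nm in setdiff(names(dat), id_col)) {
    # One paste0() and one lookup per column, instead of one per pair
    out[[nm]] <- unname(code[paste0(dat[[nm]][odd], dat[[nm]][even])])
  }
  rownames(out) <- NULL
  out
}

# Small made-up example that also exercises the "-" cases
dat <- data.frame(CloneID = c(1, 1, 2, 2, 3, 3),
                  g1 = c("1", "0", "1", "-", "-", "1"),
                  g2 = c("0", "1", "1", "1", "0", "0"),
                  stringsAsFactors = FALSE)
res <- pair_codes(dat)
print(res)
```

Pairs that are not in the lookup vector, such as "00" in the last pair of g2, are returned as NA, so they are easy to spot afterwards.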
John McKown
2014-Aug-04 18:21 UTC
[R] Compare data in two rows and replace objects in data frame
I don't know if you've received a solution as yet. Below is my generic
solution. I don't know how fast it will be, but it does _NOT_ do any
looping. Instead of if statements it uses a few logical comparisons. The
result is in the variable new_data. The variables data_odd and data_even
are temporaries which can be removed. Or you can wrap the code up in a
function which returns new_data and they will simply "go away" when the
function ends.
    #
    # Read in the data
    data <- read.csv(file="data.csv", header=TRUE, stringsAsFactors=FALSE)
    #
    # The criteria
    #if row1==1 and row2==1 <-'HT'
    #if row1==1 and row2==0 <-'A'
    #if row1==0 and row2==1 <-'B'
    #if row1==1 and row2=='-' <-'Aht'
    #if row1=='-' and row2==1 <-'Bht'
    #
    # The following assumes that data is properly ordered!
    data$rowNumber <- seq_len(nrow(data))
    data_odd <- data[data$rowNumber %% 2 == 1,]
    data_even <- data[data$rowNumber %% 2 == 0,]
    #
    # You really need to make sure that
    # the CloneID values are correct in data_odd
    # and data_even. Something like:
    stopifnot(data_odd$CloneID == data_even$CloneID)
    CloneIDs <- data_even[,1] # Get the list of CloneIDs
    #data_even[,1] <- NULL; # Remove CloneIDs from even data
    #data_odd[,1] <- NULL; # And also from odd data
    #
    # Initialize new_data - make everything NA so
    # it will stick out later!
    new_data <- data_even
    new_data[,colnames(data_even)] <- NA
    #
    # row1 is the odd row, row2 is the even row of each pair
    new_data[data_odd == 1 & data_even == 1] <- 'HT'
    new_data[data_odd == 1 & data_even == 0] <- 'A'
    new_data[data_odd == 0 & data_even == 1] <- 'B'
    new_data[data_odd == 1 & data_even == '-'] <- 'Aht'
    new_data[data_odd == '-' & data_even == 1] <- 'Bht'
    new_data$CloneID <- CloneIDs
    new_data$rowNumber <- NULL
    #
    #stopifnot( !is.na(new_data)); # Make sure no NAs left

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
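As suggested above, the odd/even comparison can be wrapped in a function so that the temporaries disappear when it returns. What follows is a rough sketch of that idea rather than the code from the message: the function name compare_pairs and the helper as_chr_matrix are hypothetical, the values are compared as character matrices (an assumption made here so that 1, 0 and "-" are treated uniformly), and the first column is not assumed to be CloneID but looked up by name.

```r
# Hypothetical wrapper around the odd/even logical-mask approach
compare_pairs <- function(data) {
  odd_rows  <- seq_len(nrow(data)) %% 2 == 1
  data_odd  <- data[odd_rows, , drop = FALSE]
  data_even <- data[!odd_rows, , drop = FALSE]
  # Each odd row must be followed by an even row with the same CloneID
  stopifnot(nrow(data_odd) == nrow(data_even),
            all(data_odd$CloneID == data_even$CloneID))

  value_cols <- setdiff(names(data), "CloneID")
  # Helper: turn the value columns into a character matrix
  as_chr_matrix <- function(df) {
    matrix(unlist(lapply(df, as.character), use.names = FALSE),
           nrow = nrow(df), dimnames = list(NULL, names(df)))
  }
  m_odd  <- as_chr_matrix(data_odd[value_cols])
  m_even <- as_chr_matrix(data_even[value_cols])

  # Start with NA everywhere so unmatched pairs stick out
  new <- matrix(NA_character_, nrow = nrow(m_odd), ncol = ncol(m_odd),
                dimnames = list(NULL, value_cols))
  new[m_odd == "1" & m_even == "1"] <- "HT"
  new[m_odd == "1" & m_even == "0"] <- "A"
  new[m_odd == "0" & m_even == "1"] <- "B"
  new[m_odd == "1" & m_even == "-"] <- "Aht"
  new[m_odd == "-" & m_even == "1"] <- "Bht"

  data.frame(CloneID = data_odd$CloneID, new,
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Small made-up example with one character and one numeric column
data <- data.frame(CloneID = c(10, 10, 20, 20),
                   g1 = c("1", "-", "-", "1"),
                   g2 = c(1, 0, 0, 1),
                   stringsAsFactors = FALSE)
res <- compare_pairs(data)
print(res)
```

The sketch assumes the input has no missing values; any pair outside the five criteria stays NA in the result.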