thr3ads.net - R help - [R] Compare data in two rows and replace objects in data frame [Aug 2014]

raz

2014-Aug-04 09:53 UTC

[R] Compare data in two rows and replace objects in data frame

Dear all,

I have a data frame of 144 x 20000 values.
I need to compare every value in the first row with the value in the same
column of the second row, and likewise for rows 3-4, 5-6 and so on.
The output should be one line for each pair of rows.
The comparison rules are:
if row1==1 and row2==1 <-'HT'
if row1==1 and row2==0 <-'A'
if row1==0 and row2==1 <-'B'
if row1==1 and row2=='-' <-'Aht'
if row1=='-' and row2==1 <-'Bht'

for example:
if the data is:
CloneID    genotype 2001    genotype 2002    genotype 2003
2471250    1    1    1
2471250    0    0    0
2433062    0    0    0
2433062    1    1    1
100021605    1    1    0
100021605    1    0    1
100005599    1    1    0
100005599    1    1    1
100002798    1    1    0
100002798    1    1    1

then the output should be:
CloneID    genotype 2001    genotype 2002    genotype 2003
2471250    A    A    A
2433062    B    B    B
100021605    HT    A    B
100005599    HT    HT    B
100002798    HT    HT    B

I tried this on the whole data set, but it's so slow:

AX <- data.frame(lapply(AX, as.character), stringsAsFactors=FALSE)

# walk over the first row of each pair; columns 6 to 144 hold the genotypes
for (i in seq(1, nrow(AX), by=2)) {
  for (j in 6:144) {
    if (AX[i,j]==1 && AX[i+1,j]==0) {
      AX[i,j] <- 'A'
    } else if (AX[i,j]==0 && AX[i+1,j]==1) {
      AX[i,j] <- 'B'
    } else if (AX[i,j]==1 && AX[i+1,j]==1) {
      AX[i,j] <- 'HT'
    } else if (AX[i,j]==1 && AX[i+1,j]=="-") {
      AX[i,j] <- 'Aht'
    } else if (AX[i,j]=="-" && AX[i+1,j]==1) {
      AX[i,j] <- 'Bht'
    }
  }
}

# keep the first row of each pair (the one holding the result)
AX1 <- AX[!duplicated(AX[,3]),]
AX2 <- AX[duplicated(AX[,3]),]

Thanks for any help,

Raz



-- 
\m/


Gerrit Eichner

2014-Aug-04 10:47 UTC

Hello, Raz,

if X is the data frame that contains your data, then using a sort of
"indexing trick" to circumvent your numerous if-statements, as in

aggregate( X[ c( "genotype 2001", "genotype 2002", "genotype 2003")],
           X[ "CloneID"],
           FUN = function( x)
                   c( "11" = "HT",
                      "10" = "A",
                      "01" = "B",
                      "1-" = "Aht",
                      "-1" = "Bht")[ paste( x, collapse = "")])

presumably does what you want (and can certainly be improved).
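For a quick check, the call can be run on the sample data from the original post. This is only a sketch: the column names g2001 to g2003 are shortened stand-ins for "genotype 2001" etc., the values are stored as character (as they would be once "-" appears in a column), and unname() is added just to keep the result tidy.

```r
# Sample data from the original post; g2001..g2003 are shortened,
# hypothetical column names standing in for "genotype 2001" etc.
X <- data.frame(
  CloneID = rep(c(2471250, 2433062, 100021605, 100005599, 100002798), each = 2),
  g2001 = c("1", "0", "0", "1", "1", "1", "1", "1", "1", "1"),
  g2002 = c("1", "0", "0", "1", "1", "0", "1", "1", "1", "1"),
  g2003 = c("1", "0", "0", "1", "0", "1", "0", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# Lookup table: the name is "row1 value" pasted to "row2 value"
codes <- c("11" = "HT", "10" = "A", "01" = "B", "1-" = "Aht", "-1" = "Bht")

# One result row per CloneID; each genotype column is collapsed to a
# two-character key which is then looked up in 'codes'
res <- aggregate(X[c("g2001", "g2002", "g2003")], X["CloneID"],
                 FUN = function(x) unname(codes[paste(x, collapse = "")]))
print(res)
```

Two things to keep in mind: aggregate() returns the groups sorted by CloneID rather than in their original order, and any pair that is not in the lookup table (for example "00") comes back as NA.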

Hth  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
---------------------------------------------------------------------


arun

2014-Aug-04 11:34 UTC

You could try data.table:

# dat is the dataset

library(data.table)
v1 <- setNames(c("HT", "A", "B", "Aht", "Bht"),
               c("11", "10", "01", "1-", "-1"))
dat2 <- setDT(dat)[, lapply(.SD, function(x) v1[paste(x, collapse="")]),
                   by=CloneID]
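A small self-contained illustration of the same idea, including the "-" cases, might look like the sketch below. The CloneID values 101 to 103 and the column names g2001 and g2002 are made up for the example, and unname() is an addition to drop the lookup names from the result.

```r
library(data.table)

# Hypothetical toy data: two rows per CloneID, values stored as character
dat <- data.table(
  CloneID = c(101, 101, 102, 102, 103, 103),
  g2001   = c("1", "-", "-", "1", "1", "1"),
  g2002   = c("1", "0", "0", "1", "0", "0")
)

# Lookup table keyed by "row1 value" pasted to "row2 value"
v1 <- setNames(c("HT", "A", "B", "Aht", "Bht"),
               c("11", "10", "01", "1-", "-1"))

# For every CloneID, collapse each genotype column to its key and look it up
dat2 <- dat[, lapply(.SD, function(x) unname(v1[paste(x, collapse = "")])),
            by = CloneID]
print(dat2)
```

Grouping with by keeps the CloneIDs in their order of first appearance, and a pair that has no entry in v1 (here "00" for CloneID 103 in g2002) yields NA.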

A.K.





John McKown

2014-Aug-04 18:21 UTC

I don't know if you've received a solution yet. Below is my generic
solution. I don't know how fast it will be, but it does _NOT_ do any
explicit looping; it uses a handful of vectorized comparisons instead.
The result is in the variable new_data. The variables data_odd and
data_even are temporaries which can be removed. Or you can wrap the code
up in a function which returns new_data, and they will simply "go away"
when the function ends.

#
# Read in the data
data <- read.csv(file="data.csv", header=TRUE, stringsAsFactors=FALSE);
#
# The criteria
#if row1==1 and row2==1 <-'HT'
#if row1==1 and row2==0 <-'A'
#if row1==0 and row2==1 <-'B'
#if row1==1 and row2=='-' <-'Aht'
#if row1=='-' and row2==1 <-'Bht'
#
# The following assumes that data is properly ordered!
data$rowNumber <- seq_len(nrow(data));
data_odd <-data[data$rowNumber %% 2 == 1,];
data_even <-data[data$rowNumber %% 2 == 0,];
#
# You really need to make sure that
# the CloneID values are correct in data_odd
# and data_even. Something like:
stopifnot(data_odd$CloneID == data_even$CloneID);
CloneIDs <- data_even[,1]; # Get the list of CloneIDs
#data_even[,1] <- NULL; # Remove CloneIDs from even data
#data_odd[,1] <- NULL;  # And also from odd data
#
# Initialize new_data - make everything NA so
# it will stick out later!
new_data <- data_even;
new_data[,colnames(data_even)] <- NA;
#
new_data[data_odd == 1 & data_even == 1] <- 'HT';
new_data[data_odd == 1 & data_even == 0] <- 'A';
new_data[data_odd == 0 & data_even == 1] <- 'B';
new_data[data_odd == 1 & data_even == '-'] <- 'Aht';
new_data[data_odd == '-' & data_even == 1] <- 'Bht';
new_data$CloneID <- CloneIDs;
new_data$rowNumber<-NULL;
#
#stopifnot( !is.na(new_data)); # Make sure no NAs left
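The "wrap it up in a function" idea mentioned above could be sketched roughly as follows. The function name pair_genotypes and its id_col argument are hypothetical, and the sketch assumes that the rows are already ordered in pairs and that every column other than the ID column holds genotype values.

```r
# Hypothetical wrapper around the odd/even row comparison.
# Assumes rows come in consecutive pairs with matching IDs and that all
# columns except 'id_col' are genotype columns.
pair_genotypes <- function(data, id_col = "CloneID") {
  stopifnot(nrow(data) %% 2 == 0)
  odd  <- data[seq(1, nrow(data), by = 2), , drop = FALSE]
  even <- data[seq(2, nrow(data), by = 2), , drop = FALSE]
  stopifnot(all(odd[[id_col]] == even[[id_col]]))

  geno <- setdiff(names(data), id_col)
  # Character matrices, so that "1", "0" and "-" compare uniformly
  o <- do.call(cbind, lapply(odd[geno], as.character))
  e <- do.call(cbind, lapply(even[geno], as.character))

  # Start with NA everywhere so unmatched pairs stick out later
  out <- o
  out[] <- NA_character_
  out[o == "1" & e == "1"] <- "HT"
  out[o == "1" & e == "0"] <- "A"
  out[o == "0" & e == "1"] <- "B"
  out[o == "1" & e == "-"] <- "Aht"
  out[o == "-" & e == "1"] <- "Bht"

  data.frame(odd[id_col], out, row.names = NULL,
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Sample data from the original post, with shortened (hypothetical) names
X <- data.frame(
  CloneID = rep(c(2471250, 2433062, 100021605, 100005599, 100002798), each = 2),
  g2001 = c("1", "0", "0", "1", "1", "1", "1", "1", "1", "1"),
  g2002 = c("1", "0", "0", "1", "1", "0", "1", "1", "1", "1"),
  g2003 = c("1", "0", "0", "1", "0", "1", "0", "1", "0", "1"),
  stringsAsFactors = FALSE
)
new_data <- pair_genotypes(X)
print(new_data)
```

The temporaries stay local to the function, and the row order of the input is preserved in the result.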




-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
