thr3ads.net - R help - [R] inconsistent rows in a data frame [Sep 2006]


Gamal Azim

2006-Sep-19 19:19 UTC

[R] inconsistent rows in a data frame

I need to identify items that are repeated in p$a with
different s or d entries on their rows, given that
"0" entries should not be considered in the
comparison. Here is an example:

1. Items 3 and 5 in p$a are repeated with different
entries of s or d, so they should be removed.

2. Item 2 appears twice, but its s is 0 on row 2 and
its d is 0 on row 6. Once the 0s are ignored there is
no conflict, so 2 should not be flagged. All items are
factor levels, not necessarily numbers.
> p <- data.frame(a = c(1,2,3,4,5,2,3,5,3,5,3),
                  s = c(0,0,0,2,4,3,2,4,0,0,4),
                  d = c(0,1,1,1,3,0,5,11,0,0,0))
> for (i in 1:3) p[, i] <- factor(p[, i])
> p
   a s  d
1  1 0  0
2  2 0  1
3  3 0  1
4  4 2  1
5  5 4  3
6  2 3  0
7  3 2  5
8  5 4 11
9  3 0  0
10 5 0  0
11 3 4  0
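
The rule in points 1 and 2 can be stated directly: an item is inconsistent if, once "0" entries are ignored, it shows more than one distinct value in s or in d. A minimal sketch of that check with tapply() on the example data (the helper n.known is a hypothetical name, and this per-group version is meant as a readable reference, not as the fast solution):

```r
# Example data from the post
p <- data.frame(a = c(1,2,3,4,5,2,3,5,3,5,3),
                s = c(0,0,0,2,4,3,2,4,0,0,4),
                d = c(0,1,1,1,3,0,5,11,0,0,0))
for (i in 1:3) p[, i] <- factor(p[, i])

# Hypothetical helper: number of distinct values, ignoring "0" (unknown)
n.known <- function(v) length(unique(v[v != "0"]))

# Distinct known s and d values for each item of a
ns <- tapply(as.character(p$s), p$a, n.known)
nd <- tapply(as.character(p$d), p$a, n.known)

# Items with more than one known value in s or in d are inconsistent
bad <- names(ns)[ns > 1 | nd > 1]
print(bad)
print(rownames(p)[p$a %in% bad])
```

For the example data this flags items 3 and 5, i.e. rows 3, 5, 7, 8, 9, 10 and 11, while item 2 passes because each of its columns has only one known value.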

Here is my best effort, but its efficiency on large
data frames is poor: with 800,000 rows it is
ridiculously slow!

## "0" marks an entry that should be ignored
is.unk <- function(x) {x == "0"}

## items of a with more than one known value of s
p.tmp <- unique(p[, 1:2])
p.tmp <- p.tmp[!is.unk(p.tmp[, 1]) & !is.unk(p.tmp[, 2]), ]
dup.s <- p.tmp[duplicated(p.tmp[, 1]), 1][, drop = TRUE]

## items of a with more than one known value of d
p.tmp <- unique(p[, c(1, 3)])
p.tmp <- p.tmp[!is.unk(p.tmp[, 1]) & !is.unk(p.tmp[, 2]), ]
dup.d <- p.tmp[duplicated(p.tmp[, 1]), 1][, drop = TRUE]

dup.sd <- union(as.character(dup.d), as.character(dup.s))
> row.names(p[is.element(p[, 1], dup.sd), ])
[1] "3"  "5"  "7"  "8"  "9"  "10" "11"
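
One possibly faster variant, sketched under the assumption that all three columns are factors as above: work on the factors' integer codes instead of calling unique() on data-frame columns, which has to compare whole rows. Each (item, value) pair is encoded as a single number, repeated pairs are dropped with duplicated(), and any item that still occurs more than once must have had two different known values. The function name conflicting.items is hypothetical, and the speed-up has not been measured here.

```r
# Example data from the post
p <- data.frame(a = c(1,2,3,4,5,2,3,5,3,5,3),
                s = c(0,0,0,2,4,3,2,4,0,0,4),
                d = c(0,1,1,1,3,0,5,11,0,0,0))
for (i in 1:3) p[, i] <- factor(p[, i])

# Hypothetical helper: levels of factor `a` that occur with more than one
# distinct non-"0" level of factor `x`.
conflicting.items <- function(a, x) {
  keep <- a != "0" & x != "0"       # drop unknown ("0") entries
  ai <- as.integer(a)[keep]         # integer codes of the items
  xi <- as.integer(x)[keep]         # integer codes of the values (1..nlevels)
  # One numeric key per (item, value) pair; the multiplier exceeds every
  # value code, so keys are unique per pair. Double arithmetic avoids
  # integer overflow when both factors have many levels.
  key <- as.numeric(ai) * (nlevels(x) + 1) + xi
  ai <- ai[!duplicated(key)]        # keep each distinct pair once
  # An item still present twice had two different known values
  levels(a)[unique(ai[duplicated(ai)])]
}

dup.sd <- union(conflicting.items(p$a, p$s), conflicting.items(p$a, p$d))
print(dup.sd)
print(row.names(p)[p$a %in% dup.sd])
```

For this example it gives the same dup.sd ("3" and "5") and the same rows as the attempt above; whether it is fast enough for 800,000 rows would need to be checked on the real data.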

There must be more efficient ways; help, please!

Thanks
