At 12:35 PM 11/15/2002 -0500, you wrote:
>Hi, all,
> I have a little problem to solve. I'd like to collapse rows which are next
>to each other but have
> same value to one row. The following is an example.
>
>say x is a data frame like:
>  X1 X2 X3 X4 X5
>a  1  0  0  0  1
>b  1  0  1  0  1
>c  1  0  0  0  1
>d  1  0  0  0  1
>e  1  0  0  0  1
>f  1  1  0  0  1
>g  1  1  0  0  1
>
>Notice that a, c, d, e are the same. Since c, d, e are next to each other, I
>will only use the middle
>one, i.e. d. I will also keep a although it is the same as d.
>
>For f, g, I will keep f or g.
>
>so the ideal output is:
>a 1 0 0 0 1
>b 1 0 1 0 1
>d 1 0 0 0 1
>f 1 1 0 0 1
>
>Any idea how to do it? Thanks!!
I think the following will do what you seem to want, except that it keeps
the first row of each run of duplicated rows rather than the middle one,
i.e., of rows "c", "d" & "e", it keeps "c".
> x <- read.table(header=T,file=stdin(),nrow=7)
  X1 X2 X3 X4 X5
a  1  0  0  0  1
b  1  0  1  0  1
c  1  0  0  0  1
d  1  0  0  0  1
e  1  0  0  0  1
f  1  1  0  0  1
g  1  1  0  0  1
> x[which(c(1,apply(apply(x, 2, diff), 1, any))!=0),,drop=FALSE]
  X1 X2 X3 X4 X5
a  1  0  0  0  1
b  1  0  1  0  1
c  1  0  0  0  1
f  1  1  0  0  1
>
This expression depends on any() coercing its numeric argument to logical,
so that it returns TRUE if any of the values it is given are non-zero. That
does seem to work for negative integers, and for any finite floating-point
number.
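
To make the reasoning above concrete, here is a minimal sketch that unpacks the one-liner into named steps. The intermediate names (d, changed, keep) are only for illustration, and the data frame is rebuilt with data.frame() rather than read from stdin so that the block can be pasted as is. It compares against zero explicitly instead of handing numbers to any(), which should give the same selection on this example.

```r
# Same example data as in the transcript, built directly
x <- data.frame(
  X1 = c(1, 1, 1, 1, 1, 1, 1),
  X2 = c(0, 0, 0, 0, 0, 1, 1),
  X3 = c(0, 1, 0, 0, 0, 0, 0),
  X4 = c(0, 0, 0, 0, 0, 0, 0),
  X5 = c(1, 1, 1, 1, 1, 1, 1),
  row.names = letters[1:7]
)

# Coercion that any() relies on: 0 becomes FALSE, other finite numbers TRUE
as.logical(c(0, 1, -2, 0.5))   # FALSE TRUE TRUE TRUE

# Column-wise differences between consecutive rows (6 x 5 for 7 rows)
d <- apply(x, 2, diff)

# A row of d is all zero exactly when two neighbouring rows of x are equal
changed <- apply(d != 0, 1, any)

# The first row always starts a run, so it is always kept
keep <- c(TRUE, changed)

x[keep, , drop = FALSE]        # rows a, b, c, f
```

One caveat with this form: apply(x, 2, diff) returns a plain vector rather than a matrix when x has only two rows, so the second apply() would fail in that case.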
However, the expression above doesn't work if there are NA values. For it
to work in the presence of NAs you need to replace "any" by something
like "function(x) !identical(all(x==0),TRUE)" (leaving the quotes off), so
that a row counts as changed unless every difference is known to be zero.
(Are there any more elegant ways of expressing this?)
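
One possible way to package this, offered as a sketch rather than a definitive answer: the helper below (collapse_runs is a made-up name, as is its keep argument) treats a row as changed unless every difference from the previous row is known to be zero, so an NA counts as a change. It assumes an all-numeric data frame. It can also keep the middle row of each run, which is what the original question asked for; for runs of even length it takes the earlier of the two middle rows.

```r
# Hypothetical helper: collapse runs of identical adjacent rows.
# Assumes df contains only numeric columns.
collapse_runs <- function(df, keep = c("first", "middle")) {
  keep <- match.arg(keep)
  n <- nrow(df)
  if (n < 2) return(df)

  # diff() on a matrix differences each column and always returns a matrix
  d <- diff(as.matrix(df))

  # TRUE unless all differences are exactly zero; NA therefore means "changed"
  changed <- c(TRUE, apply(d, 1, function(r) !identical(all(r == 0), TRUE)))

  # Row positions grouped by run of identical adjacent rows
  runs <- split(seq_len(n), cumsum(changed))

  pick <- if (keep == "first") {
    function(i) i[1]
  } else {
    function(i) i[ceiling(length(i) / 2)]
  }
  idx <- vapply(runs, pick, integer(1))
  df[idx, , drop = FALSE]
}

x <- data.frame(
  X1 = c(1, 1, 1, 1, 1, 1, 1),
  X2 = c(0, 0, 0, 0, 0, 1, 1),
  X3 = c(0, 1, 0, 0, 0, 0, 0),
  X4 = c(0, 0, 0, 0, 0, 0, 0),
  X5 = c(1, 1, 1, 1, 1, 1, 1),
  row.names = letters[1:7]
)

first_kept  <- collapse_runs(x)                   # rows a, b, c, f
middle_kept <- collapse_runs(x, keep = "middle")  # rows a, b, d, f

# With an NA in row d, rows c, d and e no longer form a single run
x_na <- x
x_na["d", "X3"] <- NA
na_kept <- collapse_runs(x_na)                    # rows a, b, c, d, e, f
```

Whether two rows that both have NA in the same column should count as equal is a design choice; this sketch treats them as different.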
Hope this helps,
Tony Plate