thr3ads.net - R help - [R] Removing duplicated rows within a matrix, with missing data as wildcards [Mar 2007]


stacey thompson

2007-Mar-08 15:14 UTC

[R] Removing duplicated rows within a matrix, with missing data as wildcards

I'd like to remove duplicated rows within a matrix, with missing data
being treated as wildcards.

For example
> x <- matrix((1:3), 5, 3)
> x[4,2] = NA
> x[3,3] = NA
> x
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3
[3,]    3    2   NA
[4,]    1   NA    2
[5,]    2    1    3

I would like to obtain

      [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3
[3,]    3    2   NA

From the R-help archives, I learned about unique(x) and duplicated(x). However, unique(x) returns
> unique(x)
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3
[3,]    3    2   NA
[4,]    1   NA    2

and duplicated(x) gives
> duplicated(x)
[1] FALSE FALSE FALSE FALSE  TRUE

I have tried various na.action settings, but with unique(x) I get errors at best, e.g.

> unique(x, na.omit(x))
Error: argument 'incomparables != FALSE' is not used (yet)

How might I tackle this?

Thanks,

-stacey

-- 
-stacey lee thompson-
Stagiaire post-doctorale
Institut de recherche en biologie végétale
Université de Montréal
4101 Sherbrooke Est
Montréal, Québec H1X 2B2 Canada
stacey.thompson at umontreal.ca

Petr Pikal

2007-Mar-09 07:03 UTC


[R] Removing duplicated rows within a matrix, with missing data as wildcards

Hi

It's a bit tricky, but

dup <- apply(x, 2, duplicated) # which values repeat an earlier value in their column
isna <- apply(x, 2, is.na)     # which values are NA
check <- dup | isna            # TRUE where either holds

and here is your result

x[rowSums(check) != ncol(x), ]
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3
[3,]    3    2   NA
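
A note of caution on the column-wise check above: duplicated() is applied to each column separately, so on other data a row can be flagged even though no single earlier row matches it (for example, the row (1, 4) after the rows (1, 2) and (3, 4)). Below is a minimal row-wise sketch, assuming that "NA as wildcard" means two rows match when they agree in every position where both are observed. The names rows_match and drop_wildcard_dups are hypothetical; they are not from this thread or from base R.

```r
# Hypothetical helper: two rows match if they agree wherever both are observed.
rows_match <- function(a, b) all(is.na(a) | is.na(b) | a == b)

# Hypothetical helper: keep one representative per group of matching rows.
drop_wildcard_dups <- function(x) {
  # Visit rows with fewer NAs first, so complete rows are preferred as keepers.
  visit <- order(rowSums(is.na(x)))
  keep <- integer(0)
  for (i in visit) {
    matched <- any(vapply(keep, function(j) rows_match(x[i, ], x[j, ]),
                          logical(1)))
    if (!matched) keep <- c(keep, i)
  }
  x[sort(keep), , drop = FALSE]  # restore the original row order
}

x <- matrix(1:3, 5, 3)
x[4, 2] <- NA
x[3, 3] <- NA
res <- drop_wildcard_dups(x)  # keeps rows 1, 2 and 3 of x
```

Matching with wildcards is not transitive, so when two incomplete rows are compatible with each other but with no complete row, which of them survives depends on the visiting order.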


Regards
Petr




Petr Pikal
petr.pikal at precheza.cz

Dimitris Rizopoulos

2007-Mar-09 15:14 UTC


[R] Removing duplicated rows within a matrix, with missing data as wildcards

You could also try something like the following:

x <- matrix(c(1, NA, 3, NA, 2, 3, 1, 3, 2, 2, 1, 3, 1, NA, 2, 2, 1, 3),
            ncol = 3, byrow = TRUE)

wildcardVals <- 1:3 # possible wildcard values
ind <- complete.cases(x) # rows without any NA
nc <- ncol(x)
nr <- sum(ind) # number of complete rows
nwld <- length(wildcardVals)
# expand each incomplete row into one candidate row per wildcard value
posb <- apply(x[!ind, , drop = FALSE], 1, function(y){
    out <- matrix(y, nwld, nc, byrow = TRUE)
    out[, is.na(y)] <- wildcardVals
    t(out)
})
posb <- matrix(c(posb), ncol = nc, byrow = TRUE)
# despite its name, keep.ind flags the rows to drop
keep.ind <- duplicated(rbind(x[ind, , drop = FALSE], posb))
# if any candidate of an incomplete row is a duplicate, drop all its candidates
keep.ind[-(1:nr)] <- apply(matrix(keep.ind[-(1:nr)], ncol = nwld, byrow = TRUE),
    1, function(x) if(any(x)) rep(TRUE, length(x)) else x)
out <- rbind(x[ind, , drop = FALSE],
    matrix(rep(x[!ind, ], each = nwld), ncol = nc))
unique(out[!keep.ind, ])


I hope it works ok.
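
The expansion step in the code above can be looked at in isolation. As written, every NA column of a row receives the same sequence of wildcard values, so a row with two or more NAs is not expanded into all combinations. Below is a sketch that enumerates every combination with expand.grid(). The function name expand_row_all is hypothetical, and the approach assumes the set of possible values is small, since the number of candidates grows as length(wildcardVals) raised to the number of NAs in the row.

```r
# Hypothetical helper: all completions of one row, with NAs replaced by
# every combination of the possible wildcard values.
expand_row_all <- function(y, wildcardVals) {
  na_pos <- which(is.na(y))
  if (length(na_pos) == 0) return(matrix(y, nrow = 1))
  # one column of candidate values per NA position
  combos <- as.matrix(expand.grid(rep(list(wildcardVals), length(na_pos))))
  out <- matrix(y, nrow(combos), length(y), byrow = TRUE)
  out[, na_pos] <- combos
  out
}

one_na <- expand_row_all(c(1, NA, 3), 1:3)   # 3 candidate rows
two_na <- expand_row_all(c(NA, NA, 3), 1:3)  # 9 candidate rows
```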

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message ----- 
From: "stacey thompson" <stacey.lee.thompson at gmail.com>
To: <hpages at fhcrc.org>; <r-help at stat.math.ethz.ch>
Cc: <petr.pikal at precheza.cz>
Sent: Friday, March 09, 2007 3:09 PM
Subject: Re: [R] Removing duplicated rows within a matrix, with missing
data as wildcards

> Hi H.,
>
> Your response has improved the clarity of my thinking.  Kind thanks.
> Also, your use of seq_len() prompted me to update from R version 2.3.1
> on this machine.
>
> For your matrix
>
> > x <- matrix(c(1, NA, 3, NA, 2, 3), ncol=3, byrow=TRUE)
> > x
>      [,1] [,2] [,3]
> [1,]    1   NA    3
> [2,]   NA    2    3
>
> I would want to delete either x[1,] or x[2,] but not both.
> Practically, your "removeLooseDupRows(x)"
>
> removeLooseDupRows <- function(x)
> {
>   if (nrow(x) <= 1)
>       return(x)
>   ii <- do.call("order",
>                 args=lapply(seq_len(ncol(x)),
>                             function(col) x[ , col]))
>   dup_index <- logical(nrow(x))
>   i0 <- -1
>   for (k in 1:length(ii)) {
>       i <- ii[k]
>       if (any(is.na(x[i, ]))) {
>           if (i0 == -1)
>               next
>           if (any(x[i, ] != x[i0, ], na.rm=TRUE))
>               next
>           dup_index[i] <- TRUE
>       } else {
>           i0 <- i
>       }
>   }
>   x[!dup_index, ]
> }
>
> should leave no such ambiguous cases for my data, as nrow(x) is
> very high with few NAs in each x.  For example, a row of (1, 2, 3) is
> very likely to exist in my data.
>
> However, to find the row numbers of any remaining ambiguous matches,
> should they exist, using example:
>
>> x <- matrix(c(1, NA, 3, NA, 2, 3, 1, 3, 2, 2, 1, 3, 1, NA, 2, 2, 1, 3),
>>             ncol=3, byrow=TRUE)
>> x
>     [,1] [,2] [,3]
> [1,]    1   NA    3
> [2,]   NA    2    3
> [3,]    1    3    2
> [4,]    2    1    3
> [5,]    1   NA    2
> [6,]    2    1    3
>
> after your suggested
>
>> removeLooseDupRows(x)
>     [,1] [,2] [,3]
> [1,]    1   NA    3
> [2,]   NA    2    3
> [3,]    1    3    2
> [4,]    2    1    3
> [5,]    2    1    3
>
>> q <- removeLooseDupRows(unique(x))
>> q
>     [,1] [,2] [,3]
> [1,]    1   NA    3
> [2,]   NA    2    3
> [3,]    1    3    2
> [4,]    2    1    3
>
> I could
>
>> # ambiguous matches in matrix form
>> apply(q, 1, function(row1) apply(q, 1, function(row2)
>>     all(is.na(row1) | is.na(row2) | row1==row2)))
>
>      [,1]  [,2]  [,3]  [,4]
> [1,]  TRUE  TRUE FALSE FALSE
> [2,]  TRUE  TRUE FALSE FALSE
> [3,] FALSE FALSE  TRUE FALSE
> [4,] FALSE FALSE FALSE  TRUE
>
>> # indices of ambiguous matches
>> m <- which(apply(q, 1, function(row1) apply(q, 1, function(row2)
>>     all(is.na(row1) | is.na(row2) | row1==row2))), arr.ind=TRUE)
>> m
>     row col
> [1,]   1   1
> [2,]   2   1
> [3,]   1   2
> [4,]   2   2
> [5,]   3   3
> [6,]   4   4
>
>> #put in order and omit duplicates
>> m2 <- unique(t(apply(m, 1, sort)))
>> m2
>     [,1] [,2]
> [1,]    1    1
> [2,]    1    2
> [3,]    2    2
> [4,]    3    3
> [5,]    4    4
>
>> # show the ambiguous matches
>> m2[m2[,1]!=m2[,2], , drop=FALSE]
>     [,1] [,2]
> [1,]    1    2
>
> ...and proceed from there.
>
> This solution came from another helpful "R-help" respondent to my
> poorly-defined problem.
>
> Appreciative thanks to everyone for your instructive help.
>
> Cheers,
> stacey
>
> -- 
> -stacey lee thompson-
> Stagiaire post-doctorale
> Institut de recherche en biologie végétale
> Université de Montréal
> 4101 Sherbrooke Est
> Montréal, Québec H1X 2B2 Canada
> stacey.thompson at umontreal.ca
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
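
The search for ambiguous pairs in the quoted message can also be restricted to the upper triangle of the compatibility matrix, which skips the sort-and-unique step. A minimal sketch, reusing the matrix q from the quoted message; the helper rows_match and the variable names compat and ambiguous are hypothetical.

```r
# q as printed in the quoted message
q <- matrix(c(1, NA, 3,
              NA, 2, 3,
              1, 3, 2,
              2, 1, 3), ncol = 3, byrow = TRUE)

# Hypothetical helper: two rows match if they agree wherever both are observed.
rows_match <- function(a, b) all(is.na(a) | is.na(b) | a == b)

n <- nrow(q)
# compat[i, j] is TRUE when rows i and j of q are compatible
compat <- outer(seq_len(n), seq_len(n),
                Vectorize(function(i, j) rows_match(q[i, ], q[j, ])))

# keep each unordered pair once and drop the trivial i == j matches
ambiguous <- which(compat & upper.tri(compat), arr.ind = TRUE)
# here 'ambiguous' has a single row: row 1, col 2
```

Each row of the result is one pair of row indices of q that cannot be told apart under the wildcard rule.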

