Emmanuel Levy
2012-Dec-27 20:30 UTC
[R] Finding (swapped) repetitions of numbers pairs across two columns
Hi,

I've had this problem for a while and tackled it in a quite dirty way, so I'm wondering if a better solution exists:

If we have two vectors:

v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)

How to remove one instance of the "3,1" / "1,3" double?

At the moment I'm using the following solution, which is quite horrible:

v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
ft <- cbind(v1, v2)
direction = apply( ft, 1, function(x) return(x[1]>x[2]))
ft.tmp = ft
ft[which(direction),1] = ft.tmp[which(direction),2]
ft[which(direction),2] = ft.tmp[which(direction),1]
uniques = apply( ft, 1, function(x) paste(x, collapse="%") )
uniques = unique(uniques)
ft.unique = matrix(unlist(strsplit(uniques,"%")), ncol=2, byrow=TRUE)

Any better solution would be very welcome!

All the best,

Emmanuel
Marc Schwartz
2012-Dec-27 20:39 UTC
[R] Finding (swapped) repetitions of numbers pairs across two columns
On Dec 27, 2012, at 2:30 PM, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:

> If we have two vectors:
>
> v1 = c(0,1,2,3,4)
> v2 = c(5,3,2,1,0)
>
> How to remove one instance of the "3,1" / "1,3" double?
> [...]

Try this:

> unique(t(apply(cbind(v1, v2), 1, sort)))
     [,1] [,2]
[1,]    0    5
[2,]    1    3
[3,]    2    2
[4,]    0    4

Basically, sort each row so that you don't have to worry about the order of the values within a pair, then get the unique rows as a result.

Regards,

Marc Schwartz
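For readers who want to see what the one-liner above does at each step, here is a minimal sketch that splits it into parts. The names pairs, sorted_pairs and kept are made up for illustration, and the duplicated() step is an addition that is not part of the reply above: it may be useful if the surviving rows should keep their original orientation (for example "4,0" rather than "0,4").

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)

pairs <- cbind(v1, v2)

# apply() over rows returns each sorted pair as a column, so the
# result is 2 x n and has to be transposed back to n x 2.
sorted_pairs <- t(apply(pairs, 1, sort))

# unique() on a matrix drops duplicated rows.
unique(sorted_pairs)

# Assumption: to keep the surviving rows as they were originally
# written, index the original matrix with duplicated() computed on
# the sorted copy.
kept <- pairs[!duplicated(sorted_pairs), , drop = FALSE]
kept
```

With the example vectors, kept should hold the rows (0,5), (1,3), (2,2) and (4,0); the swapped repeat (3,1) is dropped because it comes second.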
Emmanuel Levy
2012-Dec-27 20:48 UTC
[R] Finding (swapped) repetitions of numbers pairs across two columns
I did not know that unique worked on entire rows!

That is great, thank you very much!

Emmanuel

On 27 December 2012 22:39, Marc Schwartz <marc_schwartz at me.com> wrote:
> unique(t(apply(cbind(v1, v2), 1, sort)))
Marc Schwartz
2012-Dec-27 20:59 UTC
[R] Finding (swapped) repetitions of numbers pairs across two columns
Yep. There are methods for:

> methods(unique)
[1] unique.array           unique.data.frame      unique.default
[4] unique.matrix          unique.numeric_version unique.POSIXlt

and for the matrix and data.frame methods, unique rows will be returned by default. For array and matrix objects, you can change the MARGIN argument to a different value (e.g. 2 for columns).

See ?unique for more information, notably the Details and Value sections.

Marc

On Dec 27, 2012, at 2:48 PM, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:

> I did not know that unique worked on entire rows!
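As a small illustration of the row-wise default and the MARGIN argument described above, the following sketch uses a made-up matrix m and data frame df (neither comes from the thread):

```r
m <- rbind(c(1, 2, 1),
           c(3, 4, 3))

# Default MARGIN = 1 compares rows; the two rows differ, so both stay.
unique(m)

# MARGIN = 2 compares columns; the third column repeats the first
# and is dropped.
unique(m, MARGIN = 2)

# The data.frame method also compares whole rows.
df <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))
unique(df)
```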
arun
2012-Dec-28 02:49 UTC
[R] Finding (swapped) repetitions of numbers pairs across two columns
Hi,

You could also use:

apply(cbind(v1,v2),1,function(x) x[order(x)]) # sorted pairs as columns; wrap in unique(t(...)) as for res3 below
#or
unique(t(apply(cbind(v1,v2),1,sort.int,method="quick")))

Comparing the different methods:

set.seed(51)
v1<-sample(0:9,1e5,replace=TRUE)
set.seed(49)
v2<-sample(0:9,1e5,replace=TRUE)

system.time(res1<-unique(t(apply(cbind(v1, v2), 1, sort))))
#   user  system elapsed
# 11.373   0.188  11.918

system.time(res2<-unique(t(apply(cbind(v1,v2),1,sort.int,method="quick"))))
#   user  system elapsed
#  7.088   0.120   7.446
identical(res1,res2)
#[1] TRUE

system.time(res3 <- unique(t(apply(cbind(v1,v2),1,function(x) x[order(x)])))) #found to be faster
#   user  system elapsed
#  2.693   0.072   2.857
identical(res1,res3)
#[1] TRUE

A.K.
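All three timed variants above call a function once per row through apply(). For the two-column case, a vectorized alternative that is not from this thread is to order each pair with pmin() and pmax(). The sketch below reuses the benchmark data; res_vec and res_ref are illustrative names, and no timings are claimed here, so measure on your own machine with system.time() if speed matters.

```r
set.seed(51)
v1 <- sample(0:9, 1e5, replace = TRUE)
set.seed(49)
v2 <- sample(0:9, 1e5, replace = TRUE)

# pmin()/pmax() return the element-wise smaller and larger value,
# which orders every pair without a per-row function call.
res_vec <- unique(cbind(pmin(v1, v2), pmax(v1, v2)))

# Reference result computed as in the thread, for comparison.
res_ref <- unique(t(apply(cbind(v1, v2), 1, sort)))

# Compare values only, ignoring any dimnames.
all.equal(res_vec, res_ref, check.attributes = FALSE)
```

This only covers pairs; with more than two columns, the row-wise sort from the thread is the more general approach.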