Note that:

1. Your solution returns a matrix, not a data frame (or tibble).
2. Assuming that the order of the entries in the pairs does not matter
   (which your solution also assumes, and which seems reasonable given
   the OP's specification), I think you'll find that the following,
   which returns a data frame, is considerably more efficient:

   x[!duplicated(cbind(do.call(pmin, x), do.call(pmax, x))), ]

For example:

> x <- expand.grid(Source = 1:1000, Target = 1:1000)

> system.time({
    y <- apply(x, 1, function(y) return (c(A=min(y), B=max(y))))
    unique(t(y))})
   user  system elapsed
  5.075   0.034   5.109

> system.time({
    x[!duplicated(cbind(do.call(pmin, x), do.call(pmax, x))), ]
  })
   user  system elapsed
  1.340   0.013   1.353

Still more efficient, and still returning a data frame, is:

   w <- x[, 2] > x[, 1]
   x[w, ] <- x[w, 2:1]
   unique(x)

> system.time({
    w <- x[, 2] > x[, 1]
    x[w, ] <- x[w, 2:1]
    unique(x)})
   user  system elapsed
  0.693   0.011   0.703

The efficiency gains are due to vectorization and the use of more
efficient primitives. None of this may matter of course, but it seemed
worth mentioning.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Aug 20, 2021 at 7:13 AM Greg Minshall <minshall at umich.edu> wrote:
>
> Eric,
>
> > x %>% transmute( a=pmin(Source,Target), b=pmax(Source,Target)) %>%
> >   unique() %>% rename(Source=a, Target=b)
>
> ah, very nice.  i have trouble remembering, e.g., unique().
>
> fwiw, (hopefully) here's a baser version.
> ----
> x = data.frame(Source=rep(1:3,4), Target=c(rep(1,3),rep(2,3),rep(3,3),rep(4,3)))
>
> y <- apply(x, 1, function(y) return (c(A=min(y), B=max(y))))
> unique(t(y))
> ----
>
> cheers, Greg
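For anyone following along, here is a small sketch of the two points above (matrix versus data frame, and the vectorized pmin/pmax idiom), run on the 12-row example from the quoted message. The helper name dedup_pairs is made up for illustration and does not come from the thread.

```r
# Toy data from the quoted message: 12 (Source, Target) rows.
x <- data.frame(Source = rep(1:3, 4),
                Target = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))

# apply()-based version: works row by row and returns a matrix.
y <- apply(x, 1, function(r) c(A = min(r), B = max(r)))
m <- unique(t(y))

# Hypothetical helper wrapping the pmin/pmax + duplicated() idiom.
# It keeps the first row seen for each unordered pair, in its original
# orientation, and returns a data frame.
dedup_pairs <- function(df) {
  lo <- do.call(pmin, df)   # element-wise smaller value of each pair
  hi <- do.call(pmax, df)   # element-wise larger value of each pair
  df[!duplicated(cbind(lo, hi)), , drop = FALSE]
}
d <- dedup_pairs(x)

is.matrix(m)       # the apply() result is a matrix
is.data.frame(d)   # the duplicated() result is still a data frame
c(nrow(m), nrow(d))
```

Both versions should agree on the number of distinct unordered pairs. They differ in what they hand back: the first returns reordered (min, max) values, the second returns a subset of the original rows.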
Bert,

> The efficiency gains are due to vectorization and the use of more
> efficient primitives. None of this may matter of course, but it seemed
> worth mentioning.

thanks very much!  the varieties of code, and disparities of
performance, are truly wonderful.

Rui's point that what works better for small n is not necessarily what
will work better for large n is important to keep in [my] mind.

as a "so-far" summary, here are some timings.  the relevant code is
below.

----
my apply
   user  system elapsed
  8.397   0.124   8.531
Bert's !duplicated
   user  system elapsed
  2.367   0.000   2.370
Bert's x[,2]>x[,1]
   user  system elapsed
  1.052   0.000   1.054
my a.d.f(unique(cbind(do.call)))
   user  system elapsed
  3.909   0.000   3.914
Eric Berger's unique(...pmin...pmax)
   user  system elapsed
  0.848   0.000   0.850
Eric Berger's transmuting tibble...
   user  system elapsed
  0.986   0.000   0.988
Kimmo Elo's [OP] mutating paste
   user  system elapsed
 52.079   0.000  52.136
Rui Barradas' sort-based
   user  system elapsed
 42.327   0.080  42.450
----

cheers, Greg

----
n <- 1000
x <- expand.grid(Source = 1:n, Target = 1:n)

cat("my apply\n")
system.time({
  y <- apply(x, 1, function(y) return (c(A=min(y), B=max(y))))
  unique(t(y))})
#    user  system elapsed
#   5.075   0.034   5.109

cat("Bert's !duplicated\n")
system.time({
  x[!duplicated(cbind(do.call(pmin, x), do.call(pmax, x))), ]
})
#    user  system elapsed
#   1.340   0.013   1.353

# Still more efficient and still returning a data frame is:
cat("Bert's x[,2]>x[,1]\n")
system.time({
  w <- x[, 2] > x[,1]
  x[w, ] <- x[w, 2:1]
  unique(x)})
#    user  system elapsed
#   0.693   0.011   0.703

cat("my a.d.f(unique(cbind(do.call)))\n")
system.time({
  as.data.frame(unique(cbind(A=do.call(pmin,x), B=do.call(pmax,x))))
})

cat("Eric Berger's unique(...pmin...pmax)\n")
system.time({
  unique(data.frame(V1=pmin(x$Source,x$Target), V2=pmax(x$Source,x$Target)))
})

cat("Eric Berger's transmuting tibble...\n")
require(dplyr)
xt<-tibble(x)
system.time({
  xt %>% transmute( a=pmin(Source,Target), b=pmax(Source,Target)) %>%
    unique() %>%
    rename(Source=a, Target=b)
})

cat("Kimmo Elo's [OP] mutating paste\n")
system.time({
  xt %>%
    mutate(pair=mapply(function(x,y)
      paste0(sort(c(x,y)),collapse="-"), Source, Target)) %>%
    distinct(pair, .keep_all = TRUE) %>%
    mutate(Source=sapply(pair, function(x)
      unlist(strsplit(x, split="-"))[1]),
      Target=sapply(pair, function(x)
        unlist(strsplit(x, split="-"))[2])) %>%
    select(-pair)
})

cat("Rui Barradas' sort-based\n")
system.time({
  apply(x, 1, sort) |> t() |> unique()
})
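One detail that may be worth checking in the timing script: if I'm reading it right, the x[w, ] <- x[w, 2:1] step runs inside system.time() at top level, so it changes x for every block timed after it. The set of unordered pairs is the same, but the later approaches are then timed on already-swapped input. A possible way around that, sketched below with base R only, is to wrap each approach in a function so that only a local copy is modified, and to check that the approaches agree before timing them. The function names (by_duplicated, by_swap, by_pminmax, canon) are invented for this sketch and do not appear in the thread.

```r
# Hypothetical wrappers: each takes a data frame and leaves the
# caller's copy untouched.
by_duplicated <- function(df) {
  df[!duplicated(cbind(do.call(pmin, df), do.call(pmax, df))), ]
}
by_swap <- function(df) {
  w <- df[, 2] > df[, 1]
  df[w, ] <- df[w, 2:1]    # modifies only the local copy of df
  unique(df)
}
by_pminmax <- function(df) {
  unique(data.frame(Source = pmin(df$Source, df$Target),
                    Target = pmax(df$Source, df$Target)))
}

# Canonical form for comparing results: smaller value first, rows sorted.
canon <- function(df) {
  lo <- pmin(df[[1]], df[[2]])
  hi <- pmax(df[[1]], df[[2]])
  out <- unique(cbind(lo, hi))
  out[order(out[, 1], out[, 2]), , drop = FALSE]
}

n <- 50    # small n for a quick agreement check; raise it for timing
x <- expand.grid(Source = 1:n, Target = 1:n)
x0 <- x    # snapshot, to confirm x is not changed below

res <- list(by_duplicated(x), by_swap(x), by_pminmax(x))
same <- vapply(res, function(r) identical(canon(r), canon(res[[1]])),
               logical(1))

stopifnot(all(same), identical(x, x0))
n_pairs <- nrow(res[[1]])   # should equal n * (n + 1) / 2
```

With the wrappers in place, each timing becomes something like system.time(by_swap(x)), and the order in which the approaches are timed no longer affects their input.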