I don't think you answered the OP's query, although I confess that I
am not so sure I understand it either (see below). In any case, I
believe the R-level loop (i.e. apply()) is unnecessary. There is a
unique() (and a duplicated()) method for data frames, so simply
unique(x)
returns a data frame with all the unique rows of x.
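As a small illustration (the data frame below is made up, not the OP's data), the data frame methods work on whole rows, with no apply() loop:

```r
# Toy data frame, invented for illustration: row 3 repeats row 1
x <- data.frame(a = c(1L, 2L, 1L), b = c("u", "v", "u"))

unique(x)             # keeps rows 1 and 2, drops the repeated row 3
duplicated(x)         # one logical flag per row: FALSE FALSE TRUE
x[!duplicated(x), ]   # selects the same rows as unique(x)
```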
However, I don't think that's what the OP wanted. (S)he appeared to
want all unique combinations of 2 columns. If I got that right (??),
combn(ncol(x), 2) gives that as a 2-row matrix, one pair of column
indices per column, which could be used for indexing. I'm not
sure parallel processing is useful here, but then again, I may have
misunderstood the query. If so, my apologies, and feel free to ignore
all the above :-( .
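A minimal sketch of the combn() route, assuming I have read the query correctly. The stand-in data frame only mimics the OP's column names, and the per-pair computation (a column sum) is purely a placeholder, since the OP did not say what processing is needed. It looks as though the repeats in the OP's matrix arise because the inner loops return vectors of different lengths (9, 8, 7, ...) and cbind() recycles the shorter ones, but that is my reading, not something I have checked against foreach itself.

```r
# Stand-in for the OP's data: only the column names matter here
mydata <- as.data.frame(matrix(seq_len(30), nrow = 3,
                               dimnames = list(NULL, paste0("col", 1:10))))

# Each column of 'idx' is one unordered pair of column indices
idx <- combn(ncol(mydata), 2)
dim(idx)   # 2 rows, choose(10, 2) = 45 columns

# Labels in the OP's "colA colB" format, one per pair, no repeats
pairs <- paste(colnames(mydata)[idx[1, ]], colnames(mydata)[idx[2, ]])
head(pairs, 3)   # "col1 col2" "col1 col3" "col1 col4"

# Process each pair; summing the two columns is only a placeholder
res <- lapply(seq_len(ncol(idx)), function(k) {
  mydata[[idx[1, k]]] + mydata[[idx[2, k]]]
})
names(res) <- pairs
```

If parallel processing does turn out to be worthwhile, the columns of idx could be iterated over in place of the nested loops.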
Cheers,
Bert
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch
On Wed, Feb 5, 2014 at 3:26 PM, arun <smartpink111 at yahoo.com> wrote:
> Hi,
> Try ?duplicated()
> apply(x,2,function(x) {x[duplicated(x)]<-"";x})
> A.K.
>
>
>
> Hi all,
>
> I have a dataset of around a thousand columns and a few thousand
> rows. I'm trying to get all the possible combinations (without
> repetition) of the data columns and process them in parallel. Here's a
> simplification of what my data and my code look like:
>
> mydata <- structure(list(col1 = c(231L, 8946L, 534L), col2 = c(123L, 2361L,
> 65L), col3 = c(5645L, 45L, 51L), col4 = c(654L, 356L, 32L), col5 = c(21L,
> 1L, 51L), col6 = c(4L, 4515L, 15L), col7 = c(6L, 1L, 535L), col8 = c(894L,
> 20L, 35L), col9 = c(68L, 21L, 123L), col10 = c(46L, 2L, 2L)), .Names = c("col1",
> "col2", "col3", "col4", "col5", "col6", "col7", "col8", "col9",
> "col10"), class = "data.frame", row.names = c(NA, -3L))
>
> require(foreach)
>
> x <-
>   foreach(m=1:5, .combine='cbind') %:%
>     foreach(j=(m+1):10, .combine='c') %do% {
>       paste(colnames(mydata)[m], colnames(mydata)[j])
>     }
>
> x
>
>
>
> If you execute the code above in R, you will get this result:
>
>
>
>      result.1     result.2     result.3     result.4     result.5
> [1,] "col1 col2"  "col2 col3"  "col3 col4"  "col4 col5"  "col5 col6"
> [2,] "col1 col3"  "col2 col4"  "col3 col5"  "col4 col6"  "col5 col7"
> [3,] "col1 col4"  "col2 col5"  "col3 col6"  "col4 col7"  "col5 col8"
> [4,] "col1 col5"  "col2 col6"  "col3 col7"  "col4 col8"  "col5 col9"
> [5,] "col1 col6"  "col2 col7"  "col3 col8"  "col4 col9"  "col5 col10"
> [6,] "col1 col7"  "col2 col8"  "col3 col9"  "col4 col10" "col5 col6"
> [7,] "col1 col8"  "col2 col9"  "col3 col10" "col4 col5"  "col5 col7"
> [8,] "col1 col9"  "col2 col10" "col3 col4"  "col4 col6"  "col5 col8"
> [9,] "col1 col10" "col2 col3"  "col3 col5"  "col4 col7"  "col5 col9"
>
> Notice the first problem: the last row of the second column of the
> "x" matrix says "col2 col3", which repeats the first item of that
> column (the same happens in all succeeding columns). I was planning
> to have unique combinations of all columns, which obviously did not
> work.
>
> Can somebody please help me with this? My desired output would be
>
>
>
>      result.1     result.2     result.3     result.4     result.5
> [1,] "col1 col2"  "col2 col3"  "col3 col4"  "col4 col5"  "col5 col6"
> [2,] "col1 col3"  "col2 col4"  "col3 col5"  "col4 col6"  "col5 col7"
> [3,] "col1 col4"  "col2 col5"  "col3 col6"  "col4 col7"  "col5 col8"
> [4,] "col1 col5"  "col2 col6"  "col3 col7"  "col4 col8"  "col5 col9"
> [5,] "col1 col6"  "col2 col7"  "col3 col8"  "col4 col9"
> [6,] "col1 col7"  "col2 col8"  "col3 col9"
> [7,] "col1 col8"  "col2 col9"
> [8,] "col1 col9"  "col2 col10"
> [9,] "col1 col10"
>
>
> Many thanks
>