Hi, I have a number of genes (columns) for which I want to examine pairwise associations of genotypes (each row is an individual)...For example (see data below), I would like to compare M1 to M2, M2 to M3, and M1 to M3 (i.e. does ac from M1 tend to be found with bc from M2 more often than expected.) Down stream I will be performing chi square tests for each pair. But I am looking for a way to set all pairs of genes (order doesn't matter, so with 3 genes, there are 3 comparisons, 4 genes=6 comparisons) in a new data.frame or matrix so that I can then test each pair with a chi-square test in a loop. Below is some sample data of the form I will be using.> lets<-c("ab","ac","bc","bd") > epi<-data.frame(cbind("M1"= c(sample(lets,10,replace=TRUE)),"M2"=c(sample(lets,10,replace=TRUE)), "M3"=c(sample(lets,10, replace=TRUE))))> print(epi)M1 M2 M3 1 ac bc bd 2 ac ac bd 3 bd bd bd 4 ab ac bd 5 ac bc bd 6 bd bd bc 7 ab ac ab 8 bc bd ab 9 bd ab ac 10 bc bc bd I tried a for loop to set each column against the others, but get errors for undefined columns selected: for(i in 1:3) { k=i+1 j=k for(j in k:3){ epi3=cbind("A"=epi[,i],"B"=epi[,j]) print(epi3) } }>A B 1 ac bc 2 ac ac 3 bd bd 4 ab ac 5 ac bc 6 bd bd 7 ab ac 8 bc bd 9 bd ab 10 bc bc A B 1 ac bd 2 ac bd 3 bd bd 4 ab bd 5 ac bd 6 bd bc 7 ab ab 8 bc ab 9 bd ac 10 bc bd A B 1 bc bd 2 ac bd 3 bd bd 4 ac bd 5 bc bd 6 bd bc 7 ac ab 8 bd ab 9 ab ac 10 bc bd Error in `[.data.frame`(epi, , j) : undefined columns selected I get the output in the right format, but with errors, and the actual data frame epi 3, has only one column, Im sure this is a simple fix...any ideas? Could I use combn instead? Louis [[alternative HTML version deleted]]
Hi Louis, You pose an interesting question. Bellow is my take on a solution. It is a function named "compare.all.column.pairs" which gets as input the original data (as a data.frame or a matrix, with all the columns you'd want to compare). And also the function you would like to use on each pair of columns. The function then outputs a list with the output of the function for every pair of column. Also, if you'd want the output to be in the form of a data.frame, there is the return_as_data.frame parameter you can set to TRUE and it will do so. However, in order for that to work, you must have the FUNCTION you enter, output something that can easily be turned into a data.frame. Beneath the function there is also two examples (one for t.test, and another for chisq.test) I hope this helps, Cheers, Tal compare.all.column.pairs <- function(DATA, FUNCTION, return_as_data.frame F, ...) { # FUNCTION should always be of the form that can receive two parameters: function(column1, column2) number_of_columns <- dim(DATA)[2] all_collumns_combinations <- combn(seq_len(number_of_columns), 2) number_of_combinations <- dim(all_collumns_combinations)[2] FUNCTION_paits_output <- vector("list", number_of_combinations) list_names <- vector("character", number_of_combinations) if(is.null(colnames(DATA))) colnames(DATA) <- paste("Column",seq_len(number_of_columns), sep = "_") for(i in seq_len(number_of_combinations)) { column1 <- all_collumns_combinations[1,i] column2 <- all_collumns_combinations[2,i] FUNCTION_paits_output[[i]] <- FUNCTION(DATA[,column1], DATA[,column2] ,...) # give this list item a name list_names[i] <- paste("Pair number ", i,": ", paste(colnames(DATA)[c(column1, column2)], collapse = " VS ") , sep = "") } # set names for the items in the list attr(FUNCTION_paits_output, "names") <- list_names output <- FUNCTION_paits_output if(return_as_data.frame) { require(plyr) combo_dataframe <- t(all_collumns_combinations) colnames(combo_dataframe) <- c("Column1", "Column2") output <- ldply(FUNCTION_paits_output, function(x) {x}) output <- cbind(combo_dataframe, output) } return(output) } set.seed(341) norm_matrix <- matrix(rnorm(15*4), 15, 4) compare.all.column.pairs(norm_matrix, t.test, return_as_data.frame = F) # something with chi square set.seed(341) rounded_norm_matrix <- round(matrix(rnorm(15*4), 15, 4)) make.table.and.chi.square <- function(x,y) {chisq.test(table(x,y))$p.value} compare.all.column.pairs(rounded_norm_matrix, make.table.and.chi.square, return_as_data.frame = T) ----------------Contact Details:------------------------------------------------------- Contact me: Tal.Galili@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Wed, Apr 13, 2011 at 10:32 PM, Louis Plough <lplough@usc.edu> wrote:> lets<-c("ab","ac","bc","bd") > > epi<-data.frame(cbind("M1"= c(sample(lets,10, > replace=TRUE)),"M2"=c(sample(lets,10,replace=TRUE)), "M3"=c(sample(lets,10, > replace=TRUE)))) >[[alternative HTML version deleted]]
On Apr 13, 2011, at 21:32 , Louis Plough wrote:> Hi, > I have a number of genes (columns) for which I want to examine pairwise > associations of genotypes (each row is an individual)...For example (see > data below), I would like to compare M1 to M2, M2 to M3, and M1 to M3 (i.e. > does ac from M1 tend to be found with bc from M2 more often than expected.) > Down stream I will be performing chi square tests for each pair. > > But I am looking for a way to set all pairs of genes (order doesn't matter, > so with 3 genes, there are 3 comparisons, 4 genes=6 comparisons) in a new > data.frame or matrix so that I can then test each pair with a chi-square > test in a loop. > > Below is some sample data of the form I will be using. > >> lets<-c("ab","ac","bc","bd") >> epi<-data.frame(cbind("M1"= c(sample(lets,10, > replace=TRUE)),"M2"=c(sample(lets,10,replace=TRUE)), "M3"=c(sample(lets,10, > replace=TRUE)))) >> print(epi) > M1 M2 M3 > 1 ac bc bd > 2 ac ac bd > 3 bd bd bd > 4 ab ac bd > 5 ac bc bd > 6 bd bd bc > 7 ab ac ab > 8 bc bd ab > 9 bd ab ac > 10 bc bc bd > > I tried a for loop to set each column against the others, but get errors for > undefined columns selected: > > for(i in 1:3) { > k=i+1 > j=k > for(j in k:3){ > epi3=cbind("A"=epi[,i],"B"=epi[,j]) > > print(epi3) > } > ...> 10 bc bd > Error in `[.data.frame`(epi, , j) : undefined columns selected > > > I get the output in the right format, but with errors, and the actual data > frame epi 3, has only one column, > > > Im sure this is a simple fix...any ideas? Could I use combn instead?Well, for i==3, your inner loop is for (j in 4:3), and R is not C, so this runs backwards from 4. The quick fix is (I think k is superfluous) for (i in 1:2) for (j in (i+1):3) However, also check pairwise.table() for a generic solution (including adjustment for multiple testing) -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com