Hi:

I am trying to identify mutually exclusive events in the following example:

Cluster Gene Mutated not-mutated
1       G1   1       0
1       G2   1       0
1       G3   0       1
1       G4   0       1
1       G5   1       0
2       G1   0       1
2       G2   1       0
2       G3   1       0
2       G4   0       0
2       G5   1       0

In cluster 1, G1, G2 and G5 are mutated.
In cluster 2, G2, G3 and G5 are mutated.

I am interested in finding events like the G2-G5 pair and the G1-G3 pair. In total I have 8 clusters and 150 genes (1200 rows x 4 columns).

What test would be appropriate for identifying such pairs? In my naive understanding, would Fisher's exact test find such combinations?

Thanks a lot.

-Adrian
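On the "which test" question: fisher.test() does not search for pairs by itself, but one common way to use it is to loop over every gene pair and test the 2 x 2 table of mutation status across clusters. The sketch below assumes that "mutually exclusive" means two genes are rarely mutated in the same cluster, so it uses alternative = "less"; alternative = "greater" would instead look for co-mutated pairs like G2-G5. The object names and the Benjamini-Hochberg adjustment are illustrative choices, not something from the thread. With only 2 clusters here (8 in the full data) each test has very little power, so the p-values are at best a rough ranking.

```r
# Example data from the question; the header "not-mutated" is written as
# not_mutated so that it is a syntactic R name.
dat <- read.table(text = "Cluster Gene Mutated not_mutated
1 G1 1 0
1 G2 1 0
1 G3 0 1
1 G4 0 1
1 G5 1 0
2 G1 0 1
2 G2 1 0
2 G3 1 0
2 G4 0 0
2 G5 1 0", header = TRUE, stringsAsFactors = FALSE)

# Cluster x Gene matrix: 1 = mutated in that cluster, 0 = not
mut <- xtabs(Mutated ~ Cluster + Gene, data = dat)

# All gene pairs, one pair per column
gene_pairs <- combn(colnames(mut), 2)

# Assumption: "mutually exclusive" = two genes rarely mutated in the same
# cluster, so each pair gets a one-sided Fisher's exact test (odds ratio < 1).
res <- do.call(rbind, lapply(seq_len(ncol(gene_pairs)), function(k) {
  g1 <- gene_pairs[1, k]
  g2 <- gene_pairs[2, k]
  # fix the factor levels so the table is always 2 x 2
  tab <- table(factor(mut[, g1], levels = 0:1),
               factor(mut[, g2], levels = 0:1))
  data.frame(gene1 = g1, gene2 = g2,
             both     = tab["1", "1"],                 # co-mutated clusters
             only_one = tab["1", "0"] + tab["0", "1"], # exclusive clusters
             p        = fisher.test(tab, alternative = "less")$p.value)
}))

# Illustrative multiple-testing correction over all pairs
res$p_adj <- p.adjust(res$p, method = "BH")
res[order(res$p), ]
```

In this toy example G1-G3 gets both = 0 and only_one = 2, but its one-sided p-value is only 0.5, which shows how little two clusters can tell you.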
On Aug 2, 2014, at 11:11 AM, Adrian Johnson wrote:

> Hi:
>
> I am trying to identify mutually exclusive events from the following
> example:

#-------------
dat <- read.table(text="Cluster Gene Mutated not_mutated
1 G1 1 0
1 G2 1 0
1 G3 0 1
1 G4 0 1
1 G5 1 0
2 G1 0 1
2 G2 1 0
2 G3 1 0
2 G4 0 0
2 G5 1 0", header=TRUE, stringsAsFactors=FALSE)

with(dat, table(Cluster, Gene, Mutated) )
#----------------
, , Mutated = 0

       Gene
Cluster G1 G2 G3 G4 G5
      1  0  0  1  1  0
      2  1  0  0  1  0

, , Mutated = 1

       Gene
Cluster G1 G2 G3 G4 G5
      1  1  1  0  0  1
      2  0  1  1  0  1

#--------------
Or:

xtabs(Mutated ~ Cluster+Gene, data=dat)
#----------------
       Gene
Cluster G1 G2 G3 G4 G5
      1  1  1  0  0  1
      2  0  1  1  0  1

I'm a bit unclear about your goals. Are you trying to identify the Genes that are mutated in only one Cluster as the "G1-G3" events, and the Genes that are mutated in both Clusters as the "G2-G5" events? If so, you can choose the columns with a sum of 1 for the first and the columns with a sum of 2 for the second.

> In cluster 1 : G1, G2, G5 are mutated
>
> In cluster 2: G2, G3, G5 are mutated.
>
> I am interested in finding such G2-G5 event and G1-G3 events.
>
> In total I have a 8 clusters and 150 gene (1200 rows x 4 columns).
>
> What test could be appropriate to identify such pairs.
>
> In my naive understanding would a fishers-exact test give such
> combinations.

It's even less clear what sort of "test" you propose. `fisher.test` is a test of association. It doesn't identify combinations.

> Thanks a lot.
>
> -Adrian
>
> [[alternative HTML version deleted]]

This is a plain text mailing list.

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
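A sketch of the column-sum selection, assuming the "G2-G5" genes are those mutated in every cluster and the "G1-G3" genes are those mutated in exactly one. It repeats the read.table() call so it runs on its own. Comparing against nrow(tab) instead of a literal 2 is my generalization for the 8-cluster data and is not from the reply; whether "every" and "exactly one" are the right cut-offs there is an open question.

```r
dat <- read.table(text = "Cluster Gene Mutated not_mutated
1 G1 1 0
1 G2 1 0
1 G3 0 1
1 G4 0 1
1 G5 1 0
2 G1 0 1
2 G2 1 0
2 G3 1 0
2 G4 0 0
2 G5 1 0", header = TRUE, stringsAsFactors = FALSE)

# Cluster x Gene table of mutation status (0/1 here)
tab <- xtabs(Mutated ~ Cluster + Gene, data = dat)

# number of clusters in which each gene is mutated (the column sums)
n_mut <- colSums(tab)

# mutated in every cluster ("G2-G5" type)
in_all <- names(n_mut)[n_mut == nrow(tab)]
# mutated in exactly one cluster ("G1-G3" type)
in_one <- names(n_mut)[n_mut == 1]

# the cluster in which each "exactly one" gene is mutated
where_one <- apply(tab[, in_one, drop = FALSE], 2,
                   function(x) rownames(tab)[x == 1])

in_all     # "G2" "G5"
in_one     # "G1" "G3"
where_one  # G1 in cluster "1", G3 in cluster "2"
```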
On Sat, Aug 2, 2014 at 1:11 PM, Adrian Johnson <oriolebaltimore at gmail.com> wrote:

> Hi:
>
> I am trying to identify mutually exclusive events from the following
> example:
> [...]

I am having trouble visualizing your data. How about a sample? The easiest way is to do something like:

temp <- head(realData,10)
dput(temp)

Then cut'n'paste the output from dput() into another email here.

But, assuming I have a bit of a grasp, you have four columns (example only shows 3). If you have a set of columns which are 0 & 1 or FALSE and TRUE, then you can create a "temp" column which encodes them simply by treating them as the binary digits of a number, i.e.

tempColumn = 1*column1 + 2*column2 + 4*column3 + 8*column4

You can then "group" the data by this value: all rows with the same value are in the same "group". But I don't know what you want your output to look like.

As an aside, any value other than 0, 1, 2, 4, or 8 could be considered invalid, because it means that more than one column is TRUE, which violates your constraint.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
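A sketch of the binary-encoding idea applied to the example data. The example has only two 0/1 columns (Mutated and not_mutated), so the possible codes are 1 (mutated), 2 (not mutated), 0 (neither flag set, as for G4 in cluster 2, which may really be missing data) and 3 (both set, invalid). The names flags, weights, code, valid and groups are illustrative and not from the thread.

```r
dat <- read.table(text = "Cluster Gene Mutated not_mutated
1 G1 1 0
1 G2 1 0
1 G3 0 1
1 G4 0 1
1 G5 1 0
2 G1 0 1
2 G2 1 0
2 G3 1 0
2 G4 0 0
2 G5 1 0", header = TRUE, stringsAsFactors = FALSE)

# 0/1 indicator columns to encode; the example has two, add more names
# here if the real data has more flag columns
flags <- c("Mutated", "not_mutated")

# column j gets weight 2^(j-1), so each row's flags become one binary number
weights <- 2^(seq_along(flags) - 1)
dat$code <- as.vector(as.matrix(dat[flags]) %*% weights)

# at most one flag set <=> code is 0 or exactly one of the weights
dat$valid <- dat$code %in% c(0, weights)

# rows with the same code form one "group"
groups <- split(paste(dat$Cluster, dat$Gene), dat$code)
table(dat$code)
groups
```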