Gmail
2013-Nov-28 17:18 UTC
[R] Counting variables repeated in dataframe columns to create a presence-absence table
Hi!

I'm new to R and I'm writing to ask for some guidance. I analyzed comparative genomic microarray data from 56 Salmonella strains to identify the genes absent in each serovar, and I ended up with a matrix that looks like this:

> data[1:5,1:5]
  Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1       S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR      S5305B_IGR
2       S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR      S5300A_IGR
3       S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR      S5300B_IGR
4       S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR      S5299B_IGR
5       S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR      S5299A_IGR

The entries in each column are the genes identified as absent in that serovar. I would like to create a presence-absence matrix of those genes that compares all the serovars at the same time. I assume this should not be complicated, but I don't know how to do it. I would like a matrix similar to this one:

> data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,

Oihane

--
Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology
Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain
Telf: 34 - 935 811 665
E-mail: oihane.irazoki@uab.cat / o.irazoki@gmail.com
arun
2013-Nov-28 19:57 UTC
[R] Counting variables repeated in dataframe columns to create a presence-absence table
Hi,

Try:

# Read the example data; the leading row numbers become row names
data_m <- read.table(text="Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1 S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR
2 S5305A_IGR S5300A_IGR S5305A_IGR S5300A_IGR S5300A_IGR
3 S5300A_IGR S5300B_IGR S5300A_IGR S5300B_IGR S5300B_IGR
4 S5300B_IGR S5299B_IGR S5300B_IGR S5299B_IGR S5299B_IGR
5 S5299B_IGR S5299A_IGR S5299B_IGR S5829B_IGR S5299A_IGR",sep="",header=TRUE,stringsAsFactors=FALSE)

# Helper column of 1s, used as the cell value to tabulate
data_m$new <- 1
library(reshape2)
# Long format: one row per (serovar, gene) pair
dM <- melt(data_m,id.vars="new")
# Genes in rows, serovars in columns
xtabs(new~value+variable,dM)
#or
dcast(dM,value~variable,value.var="new",fill=0)

A.K.
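For readers who would rather not load reshape2, the same table can be sketched in base R. This is only an illustration under stated assumptions, not part of the reply above: the data frame `data` is rebuilt by hand from the five rows shown in the question, the names `genes` and `pa` are made up for this example, and the filter for `NA` or empty cells assumes that shorter columns in the full data set would be padded that way. A 1 means the gene is listed in that serovar's column, matching the layout requested in the question.

```r
# Toy data copied from the five rows and five serovars shown in the question.
data <- data.frame(
  Abortusovis07918 = c("S5305B_IGR", "S5305A_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR"),
  Agona08561       = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR", "S5299A_IGR"),
  Anatum08125      = c("S5305B_IGR", "S5305A_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR"),
  Arizonae65S      = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR", "S5829B_IGR"),
  Braenderup08488  = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR", "S5299A_IGR"),
  stringsAsFactors = FALSE
)

# Every distinct gene identifier across all columns, in order of first appearance.
genes <- unique(unlist(data, use.names = FALSE))

# Assumption: padding cells in unequal-length columns are NA or "" and should be ignored.
genes <- genes[!is.na(genes) & genes != ""]

# For each serovar column, mark whether each gene is listed there (1) or not (0).
pa <- sapply(data, function(col) as.integer(genes %in% col))
rownames(pa) <- genes

pa
```

Because `%in%` only tests membership, a gene that appears twice in one column still yields 1, whereas the `xtabs` call would count it as 2.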