Hi everybody,

I would like to know whether it is possible to compare two tables on certain parameters.
I have these two tables:

gene table:

name   chr  start    end      str  accession  Length
gen1   4    646752   646838   +    MI0005806  86
gen12  2L   243035   243141   -    MI0005821  106
gen3   2L   159838   159928   +    MI0005813  90
gen7   2L   1831685  1831799  -    MI0011290  114
gen4   2L   2737568  2737661  +    MI0017696  93
...

localization table:

Chr  Start   End     length
4    136532  138654  2122
3    139870  141970  2100
2L   157838  158440  602
X    160834  162966  2132
4    204040  208536  4496
...

I would like to check whether a specific gene lies within a certain region.
For example, I want to see whether gene 3 on chromosome 2L lies within a region given in the second table.

What I would like to do is:

1. Check whether the gene lies on a specific chromosome.
   1.a If no, go to the next line.
   1.b If yes, go to 2.
2. Check whether the start position of the gene is bigger than the start position in the localization table AND smaller than the end position (that is, whether it lies between the start and end positions in the localization table).
   2.a If no, go to the next gene.
   2.b If yes, give it to me.

I was having difficulties doing this without running into three nested conditionals (if).

I would appreciate any help.

Thanks,

Assa
On Oct 25, 2011, at 6:42 AM, Assa Yeroslaviz wrote:

> Hi everybody,
>
> I would like to know whether it is possible to compare to tables for
> certain parameters.
> I have these two tables:
> gene table
> name   chr  start    end      str  accession  Length
> gen1   4    646752   646838   +    MI0005806  86
> gen12  2L   243035   243141   -    MI0005821  106
> gen3   2L   159838   159928   +    MI0005813  90
> gen7   2L   1831685  1831799  -    MI0011290  114
> gen4   2L   2737568  2737661  +    MI0017696  93
> ...
>
> localization table:
> Chr  Start   End     length
> 4    136532  138654  2122
> 3    139870  141970  2100
> 2L   157838  158440  602
> X    160834  162966  2132
> 4    204040  208536  4496
> ...
>
> I would like to check whether a specific gene lie within a certain
> region.
> For example I want to see if gene 3 on chromosome 2L lies within the
> region given in the second table.

# Read a data frame from an inline text block
rd.txt <- function(txt, header=TRUE, ...) {
    rd <- read.table(textConnection(txt), header=header, ...)
    closeAllConnections()
    rd
}

# Data input
genetable <- rd.txt("name chr start end str accession Length
gen1 4 646752 646838 + MI0005806 86
gen12 2L 243035 243141 - MI0005821 106
gen3 2L 159838 159928 + MI0005813 90
gen7 2L 1831685 1831799 - MI0011290 114
gen4 2L 2737568 2737661 + MI0017696 93")

loctable <- rd.txt("Chr Start End length
4 136532 138654 2122
3 139870 141970 2100
2L 157838 158440 602
X 160834 162966 2132
4 204040 208536 4496")

# Helper function: TRUE if the gene in 'vec' falls inside any row of 'locs'
inregion <- function(vec, locs) {
    any( apply(locs, 1, function(x) vec["start"]>x[1] & vec["end"]<=x[2]))
}

# Test the function
inregion(genetable[2, ], loctable[, c("Start", "End")])
# [1] FALSE

apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
#[1] FALSE FALSE FALSE FALSE FALSE

The logical vector can be used to extract elements from genetable, but it seems pointless to offer code that produces an empty data frame. (Wouldn't it have been more sensible to offer a test case that had a combination that satisfied your requirements?)
I'm guessing that this facility would already be implemented in one or more Bioconductor functions.

-- 
David.

> What I would like to is like
> 1. check if the gene lies on a specific chromosome
> 1.a if no - go to the next line
> 1.b if yes - go to 2
> 2. check if the start position of the gene is bigger than the start
> position of the localization table AND if it smaller than the end
> position (if it lies between the start and end positions in the
> localization table)
> 2.a if no - go to the next gene
> 2.b if yes - give it to me.
>
> I was having difficulties doing it without running into three
> interleaved conditional loops (if).
>
> I would appreciate any help.
>
> Thanks
>
> Assa
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
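
A note on the helper in the reply above: it compares positions only and never looks at the chromosome, and apply() over a data frame hands each row to the function as character strings, so the comparisons may not be numeric. The following is only a minimal sketch of a chromosome-aware check in base R, assuming "within" means the whole gene lies inside a single region on the same chromosome. The function name in_region and the extra row genHyp are made up for illustration; genHyp is not in the original data and is added only so that one gene actually matches.

```r
genetable <- read.table(text = "name chr start end str accession Length
gen1 4 646752 646838 + MI0005806 86
gen12 2L 243035 243141 - MI0005821 106
gen3 2L 159838 159928 + MI0005813 90
gen7 2L 1831685 1831799 - MI0011290 114
gen4 2L 2737568 2737661 + MI0017696 93", header = TRUE, stringsAsFactors = FALSE)

loctable <- read.table(text = "Chr Start End length
4 136532 138654 2122
3 139870 141970 2100
2L 157838 158440 602
X 160834 162966 2132
4 204040 208536 4496", header = TRUE, stringsAsFactors = FALSE)

# HYPOTHETICAL row, not part of the posted data: a gene placed inside the
# 2L region 157838-158440 so that the check has one positive case.
genetable <- rbind(genetable,
                   data.frame(name = "genHyp", chr = "2L", start = 157900,
                              end = 158000, str = "+", accession = NA,
                              Length = 100, stringsAsFactors = FALSE))

# For each gene: is there a region on the same chromosome that contains it?
# (Assumed definition: gene start >= region start and gene end <= region end.)
in_region <- function(genes, locs) {
  vapply(seq_len(nrow(genes)), function(i) {
    any(locs$Chr == genes$chr[i] &
        genes$start[i] >= locs$Start &
        genes$end[i] <= locs$End)
  }, logical(1))
}

hit <- in_region(genetable, loctable)
genetable[hit, ]   # only the hypothetical genHyp row is returned
```

Working on the columns directly keeps start and end numeric, and the chromosome test replaces the first of the nested if statements described in the question.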
I (now) see that you crossposted to rhelp and bioc. That practice is deprecated. Please read the Posting Guide more thoroughly. I will need to bear the burden of my sin in not looking at headers more closely in my own.

-- 
David.

On Oct 25, 2011, at 9:27 AM, David Winsemius wrote:

> On Oct 25, 2011, at 6:42 AM, Assa Yeroslaviz wrote:
>
>> Hi everybody,
>>
>> I would like to know whether it is possible to compare to tables
>> for certain parameters.
>> I have these two tables:
>> gene table
>> name   chr  start   end     str  accession  Length
>> gen1   4    646752  646838  +    MI0005806  86
>> gen12  2L   243035  243141  -    MI0005821  106

snipped

-- 
David Winsemius, MD
West Hartford, CT
On 10/25/2011 03:42 AM, Assa Yeroslaviz wrote:

> Hi everybody,
>
> I would like to know whether it is possible to compare to tables for certain
> parameters.
> I have these two tables:
> gene table
> name   chr  start    end      str  accession  Length
> gen1   4    646752   646838   +    MI0005806  86
> gen12  2L   243035   243141   -    MI0005821  106
> gen3   2L   159838   159928   +    MI0005813  90
> gen7   2L   1831685  1831799  -    MI0011290  114
> gen4   2L   2737568  2737661  +    MI0017696  93
> ...
>
> localization table:
> Chr  Start   End     length
> 4    136532  138654  2122
> 3    139870  141970  2100
> 2L   157838  158440  602
> X    160834  162966  2132
> 4    204040  208536  4496
> ...
>
> I would like to check whether a specific gene lie within a certain region.
> For example I want to see if gene 3 on chromosome 2L lies within the region
> given in the second table.

Hi Assa --

In Bioconductor, use the GenomicRanges package. Create two GRanges objects

genes = with(genetable, GRanges(chr, IRanges(start, end), str,
             accession=accession, Length=Length))
locations = with(locationtable, GRanges(Chr, IRanges(Start, End)))

then

olaps = findOverlaps(genes, locations)

queryHits(olaps) and subjectHits(olaps) index each gene with all locations it overlaps. The definition of 'overlap' is flexible, see ?findOverlaps.

Martin

> What I would like to is like
> 1. check if the gene lies on a specific chromosome
> 1.a if no - go to the next line
> 1.b if yes - go to 2
> 2. check if the start position of the gene is bigger than the start position
> of the localization table AND if it smaller than the end position (if it
> lies between the start and end positions in the localization table)
> 2.a if no - go to the next gene
> 2.b if yes - give it to me.
>
> I was having difficulties doing it without running into three interleaved
> conditional loops (if).
>
> I would appreciate any help.
>
> Thanks
>
> Assa
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
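
To make the GenomicRanges suggestion concrete, here is a small self-contained sketch. It assumes the GenomicRanges package from Bioconductor is installed, and it uses type = "within" (one of the options documented in ?findOverlaps) so that a gene counts only when it lies entirely inside a region on the same chromosome. The data frame names genetable and loctable, the variable inside, and the row genHyp are illustrative assumptions; genHyp is a made-up gene, not part of the posted data, added so that there is one hit.

```r
library(GenomicRanges)

# Gene table from the question, plus one HYPOTHETICAL row (genHyp) that
# sits inside the 2L region 157838-158440.
genetable <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4", "genHyp"),
  chr   = c("4", "2L", "2L", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 1831685, 2737568, 157900),
  end   = c(646838, 243141, 159928, 1831799, 2737661, 158000),
  str   = c("+", "-", "+", "-", "+", "+"),
  stringsAsFactors = FALSE
)

loctable <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4"),
  Start = c(136532, 139870, 157838, 160834, 204040),
  End   = c(138654, 141970, 158440, 162966, 208536),
  stringsAsFactors = FALSE
)

genes <- GRanges(seqnames = genetable$chr,
                 ranges   = IRanges(start = genetable$start, end = genetable$end),
                 strand   = genetable$str,
                 name     = genetable$name)

# Regions carry no strand, so they match genes on either strand.
locations <- GRanges(seqnames = loctable$Chr,
                     ranges   = IRanges(start = loctable$Start, end = loctable$End))

# Keep only genes that lie completely within some region.
olaps  <- findOverlaps(genes, locations, type = "within")
inside <- genetable[unique(queryHits(olaps)), ]
inside$name
```

The chromosome test is implicit here: ranges on different sequence names never overlap, so no explicit condition on chr is needed.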
Hi Tamara,

No problem.

dat3 <- rbind(dat1, dat2)  # Sorry, forgot this line.

A.K.

________________________________
From: Tamara Simakova <tsimakova at sequoiag.com>
To: arun <smartpink111 at yahoo.com>
Sent: Thursday, May 30, 2013 12:26 PM
Subject: Re: [R] [BioC] comparing two tables

Hello Arun,

Thanks very much for your help. Indeed there is a mistake in the resulting table; it should be exactly as in your example.
When I use

dat3New <- with(dat3, aggregate(info, list(chr, pos, ref, alt), FUN=function(x) x))
colnames(dat3New) <- colnames(dat1)

R returns "dat3 is not found", but with the plyr library everything works well.

Thanks again,
Tamara

2013/5/30 arun <smartpink111 at yahoo.com>

> Assuming that you wanted to label '1' for table1 and '4' for table2 (info column).
>
> Also, not sure why the chr2 row is not in the resulting table.
>
> dat1 <- read.table(text="
> chr pos ref alt
> chr1 5 A G
> chr1 8 T C
> chr2 2 C T
> ", sep="", header=TRUE, stringsAsFactors=FALSE)
>
> dat2 <- read.table(text="
> chr pos ref alt
> chr1 5 A G
> chr1 7 T C
> chr1 8 T A
> ", sep="", header=TRUE, stringsAsFactors=FALSE)
> dat1$info <- 1
> dat2$info <- 4
> dat3New <- with(dat3, aggregate(info, list(chr, pos, ref, alt), FUN=function(x) x))
> colnames(dat3New) <- colnames(dat1)
> dat3New1 <- dat3New[order(dat3New$chr, dat3New$pos), ]
> row.names(dat3New1) <- 1:nrow(dat3New1)
> dat3New1
> #   chr pos ref alt info
> #1 chr1   5   A   G 1, 4
> #2 chr1   7   T   C    4
> #3 chr1   8   T   A    4
> #4 chr1   8   T   C    1
> #5 chr2   2   C   T    1
>
> #or
> library(plyr)
> res <- ddply(merge(dat1, dat2, all=TRUE), .(chr, pos, ref, alt), summarize, info=list(info))
> res
> #   chr pos ref alt info
> #1 chr1   5   A   G 1, 4
> #2 chr1   7   T   C    4
> #3 chr1   8   T   A    4
> #4 chr1   8   T   C    1
> #5 chr2   2   C   T    1
> names(dat3New1$info) <- NULL
> identical(dat3New1, res)
> #[1] TRUE
>
> A.K.
>
> ----- Original Message -----
> From: tomkina <tsimakova at sequoiag.com>
> To: r-help at r-project.org
> Cc:
> Sent: Thursday, May 30, 2013 4:45 AM
> Subject: Re: [R] [BioC] comparing two tables
>
> Hello,
>
> I have a similar task. I have two tables and I need to get a third
> table containing data from both of them, with an extra column recording
> which table each row came from:
>
> table1
> chr   pos  ref  alt
> chr1  5    A    G
> chr1  8    T    C
> chr2  2    C    T
>
> table2
> chr   pos  ref  alt
> chr1  5    A    G
> chr1  7    T    C
> chr1  8    T    A
>
> resulting table
> chr   pos  ref  alt  info
> chr1  5    A    G    1, 4
> chr1  7    T    C    4
> chr1  8    T    C    1
> chr1  8    T    A    4
>
> I need all 4 columns (chr, pos, ref and alt) to be compared. I didn't find
> this function in Bioconductor. I am a beginner at R and would appreciate any
> help.
>
> Thanks,
> Tamara
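
For the second question in this thread (recording which table each row came from), the pieces above can be put together into one runnable base R sketch that includes the rbind() step. This is one possible way to do it, not the only one. It keeps the labels 1 and 4 used in the thread, but, as an assumption made here for simplicity, it stores info as a comma-separated string rather than a list column. colClasses is set explicitly because read.table can read a column containing only T or F as logical.

```r
cols <- c("character", "integer", "character", "character")

dat1 <- read.table(text = "chr pos ref alt
chr1 5 A G
chr1 8 T C
chr2 2 C T", header = TRUE, colClasses = cols)

dat2 <- read.table(text = "chr pos ref alt
chr1 5 A G
chr1 7 T C
chr1 8 T A", header = TRUE, colClasses = cols)

# Label the source table, then stack the two tables.
dat1$info <- 1
dat2$info <- 4
dat3 <- rbind(dat1, dat2)

# One row per distinct (chr, pos, ref, alt); info lists every source label.
res <- aggregate(info ~ chr + pos + ref + alt, data = dat3,
                 FUN = function(x) paste(sort(x), collapse = ", "))

# Sort for readability and reset the row names.
res <- res[order(res$chr, res$pos, res$ref, res$alt), ]
rownames(res) <- NULL
res
```

Rows present in both tables get the label "1, 4"; rows present in only one table keep that table's single label.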