Hi,
Maybe you can try this:

dat1New <- dat1[!(duplicated(dat1$gene) | duplicated(dat1$gene, fromLast = TRUE)), ]  # genes that occur only once
dat2 <- dat1[duplicated(dat1$gene) | duplicated(dat1$gene, fromLast = TRUE), ]  # all copies of the duplicated genes
lst1 <- split(dat2, dat2$gene)
# keep the copy with the higher value in more of the expression columns 6:32 (ties keep the 2nd copy)
# assuming that no gene has more than 2 copies (there were none in this dataset)
dat3 <- unsplit(lapply(lst1, function(x) {
  x1 <- sum(apply(x[, 6:32], 2, function(y) y[1] >= y[2]))
  x2 <- sum(apply(x[, 6:32], 2, function(y) y[1] <= y[2]))
  if (x1 > x2) x[1, ] else x[2, ]
}), unique(dat2$gene))
dat4 <- rbind(dat1New, dat3)
dat5 <- dat4[order(as.numeric(row.names(dat4))), ]  # restore the original row order
dim(dat5)
#[1] 639  32

A.K.

________________________________
From: Vivek Das <vd4mmind at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, September 9, 2013 2:30 PM
Subject: Re: Duplicated genes

Actually, these are all differentially expressed genes, so the copy that is most differentially expressed should stay in the list and its duplicate should be removed. Can you tell me again? I think the script will change then, right?

----------------------------------------------------------
Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek.das at ieo.eu
        vchris_05 at yahoo.co.in
        vd4mmind at gmail.com

On Mon, Sep 9, 2013 at 8:27 PM, arun <smartpink111 at yahoo.com> wrote:

>Hi,
>Try:
>dat1 <- read.table("DEGs_all.txt", sep = "", header = TRUE, stringsAsFactors = FALSE)
>dim(dat1)
>#[1] 725  32
>length(unique(dat1$gene))
>#[1] 639
>dat2 <- dat1[!duplicated(dat1$gene), ]
>dim(dat2)
>#[1] 639  32
>
>dim(unique(dat1))
>#[1] 725  32
>
>The duplicated genes have different expression values. You didn't say how to choose among the copies of a gene, so here the first row of every duplicated gene is kept and the others are removed.
>
>But suppose you want the mean values of those rows instead:
>library(plyr)
>#one row per gene, each expression column (6:32) averaged over the gene's copies
>res <- ddply(dat1[, c(1, 6:32)], .(gene), numcolwise(mean, na.rm = TRUE))
>dim(res)
>#[1] 639  28
>
>A.K.
>
>________________________________
>From: Vivek Das <vd4mmind at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Monday, September 9, 2013 1:35 PM
>Subject: Urgent help
>
>I have a list of genes and I want to reduce it to its unique genes. The genes have expression values, but some of them are duplicates. Is there any way to remove the duplicate names from the list and keep each gene only once, with its corresponding values? Please see the attached matrix.
>
>It would be nice if you could let me know. It's a bit urgent.
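The two-copy comparison in the first reply can be extended to genes with any number of copies. A minimal sketch, assuming (as in the thread) that the expression columns are numeric and contain no NAs; `pick_row()`, `dedup_by_wins()` and the toy data frame `dat` are hypothetical names made up for illustration. For each gene it counts, per copy, the columns in which that copy holds the maximum value and keeps the copy with the most such columns. Ties go to the later copy, which reproduces the reply's choice when a gene has exactly two copies.

```r
# Hypothetical helper (not from the thread): given the expression values of one
# gene's copies (rows), return the index of the copy that holds the column maximum
# in the most columns. Ties go to the later copy, matching the two-copy rule.
pick_row <- function(m) {
  m <- as.matrix(m)                                    # assumes numeric columns, no NAs
  col_max <- apply(m, 2, max)                          # maximum of each expression column
  wins <- rowSums(m == rep(col_max, each = nrow(m)))   # columns where each copy is the (tied) max
  max(which(wins == max(wins)))                        # last copy among those with the most wins
}

# Hypothetical wrapper: keep one row per gene, whatever the number of copies
dedup_by_wins <- function(df, gene_col, expr_cols) {
  groups <- split(seq_len(nrow(df)), df[[gene_col]])   # row indices of each gene
  keep <- vapply(groups, function(i) {
    i[pick_row(df[i, expr_cols, drop = FALSE])]
  }, integer(1))
  df[sort(keep), , drop = FALSE]                       # original row order is kept
}

# Made-up example: gene B has 2 copies, gene C has 3
dat <- data.frame(
  gene = c("A", "B", "B", "C", "C", "C"),
  s1 = c(1, 5, 2, 0, 9, 3),
  s2 = c(2, 1, 7, 4, 8, 1),
  s3 = c(3, 6, 4, 2, 1, 7),
  stringsAsFactors = FALSE
)
res <- dedup_by_wins(dat, "gene", 2:4)
print(res)
```

On the thread's data the call would be `dedup_by_wins(dat1, "gene", 6:32)`; for genes with exactly two copies it should keep the same row as the `unsplit()` version, without the two-copy assumption.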
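If averaging the duplicated rows is what is wanted, base R's `aggregate()` can do the same job as the `ddply()` call without loading plyr. A minimal sketch on a made-up data frame (`dat` and its columns are hypothetical); on the thread's data the equivalent call would be `aggregate(dat1[, 6:32], by = list(gene = dat1$gene), FUN = mean, na.rm = TRUE)`, which should likewise return one row per gene.

```r
# Made-up example: gene B appears twice with different values
dat <- data.frame(
  gene = c("A", "B", "B"),
  s1 = c(1, 4, 6),
  s2 = c(2, NA, 8),
  stringsAsFactors = FALSE
)
# One row per gene; every expression column is averaged over the gene's copies.
# Extra arguments (na.rm = TRUE) are passed on to mean(), as in the numcolwise() call.
res <- aggregate(dat[, c("s1", "s2")], by = list(gene = dat$gene), FUN = mean, na.rm = TRUE)
print(res)
```

The result is sorted by gene name, with the grouping column first, like the `ddply()` output.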