thr3ads.net - R help - [R] Duplicated genes [Sep 2013]

arun

2013-Sep-09 19:30 UTC

[R] Duplicated genes

Hi,

Maybe you can try this:
# Genes that occur only once
dat1New<-dat1[!(duplicated(dat1$gene)|duplicated(dat1$gene,fromLast=TRUE)),]
# All rows of genes that occur more than once
dat2<-dat1[duplicated(dat1$gene)|duplicated(dat1$gene,fromLast=TRUE),]
lst1<-split(dat2,dat2$gene)
# For each duplicated gene, keep row 1 if it is >= row 2 in more of columns 6:32
# than it is <= row 2; otherwise keep row 2.
# Assumes no gene has more than 2 copies (none did in this dataset).
dat3<-unsplit(lapply(lst1,function(x) {
  x1<-sum(apply(x[,6:32],2,function(y) y[1]>=y[2]))
  x2<-sum(apply(x[,6:32],2,function(y) y[1]<=y[2]))
  if(x1>x2) x[1,] else x[2,]
}),unique(dat2$gene))
dat4<-rbind(dat1New,dat3)
# Restore the original row order
dat5<-dat4[order(as.numeric(row.names(dat4))),]
dim(dat5)
#[1] 639  32
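
If a gene could have more than two copies, the pairwise comparison above no longer applies. A minimal sketch of one alternative follows, using toy data rather than the file from this thread. It assumes a per-row score (here the mean absolute expression across the sample columns, which is only an illustrative choice and not the rule used above) and keeps the highest-scoring row for each gene. The names exprCols, score and keep are hypothetical.

```r
# Toy data standing in for dat1 (hypothetical values; the real data has 32 columns).
dat1 <- data.frame(
  gene = c("A", "B", "A", "C", "B", "B"),
  s1   = c(1.0, -2.0, 3.0, 0.5, 0.1, -4.0),
  s2   = c(1.5, -1.0, 2.5, 0.2, 0.3, -3.0),
  stringsAsFactors = FALSE
)
exprCols <- 2:3   # would be 6:32 for the data in this thread

# Assumed score: mean absolute expression across the sample columns.
score <- rowMeans(abs(as.matrix(dat1[, exprCols])), na.rm = TRUE)

# For each gene, find the index of the row with the highest score
# (which.max picks the first such row on ties).
keep <- tapply(seq_len(nrow(dat1)), dat1$gene,
               function(i) i[which.max(score[i])])

# Subset in the original row order; the original row names are preserved.
dat5 <- dat1[sort(as.vector(keep)), ]
dat5
```

Any other score (for example an absolute fold change, if the data has one) can be swapped in by changing the single line that defines score.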

A.K.

________________________________
From: Vivek Das <vd4mmind at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Monday, September 9, 2013 2:30 PM
Subject: Re: Duplicated genes

Actually, these are all differentially expressed genes. So for each gene, the
copy that is most differentially expressed should stay in the list and its
duplicate should be removed. Can you tell me again how to do that? I think the
script will change then, right?

----------------------------------------------------------

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek.das at ieo.eu
        vchris_05 at yahoo.co.in
        vd4mmind at gmail.com

On Mon, Sep 9, 2013 at 8:27 PM, arun <smartpink111 at yahoo.com> wrote:

>Hi,
>Try:
>dat1<-read.table("DEGs_all.txt",sep="",header=TRUE,stringsAsFactors=FALSE)
>dim(dat1)
>#[1] 725  32
>length(unique(dat1$gene))
>#[1] 639
>dat2<-dat1[!duplicated(dat1$gene),]
>dim(dat2)
>#[1] 639  32
>
>dim(unique(dat1))
>#[1] 725  32
>
>The duplicated genes have different expression values. You didn't provide
>information on how to select those unique genes. Here, the first row of every
>duplicated gene is kept and the others are removed.
>
>But suppose you want the mean values of those rows instead:
>library(plyr)
>res<-ddply(dat1[,c(1,6:32)],.(gene), numcolwise(mean,na.rm=TRUE))
>dim(res)
>#[1] 639  28
>
>A.K.
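
The per-gene averaging described above can also be sketched in base R without plyr. The example below uses toy data, not the file from this thread, and the column names s1 and s2 and the variable exprCols are hypothetical. It relies on aggregate(), which passes na.rm = TRUE through to mean.

```r
# Toy data standing in for dat1 (hypothetical values).
dat1 <- data.frame(
  gene = c("A", "B", "A", "C"),
  s1   = c(1, 2, 3, 4),
  s2   = c(10, 20, NA, 40),
  stringsAsFactors = FALSE
)
exprCols <- 2:3   # would be 6:32 for the data in this thread

# One row per gene, each expression column averaged within the gene.
# na.rm = TRUE is forwarded to mean(), so missing values are ignored.
res <- aggregate(dat1[, exprCols], by = list(gene = dat1$gene),
                 FUN = mean, na.rm = TRUE)
res
```

Note that aggregate() returns the genes in sorted order, not in their original order of appearance.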
>
>________________________________
>From: Vivek Das <vd4mmind at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Monday, September 9, 2013 1:35 PM
>Subject: Urgent help
>
>
>
>I have a list of genes with expression values, and some of the genes are
>duplicated. I want to reduce the list to its unique genes. Is there a way to
>remove the duplicate names so that each gene appears only once with its
>corresponding values? Please see the attached matrix.
>
>It would be nice if you could let me know. It's a bit urgent.
