Hi,

This is my problem: I want to subset this data frame df so that each line is printed once, even if its df$exon value appears several times.

unique(df$exon) shows me the unique exons. If I try to print only the unique exon lines with

df[unique(df$exon),]

this doesn't print only the unique ones :(

Could you help?
Thanks,
Nat

                                 exon size  chr     start       end
413077 ChrX_133594175_133594368_HPRT1  193 ChrX 133594175 133594368
413270 ChrX_133594183_133594368_HPRT1  185 ChrX 133594183 133594368
413455 ChrX_133594381_133594565_HPRT1  184 ChrX 133594381 133594565
413639 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
413745 ChrX_133607389_133607495_HPRT1  106 ChrX 133607389 133607495
413851 ChrX_133607404_133607495_HPRT1   91 ChrX 133607404 133607495
413942 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
414125 ChrX_133609211_133609394_HPRT1  183 ChrX 133609211 133609394
414308 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
414373 ChrX_133620495_133620560_HPRT1   65 ChrX 133620495 133620560
414438 ChrX_133620692_133620696_HPRT1    4 ChrX 133620692 133620696
414442 ChrX_133624218_133624235_HPRT1   17 ChrX 133624218 133624235

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Hello,

> HI,
> this is my problem I want to subset this file df, using only unique
> df$exon printing the line once even if df$exon appear several times:
>
> unique(df$exon) will show me the unique exons
> If I try to print only the unique exon lines
> with df[unique(df$exon),] -this doesn't print only the unique ones :(

Try

inx <- match(unique(df$exon), df$exon)
df[inx, ]

Hope this helps,

Rui Barradas

--
View this message in context: http://r.789695.n4.nabble.com/subseting-a-data-frame-tp4438745p4438922.html
Sent from the R help mailing list archive at Nabble.com.
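For readers following along, here is a minimal sketch of this suggestion on four rows retyped from the posted table. The object name df matches the post; treating exon as a character column (stringsAsFactors = FALSE) is an assumption, since the post does not say how the data were read in. match() returns the position of the first occurrence of each unique value, so inx holds one row number per exon.

```r
# Four rows retyped from the posted table; two of them share an exon.
# Assumption: exon is stored as character, not as a factor.
df <- data.frame(
  exon  = c("ChrX_133594381_133594565_HPRT1",
            "ChrX_133607389_133607495_HPRT1",
            "ChrX_133607389_133607495_HPRT1",
            "ChrX_133607404_133607495_HPRT1"),
  size  = c(184, 106, 106, 91),
  chr   = "ChrX",
  start = c(133594381, 133607389, 133607389, 133607404),
  end   = c(133594565, 133607495, 133607495, 133607495),
  row.names = c("413455", "413639", "413745", "413851"),
  stringsAsFactors = FALSE
)

# Position of the first occurrence of each distinct exon.
inx <- match(unique(df$exon), df$exon)

# Integer positions are a valid row index, so this keeps one row per exon.
res <- df[inx, ]
print(res)
```

The second of the two identical exon rows (row name 413745) is the one that gets dropped, because match() only ever reports the first hit.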
R. Michael Weylandt <michael.weylandt@gmail.com>
2012-Mar-02 17:02 UTC
[R] subseting a data frame
I believe you want the duplicated() function.

Michael

On Mar 2, 2012, at 10:19 AM, nathalie <nac at sanger.ac.uk> wrote:

> HI,
> this is my problem I want to subset this file df, using only unique df$exon printing the line once even if df$exon appear several times:
>
> unique(df$exon) will show me the unique exons
> If I try to print only the unique exon lines
> with df[unique(df$exon),] -this doesn't print only the unique ones :(
>
> could you help?
> thanks
> Nat
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
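As a hedged illustration of that suggestion, here is a minimal sketch, assuming the goal is to keep the first row for each exon. The data frame below is a hypothetical stand-in with shortened exon labels, not the poster's data.

```r
# Hypothetical stand-in for the posted data; exon labels are shortened.
df <- data.frame(
  exon = c("exon_A", "exon_B", "exon_B", "exon_C", "exon_C", "exon_D"),
  size = c(184, 106, 106, 183, 183, 4),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of a value that was already seen, so
# negating it gives a logical index that keeps the first row per exon.
keep  <- !duplicated(df$exon)
dedup <- df[keep, ]
print(dedup)
```

Because keep is a logical vector with one entry per row, it is a valid row index, which the output of unique() is not.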
R. Michael Weylandt <michael.weylandt@gmail.com>
2012-Mar-02 17:56 UTC
[R] subseting a data frame
Please always cc the list for archival/threading reasons.

The short answer is that unique() gives the unique elements themselves rather than something you can subset by, like a set of logical indices or row numbers. Note that in general

unique(x) == x[!duplicated(x)]

I'd imagine there are cases where this breaks down, but I can't assemble one off the top of my head.

Michael

On Mar 2, 2012, at 12:13 PM, nathalie <nac at sanger.ac.uk> wrote:

> thanks
> why unique doesn't work here??
>
>> I believe you want the duplicated() function.
>>
>> Michael
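To make the explanation above concrete, here is a minimal sketch on a hypothetical four-row data frame (shortened exon labels, row names retyped from the post). It assumes exon is a character column; in that case df[unique(df$exon), ] looks the exon strings up among the row names, finds nothing, and returns rows of NA. If exon were a factor instead, R's indexing rules would use the underlying integer codes as row numbers, which is equally unrelated to which rows are duplicates.

```r
# Hypothetical stand-in: shortened exon labels, row names as in the post.
df <- data.frame(
  exon = c("e1", "e2", "e2", "e3"),
  size = c(184, 106, 106, 91),
  row.names = c("413455", "413639", "413745", "413851"),
  stringsAsFactors = FALSE
)

# unique() returns the distinct values, not row positions.
u <- unique(df$exon)

# A character row index is matched against row names; none match here,
# so every returned row is NA.
bad <- df[u, ]

# A logical index built with duplicated() selects actual rows.
good <- df[!duplicated(df$exon), ]

# The relationship quoted above, checked on this vector.
same <- identical(unique(df$exon), df$exon[!duplicated(df$exon)])

print(bad)
print(good)
print(same)
```

identical() is used for the check rather than == so that the result is a single TRUE or FALSE instead of an element-wise comparison.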