thr3ads.net - R help - [R] Getting a list of unique gene names from a list with semi-colons [Jan 2012]

Kurinji Pandiyan

2012-Jan-07 02:05 UTC

[R] Getting a list of unique gene names from a list with semi-colons

Hello,

I have one column in my data frame that holds gene names of interest.
Because some probes lie between two genes or two transcripts of a gene,
the entries look something like this:

  FAM81A  LOC283050;LOC283050;LOC283050;ZMIZ1  PINK1;PINK1  MRPL12;MRPL12
C1orf114  MMS19;UBTD1

I would like to know how to get a list of all the names with the
semi-colons removed and the duplicates dropped. I would like the end
result to look like this:

FAM81A
LOC283050
ZMIZ1
PINK1
MRPL12
C1orf114
MMS19
UBTD1

Thanks a lot for your help!
Kurinji


R. Michael Weylandt <michael.weylandt@gmail.com>

2012-Jan-07 02:17 UTC


I think you can do this with something like this (untested):

unique(unlist(strsplit(XXX, ",")))

Michael


Gabor Grothendieck

2012-Jan-07 02:25 UTC


This uses strapply in gsubfn:

x <- "FAM81A  LOC283050;LOC283050;LOC283050;ZMIZ1  PINK1;PINK1"
library(gsubfn)
unique(strapply(x, "\\w+", c)[[1]])
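
For readers who prefer not to load gsubfn, the same "pull out every word, then de-duplicate" idea can be sketched in base R with gregexpr() and regmatches(). This is only an illustrative alternative, not the method proposed in the thread; the variable names words and genes are made up here.

```r
x <- "FAM81A  LOC283050;LOC283050;LOC283050;ZMIZ1  PINK1;PINK1"

# gregexpr() locates every run of word characters (letters, digits,
# underscore); regmatches() extracts the matched substrings.
words <- regmatches(x, gregexpr("\\w+", x))[[1]]

# Drop duplicates, keeping the first-seen order.
genes <- unique(words)
print(genes)
```

Note that "\\w+" also splits on hyphens and dots, so a symbol such as HLA-DRB1 would be broken in two. If the real data contain such names, a pattern like "[^;[:space:]]+" (everything except semi-colons and whitespace) may be a safer choice.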

If x is very long, there is a high-speed version of strapply,
specialized to the function c and called strapplyc, in the development
version of gsubfn. For instance, this example extracts 275,000 words
from a novel:

https://groups.google.com/group/corpling-with-r/msg/b85f7ff917cccb5d?dmode=source&output=gplain&noredirect&pli=1





-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

R. Michael Weylandt <michael.weylandt@gmail.com>

2012-Jan-07 02:29 UTC


Sorry, that should be a semi-colon in the code below.

Michael Weylandt
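
With the separator corrected to a semi-colon, the suggestion can be sketched as follows. The data frame df and its column gene are hypothetical stand-ins (the original post does not name them); the values are the ones from the question. This is a minimal sketch that assumes each cell holds one or more names separated only by semi-colons.

```r
# Hypothetical data frame and column name, made up for illustration.
df <- data.frame(
  gene = c("FAM81A", "LOC283050;LOC283050;LOC283050;ZMIZ1", "PINK1;PINK1",
           "MRPL12;MRPL12", "C1orf114", "MMS19;UBTD1"),
  stringsAsFactors = FALSE
)

# as.character() guards against the column being a factor, since
# strsplit() requires a character vector. fixed = TRUE treats ";" as a
# literal string rather than a regular expression.
parts <- strsplit(as.character(df$gene), ";", fixed = TRUE)

# Flatten the per-row pieces and drop duplicates, keeping first-seen order.
genes <- unique(unlist(parts))
print(genes)
```

If the cells may also contain stray spaces around the names, wrapping the result in trimws() before unique() would be one way to clean them up.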

On Jan 6, 2012, at 8:17 PM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:
> I think you can do this with something like this (untested):
> 
> unique(unlist(strsplit(XXX, ",")))
> 
> Michael
> 
