Here is a way of determining where the dups are:
> vec <- scan(textConnection(' "STAT1" "STAT1" "STAT1" "STAT1" "GAPDH" "GAPDH" "GAPDH"
+ "ACTB" "ACTB" "ACTB" "DDR1" "RFC2" "HSPA6" "PAX8"
+ "GUCA1A" "UBE1L" "THRA" "PTPN21" "CCL5" "CYP2E1" "STAT1"
+ "THRA" "PAX8"'), what='')
Read 23 items
> # create a list of which positions hold the same value; if the length of a
> # list element is greater than one, then it marks where the dups are
> dup <- split(seq_along(vec), vec)
>
> dup
$ACTB
[1] 8 9 10
$CCL5
[1] 19
$CYP2E1
[1] 20
$DDR1
[1] 11
$GAPDH
[1] 5 6 7
$GUCA1A
[1] 15
$HSPA6
[1] 13
$PAX8
[1] 14 23
$PTPN21
[1] 18
$RFC2
[1] 12
$STAT1
[1] 1 2 3 4 21
$THRA
[1] 17 22
$UBE1L
[1] 16
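
If what you want is a vector of the same length as the input, with one number per gene symbol, here is a rough sketch of one way to get it (not part of the session above). It assumes the same 23 symbols. The names dup_only, codes and codes_sorted are made up for illustration. match() against unique() numbers the symbols in order of first appearance, so no loops are needed:

```r
# Same 23 gene symbols as in the example above
vec <- c("STAT1", "STAT1", "STAT1", "STAT1", "GAPDH", "GAPDH", "GAPDH",
         "ACTB", "ACTB", "ACTB", "DDR1", "RFC2", "HSPA6", "PAX8",
         "GUCA1A", "UBE1L", "THRA", "PTPN21", "CCL5", "CYP2E1", "STAT1",
         "THRA", "PAX8")

# Positions grouped by symbol, as in the split() call above
dup <- split(seq_along(vec), vec)

# Keep only the symbols that occur more than once (illustrative name)
dup_only <- dup[sapply(dup, length) > 1]

# One integer code per symbol, numbered in order of first appearance;
# the result has the same length as vec
codes <- match(vec, unique(vec))
print(codes)
# 1 1 1 1 2 2 2 3 3 3 4 5 6 7 8 9 10 11 12 13 1 10 7

# Alternative: codes numbered by sorted symbol name instead
codes_sorted <- as.integer(factor(vec))
```

Both match() and factor() are vectorized, so either should be much faster than nested for loops on a long vector. Which numbering to use depends only on whether the order of first appearance matters to you.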
On Thu, Jun 18, 2009 at 10:28 AM, njhuang86 <njhuang86@yahoo.com> wrote:
>
> Hi all,
>
> Suppose I have a vector like this:
>
>  [1] "STAT1"  "STAT1"  "STAT1"  "STAT1"  "GAPDH"  "GAPDH"  "GAPDH"  "ACTB"   "ACTB"
> [10] "ACTB"   "DDR1"   "RFC2"   "HSPA6"  "PAX8"   "GUCA1A" "UBE1L"  "THRA"   "PTPN21"
> [19] "CCL5"   "CYP2E1" "STAT1"  "THRA"   "PAX8"
>
> I would like to produce a vector such that it has the same length as the one
> above but it tells me where the duplicates are. So essentially, if I could
> represent each gene symbol as a specific number, and have the duplicates be
> the same number, that would be ideal. Right now, I'm using the unique
> command along with two nested for loops to do the job... But it's really
> taking too long... Any suggestions would be greatly appreciated. Thank you!
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?