thr3ads.net - R help - [R] Fast way to finding index in Vector [Jan 2009]

Gundala Viswanath

2009-Jan-13 02:07 UTC

[R] Fast way to finding index in Vector

Dear all,

Suppose I have the following vector as a repository:
> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
Given another query vector
> qr <- c("AAC", "ATT")
is there a fast way to find the index of each query element in the repository?

Giving:

[1] 3 6

Typically repo has around 12 million elements, and the query around
1 million elements.


- Gundala Viswanath
Jakarta - Indonesia

Gundala Viswanath

2009-Jan-13 02:22 UTC

Hi Jorge and all,

How can I modify your code when the query can be bigger than the repository,
meaning that it can contain repeats?

e.g. qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")


Sorry, I should have mentioned this earlier.


- Gundala Viswanath
Jakarta - Indonesia



On Tue, Jan 13, 2009 at 11:11 AM, Jorge Ivan Velez
<jorgeivanvelez at gmail.com> wrote:
>
> Perhaps
> which(repo %in% qr)
> ?
> HTH,
>
> Jorge
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

jim holtman

2009-Jan-13 03:14 UTC

Is this what you want:
> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
> qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
> match(qr, repo)
[1] 3 6 6 3 6 6 2 6 6
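
For comparison, a minimal sketch of how the two suggestions in this thread differ, using the same toy vectors. which(repo %in% qr) reports which repository positions are hit, sorted and without repeats, whereas match(qr, repo) returns one index per query element, in query order. The "TTT" tag and the nomatch lines are illustrative additions, not part of the original posts.

```r
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr   <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")

# Positions in repo that occur anywhere in qr: sorted, no repeats
hits <- which(repo %in% qr)      # 2 3 6

# One index per query element, in query order (first match in repo)
idx <- match(qr, repo)           # 3 6 6 3 6 6 2 6 6

# A tag absent from repo ("TTT" is a made-up example) yields NA by default;
# nomatch lets you pick a different marker, here 0
idx_missing <- match(c("AAC", "TTT"), repo)                # 3 NA
idx_zero    <- match(c("AAC", "TTT"), repo, nomatch = 0L)  # 3 0
```

So when the query contains repeats and the per-element index is wanted, match() is the form that preserves the query's length and order.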




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Gundala Viswanath

2009-Jan-13 03:17 UTC

Yes Jim, exactly.

BTW, I found from ?match

" Matching for lists is potentially very slow and best avoided
     except in simple cases."

Since I am doing this for millions of tags, is there a faster alternative?


- Gundala Viswanath
Jakarta - Indonesia
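
A note on the ?match quotation above: that warning is about matching lists. The sketch below assumes the tags are stored as a plain character vector, as in this thread; such a vector is atomic, not a list, so the caveat should not apply, and matching on atomic vectors is, as far as I know, done with a hash table internally. The list version (repo_list) is a hypothetical illustration only.

```r
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr   <- c("AAC", "ATT")

# A character vector is atomic, not a list
repo_is_list <- is.list(repo)        # FALSE
repo_is_chr  <- is.character(repo)   # TRUE

# The case the help page warns about is a list of values
repo_list <- as.list(repo)           # hypothetical list version of the tags
list_is_list <- is.list(repo_list)   # TRUE

# If tags ever arrive as a list of single strings, flattening them first
# brings you back to the atomic case
idx <- match(qr, unlist(repo_list))
```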




jim holtman

2009-Jan-13 03:27 UTC

Is this fast enough for you? Matching 2000 tags against 2M tags takes 0.2 seconds:
> str(x)
 chr [1:2000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> str(z)
 chr [1:2000000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> system.time(y <- match(x,z))
   user  system elapsed
    0.2     0.0     0.2
> str(y)
 int [1:2000] 1 2 3 4 5 6 7 8 9 10 ...
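
The post does not show how x and z were built, so the following is only a sketch of one way to set up a comparable timing. It assumes random 5-letter tags over the letters A to E and a query taken from the start of the repository; make_tags and the seed are hypothetical choices, and timings will differ by machine.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical generator: n random 5-letter tags over the letters A-E
make_tags <- function(n) {
  m <- matrix(sample(LETTERS[1:5], 5 * n, replace = TRUE), ncol = 5)
  paste0(m[, 1], m[, 2], m[, 3], m[, 4], m[, 5])
}

z <- make_tags(2e6)   # repository of 2M tags (contains many duplicates)
x <- z[1:2000]        # query of 2000 tags, all present in the repository

timing <- system.time(y <- match(x, z))
print(timing)

# match() returns the FIRST position of each tag in z, so with duplicated
# tags y[i] can be smaller than i
str(y)
```

To try the sizes from the original question, change the arguments to 12e6 for the repository and a 1e6-element query, memory permitting.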



Gundala Viswanath

2009-Jan-13 04:33 UTC

Thanks for the info, Jim.

- GV



