Dear all,

Suppose I have the following vector as repository:

> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")

Given another query vector

> qr <- c("AAC", "ATT")

is there a fast way to find the index of each query element in the repository?

Giving:

[1] 3 6

Typically the repository has around 12 million elements, and the query around 1 million elements.

- Gundala Viswanath
Jakarta - Indonesia

Hi Jorge and all,

How can I modify your code when the query can be bigger than the repository,
meaning that it can contain repeats?

e.g. qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")

Sorry, I should have mentioned this earlier.

- Gundala Viswanath
Jakarta - Indonesia

On Tue, Jan 13, 2009 at 11:11 AM, Jorge Ivan Velez <jorgeivanvelez at gmail.com> wrote:
>
> Perhaps
> which(repo%in%qr)
> ?
> HTH,
>
> Jorge
>
>
> On Mon, Jan 12, 2009 at 9:07 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
>>
>> Dear all,
>>
>> Suppose I have the following vector as repository:
>>
>> > repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
>>
>> Given another query vector
>>
>> > qr <- c("AAC", "ATT")
>>
>> is there a way I can find the query index in repository in a fast way.
>>
>> Giving:
>>
>> [1] 3 6
>>
>> Typically the size of repo is around ~12million element, and
>> query around ~1 million element.
>>
>>
>> - Gundala Viswanath
>> Jakarta - Indonesia
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
Is this what you want:

> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
> qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
> match(qr, repo)
[1] 3 6 6 3 6 6 2 6 6

On Mon, Jan 12, 2009 at 9:22 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
> Hi Jorge and all,
>
> How can I modify your code when the query can be bigger than the repository,
> meaning that it can contain repeats?
>
> e.g. qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>
> Sorry, I should have mentioned this earlier.
>
> - Gundala Viswanath
> Jakarta - Indonesia

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

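A side-by-side sketch of the two suggestions in this thread may help show why match() fits the repeated-query case. which(repo %in% qr) reports which repository positions occur anywhere in the query (sorted, each at most once), while match(qr, repo) returns one position per query element, in query order. The "GGG" tag is an assumption added here to illustrate the no-match case; it does not appear in the original posts.

```r
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr   <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")

# Positions in repo that appear anywhere in qr: sorted, each at most once.
idx_in <- which(repo %in% qr)

# One repo position per query element, in query order, repeats preserved.
idx_match <- match(qr, repo)

# Hypothetical tag "GGG" is absent from repo, so match() returns NA for it.
idx_missing <- match(c("AAC", "GGG"), repo)

print(idx_in)       # 2 3 6
print(idx_match)    # 3 6 6 3 6 6 2 6 6
print(idx_missing)  # 3 NA
```

Note that the first form loses both the order and the multiplicity of the query, so it only answers "which repository entries were asked for at all".
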
Yes Jim, exactly.

BTW, I found this in ?match:

"Matching for lists is potentially very slow and best avoided
except in simple cases."

Since I am doing this for millions of tags, is there a faster alternative?

- Gundala Viswanath
Jakarta - Indonesia

On Tue, Jan 13, 2009 at 12:14 PM, jim holtman <jholtman at gmail.com> wrote:
> Is this what you want:
>
>> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
>> qr <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>> match(qr, repo)
> [1] 3 6 6 3 6 6 2 6 6
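On the ?match warning quoted above: that sentence concerns lists, and a character vector such as repo is an atomic vector rather than a list, so the warning should not apply to this use case. A minimal check using the thread's own data follows. The "GGG" tag is a made-up example of a query tag that is missing from the repository, and nomatch = 0L is a documented argument of base match(), shown here only as one possible way to flag such tags.

```r
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr   <- c("AAC", "ATT", "ATT")

# A character vector is atomic, not a list.
repo_is_list <- is.list(repo)
repo_is_char <- is.character(repo)

# Hypothetical missing tag "GGG": nomatch = 0L returns 0 instead of NA.
hits  <- match(c(qr, "GGG"), repo, nomatch = 0L)
found <- hits > 0L

print(repo_is_list)  # FALSE
print(repo_is_char)  # TRUE
print(hits)          # 3 6 6 0
print(found)         # TRUE TRUE TRUE FALSE
```
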
Is this fast enough for you? Matching 2000 tags against 2M tags takes 0.2 seconds:

> str(x)
 chr [1:2000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> str(z)
 chr [1:2000000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> system.time(y <- match(x,z))
   user  system elapsed
    0.2     0.0     0.2
> str(y)
 int [1:2000] 1 2 3 4 5 6 7 8 9 10 ...

On Mon, Jan 12, 2009 at 10:17 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
> Yes Jim, exactly.
>
> BTW, I found this in ?match:
>
> "Matching for lists is potentially very slow and best avoided
> except in simple cases."
>
> Since I am doing this for millions of tags, is there a faster alternative?
>
> - Gundala Viswanath
> Jakarta - Indonesia

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

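For anyone wanting to repeat a timing like the one above, here is a rough sketch. The post does not show how x and z were built, so the make_tags() helper, the seed, and the choice of five-letter tags over the letters A to E are all assumptions modeled on the str() output; timings will differ by machine and R version.

```r
# Hypothetical helper (not from the thread): n random tags of `len` letters.
make_tags <- function(n, len = 5, alphabet = c("A", "B", "C", "D", "E")) {
  letters_mat <- matrix(sample(alphabet, n * len, replace = TRUE), ncol = len)
  # Paste the columns together row-wise to form one tag per row.
  do.call(paste0, as.data.frame(letters_mat, stringsAsFactors = FALSE))
}

set.seed(1)               # arbitrary seed, for repeatability only
z <- make_tags(2000000)   # repository of 2M tags (assumed construction)
x <- z[1:2000]            # query: the first 2000 repository tags

# Time the lookup; match() returns the FIRST position of each tag in z,
# so with repeated tags y[i] can be smaller than i.
timing <- system.time(y <- match(x, z))
print(timing)
str(y)
```

With only 5^5 = 3125 distinct tags possible under these assumptions, the repository necessarily contains many repeats, which is why the result is checked against z[y] rather than against the positions 1:2000.
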
Thanks for the info, Jim.

- GV

On Tue, Jan 13, 2009 at 12:27 PM, jim holtman <jholtman at gmail.com> wrote:
> Is this fast enough for you; matches of 2000 against 2M tags takes 0.2 seconds:
>
>> str(x)
> chr [1:2000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
>> str(z)
> chr [1:2000000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
>> system.time(y <- match(x,z))
>    user  system elapsed
>     0.2     0.0     0.2
>> str(y)
> int [1:2000] 1 2 3 4 5 6 7 8 9 10 ...