thr3ads.net - Ferret talk - [Ferret-talk] Parallal Building? [Nov 2006]

Holden Karau

2006-Nov-20 02:52 UTC

[Ferret-talk] Parallal Building?

I'm trying to index ~130,000 documents [soon to grow to about 500,000
documents] and I'm wondering if it's possible to combine Ferret databases
or in some other way split up the building process.

Normally, indexing 130k documents wouldn't be that painful, except that
there are different types of links between these documents and they are
not absolute (so, for example, doc a refers to a document b, but there are
multiple different documents labelled document a and document b, and to
prevent false links I have to use some fairly computationally intensive
heuristics).

If it's not possible to split up the building of a Ferret index, I'll
probably resolve the links into absolute links as a separate part of the
process [which I can split up] and then build the Ferret index on one
machine after that.

-- 
Posted via http://www.ruby-forum.com/.

Jens Kraemer

2006-Nov-20 10:46 UTC

On Mon, Nov 20, 2006 at 03:52:21AM +0100, Holden Karau wrote:
> I'm trying to index ~130,000 documents [soon to grow to about 500,000
> documents] and I'm wondering if it's possible to combine Ferret databases
> or in some other way split up the building process.
>
> Normally, indexing 130k documents wouldn't be that painful, except that
> there are different types of links between these documents and they are
> not absolute (so, for example, doc a refers to a document b, but there are
> multiple different documents labelled document a and document b, and to
> prevent false links I have to use some fairly computationally intensive
> heuristics).
>
> If it's not possible to split up the building of a Ferret index, I'll
> probably resolve the links into absolute links as a separate part of the
> process [which I can split up] and then build the Ferret index on one
> machine after that.

Only one process or thread may write to the index at once, so you'll
have to serialize your writing to the index somehow, e.g. by gathering
the data on two machines (or threads) and handing it over to a single
indexer.


Jens


-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
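
The single-writer arrangement Jens describes can be sketched roughly as
follows. This is only an illustration under stated assumptions, not code
from the thread: resolve_links is a hypothetical placeholder for the
expensive link heuristics, build_serialized is an invented helper name, and
a plain Array stands in for the index so the sketch runs without the gem.
With Ferret, the object passed in would presumably be the one
Ferret::Index::Index that is allowed to write.

```ruby
# Hypothetical placeholder for the expensive link-resolution heuristics.
def resolve_links(doc)
  doc.merge(:links => doc[:links].map { |link| "abs:#{link}" })
end

# Several workers resolve links concurrently; exactly one writer thread
# adds documents to `index`, which only needs to respond to `<<`.
def build_serialized(docs, index, worker_count = 4)
  pending  = Queue.new  # raw documents waiting for link resolution
  resolved = Queue.new  # finished documents waiting to be indexed

  docs.each { |doc| pending << doc }
  worker_count.times { pending << :done }  # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      while (doc = pending.pop) != :done
        resolved << resolve_links(doc)
      end
      resolved << :done  # tell the writer this worker has finished
    end
  end

  # The only thread that ever touches the index.
  writer = Thread.new do
    finished = 0
    while finished < worker_count
      item = resolved.pop
      if item == :done
        finished += 1
      else
        index << item
      end
    end
  end

  workers.each(&:join)
  writer.join
  index
end

docs = (1..10).map { |i| { :id => i, :links => ["doc#{i + 1}"] } }
index = build_serialized(docs, [])  # Array stands in for the real index
puts index.size
```

In MRI, threads do not execute Ruby code in parallel, so for CPU-bound
heuristics the workers would more likely be separate processes or machines
feeding the writer. The shape stays the same: many producers, one indexer.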

Patrick Ritchie

2006-Nov-20 14:04 UTC

Jens Kraemer wrote:
> On Mon, Nov 20, 2006 at 03:52:21AM +0100, Holden Karau wrote:
>> I'm trying to index ~130,000 documents [soon to grow to about 500,000
>> documents] and I'm wondering if it's possible to combine Ferret databases
>> or in some other way split up the building process.
>>
>> Normally, indexing 130k documents wouldn't be that painful, except that
>> there are different types of links between these documents and they are
>> not absolute (so, for example, doc a refers to a document b, but there are
>> multiple different documents labelled document a and document b, and to
>> prevent false links I have to use some fairly computationally intensive
>> heuristics).
>>
>> If it's not possible to split up the building of a Ferret index, I'll
>> probably resolve the links into absolute links as a separate part of the
>> process [which I can split up] and then build the Ferret index on one
>> machine after that.
>
> Only one process or thread may write to the index at once, so you'll
> have to serialize your writing to the index somehow, e.g. by gathering
> the data on two machines (or threads) and handing it over to a single
> indexer.

*Ferret newbie warning*

Shouldn't it be possible to use the add_indexes method to merge one or
more indexes?

http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035

Cheers!
Patrick

Jens Kraemer

2006-Nov-20 16:07 UTC

On Mon, Nov 20, 2006 at 09:04:21AM -0500, Patrick Ritchie wrote:
> Jens Kraemer wrote:
[..]
> > Only one process or thread may write to the index at once, so you'll
> > have to serialize your writing to the index somehow, e.g. by gathering
> > the data on two machines (or threads) and handing it over to a single
> > indexer.
>
> *Ferret newbie warning*
>
> Shouldn't it be possible to use the add_indexes method to merge one or
> more indexes?
>
> http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035

Interesting :-)
I've never tried this, so if you do, please let me know how it works.

Jens


Patrick Ritchie

2006-Nov-20 16:55 UTC

Jens Kraemer wrote:
> On Mon, Nov 20, 2006 at 09:04:21AM -0500, Patrick Ritchie wrote:
>> Jens Kraemer wrote:
> [..]
>>> Only one process or thread may write to the index at once, so you'll
>>> have to serialize your writing to the index somehow, e.g. by gathering
>>> the data on two machines (or threads) and handing it over to a single
>>> indexer.
>>
>> *Ferret newbie warning*
>>
>> Shouldn't it be possible to use the add_indexes method to merge one or
>> more indexes?
>>
>> http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035
>
> Interesting :-)
> I've never tried this, so if you do, please let me know how it works.
>
> Jens

I just did the following in IRB:

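# Assumes the ferret gem is installed. Including Ferret::Index lets the
# bare name Index resolve to Ferret::Index::Index.
require 'ferret'
include Ferret::Index

i1 = Index.new
i2 = Index.new

i1 << {:text => 'one'}
i2 << {:text => 'two'}

i1.search_each("text:one") {|id, score| puts "#{i1[id][:text]}"}
# => "one"

# i2's document is not in i1 yet
i1.search_each("text:two") {|id, score| puts "#{i1[id][:text]}"}
# => nil

# Merge i2 into i1, then repeat the search
i1.add_indexes i2
i1.search_each("text:two") {|id, score| puts "#{i1[id][:text]}"}
# => "two"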

Seems to work as advertised...

Cheers!
Patrick
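
Applied to the original question, the same merge could in principle be used
to build several partial indexes separately and combine them at the end. The
sketch below only illustrates that shape and is not tested against Ferret:
StandInIndex is a made-up class that mimics just the two calls used in this
thread (<< and add_indexes) so the example runs without the gem, and
build_in_parts and new_index are invented names. With Ferret, new_index
would presumably return a real Ferret::Index::Index. Whether partial indexes
built on different machines merge cleanly is not confirmed in this thread.

```ruby
# Made-up stand-in for Ferret::Index::Index, for illustration only.
class StandInIndex
  attr_reader :docs

  def initialize
    @docs = []
  end

  # Add one document.
  def <<(doc)
    @docs << doc
    self
  end

  # Merge another index into this one.
  def add_indexes(other)
    @docs.concat(other.docs)
    self
  end
end

# Build one partial index per slice of documents, each in its own thread,
# then merge the partial indexes one by one into a single target index.
# `new_index` is a callable that returns a fresh, empty index.
def build_in_parts(docs, parts, new_index)
  slice_size = [(docs.size / parts.to_f).ceil, 1].max

  partials = docs.each_slice(slice_size).map do |slice|
    Thread.new do
      partial = new_index.call
      slice.each { |doc| partial << doc }
      partial  # becomes the thread's value
    end
  end.map(&:value)

  target = new_index.call
  partials.each { |partial| target.add_indexes(partial) }
  target
end

docs = %w[one two three four five].map { |text| { :text => text } }
merged = build_in_parts(docs, 2, -> { StandInIndex.new })
puts merged.docs.size
```

Each partial index has exactly one writer, so the one-writer-per-index rule
Jens mentions is respected. Only the final merge step is serial.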

Holden Karau

2006-Nov-20 19:47 UTC

Patrick Ritchie wrote:
> Jens Kraemer wrote:
>> [..]
> *Ferret newbie warning*
>
> Shouldn't it be possible to use the add_indexes method to merge one or
> more indexes?
>
> http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035
>
> Cheers!
> Patrick

I can't believe I missed that. I'll give it a shot sometime over the
weekend, thanks :-)

