thr3ads.net - Ferret talk - [Ferret-talk] Need some information about Ferret [Nov 2008]

Lyes Amazouz

2008-Nov-30 08:49 UTC

[Ferret-talk] Need some information about Ferret

Hi everybody!

In our company, we want to use Ferret as the main index/search engine of our
applications, and we are looking for reports on how well Ferret performs
when deployed in production.

* Has Ferret already been deployed in production at any companies? Are there
any reports about that?

* What is the maximum number of documents we can index with Ferret? Does
anyone have information about that?

* What is the best way to access a very large Ferret index? Can we distribute
it across several machines or not?

By the way, can Ferret read Solr indexes, as they are both clones of Lucene?

Thank you

-- 
Lyes Amazouz
USTHB, Algiers

Erik Hatcher

2008-Nov-30 09:48 UTC

On Nov 30, 2008, at 3:49 AM, Lyes Amazouz wrote:
> By the way, can Ferret read Solr indexes as they are both clones of
> Lucene?
No.  While Ferret was designed around the Lucene index file format, it  
is not compatible with Java Lucene (and thus Solr).

	Erik

Lyes Amazouz

2008-Nov-30 15:54 UTC

On Sun, Nov 30, 2008 at 10:48 AM, Erik Hatcher
<erik at ehatchersolutions.com> wrote:
>
> On Nov 30, 2008, at 3:49 AM, Lyes Amazouz wrote:
>
>> By the way, can Ferret read Solr indexes as they are both clones of
>> Lucene?
>>
>
> No.  While Ferret was designed around the Lucene index file format, it is
> not compatible with Java Lucene (and thus Solr).
>
>        Erik
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

Hello Erik!

Thank you for the information. But is there a way to recover the content of
an existing Solr index and reindex it with Ferret?

Erik Hatcher

2008-Nov-30 17:04 UTC

On Nov 30, 2008, at 10:54 AM, Lyes Amazouz wrote:
> Thank you for the information. But is there a way to recover the
> content of an existing Solr index and reindex it with Ferret?
It'll probably be easier and faster to reindex your original content,
which presumably you still have handy.  But... you'd have to have your
fields "stored" in Solr for them to be recoverable.  Using solr-ruby's
Solr::Importer::SolrSource would make it easy to iterate over all
documents in Solr (using a query of *:*).
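
For readers who want to see the shape of such an iteration, here is a minimal sketch in plain Ruby. It does not use solr-ruby's Solr::Importer::SolrSource; it only assumes that a Solr select request for *:* accepts `start` and `rows` paging parameters and that the JSON response carries `numFound` and `docs` under a `response` key. The names `each_solr_document` and `fetch_page` are hypothetical, and the fetcher is faked here so the example runs without a Solr server.

```ruby
# Walk every stored document matching *:* one page at a time.
# `fetch_page` is a hypothetical callable taking (start, rows) and returning
# a parsed Solr JSON response as a Hash. In real use it would perform the
# HTTP request; here it is injected so the paging logic can be run offline.
def each_solr_document(fetch_page, rows: 100)
  start = 0
  loop do
    response = fetch_page.call(start, rows).fetch("response")
    docs = response.fetch("docs")
    docs.each { |doc| yield doc }
    start += docs.size
    # Stop when a page comes back empty or everything has been seen.
    break if docs.empty? || start >= response.fetch("numFound")
  end
end

# Fake fetcher standing in for Solr (illustrative data only).
all_docs = (1..5).map { |i| { "id" => i.to_s, "title" => "Document #{i}" } }
fake_fetch = lambda do |start, rows|
  page = all_docs[start, rows] || []
  { "response" => { "numFound" => all_docs.size, "start" => start, "docs" => page } }
end

seen_ids = []
each_solr_document(fake_fetch, rows: 2) { |doc| seen_ids << doc["id"] }
puts seen_ids.join(", ")
```

Only fields that were stored in Solr would appear in each yielded hash, which is why the "stored" caveat above matters. Each hash could then be passed to whatever code adds documents to the Ferret index.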

But why move from Solr to Ferret?

	Erik

Lyes Amazouz

2008-Dec-01 07:52 UTC

Hello

> But why move from Solr to Ferret?

We found that searching and indexing with Solr were too slow, so we
decided to look for an alternative. Ferret seems to be a good choice. We
tried Ferret on some examples and found that it performed better.

Erik Hatcher

2008-Dec-01 08:45 UTC

On Dec 1, 2008, at 2:52 AM, Lyes Amazouz wrote:
> Hello
>
>> But why move from Solr to Ferret?
>
> We found that searching and indexing with Solr were too slow, so we
> decided to look for an alternative. Ferret seems to be a good choice.
> We tried Ferret on some examples and found that it performed better.
Thanks for the feedback.  If you don't mind elaborating further, what
kind of documents are you indexing (database rows?  file system
files?  other?), how many documents do you have, and how are you
indexing it?

Thanks,
	Erik

Lyes Amazouz

2008-Dec-01 10:36 UTC

Hello Erik

> Thanks for the feedback.  If you don't mind elaborating further, what
> kind of documents are you indexing (database rows?  file system files?
> other?), how many documents do you have, and how are you indexing it?
>
> Thanks,
>
>        Erik
>

Right now we are indexing file system files: HTML pages (85%), images (10%,
for which we index the meta information), PDF (2%), Word (2%) and plain
text (1%). We have 100,000,000 documents to index, and 10% is already done.
As for the last question, I didn't exactly understand what you mean by
"how are you indexing". What I can say is that before we index documents
that are not plain text (like PDF, Word and HTML), we run a content
extraction step (using pdftotext, antiword and the 'hpricot' Ruby library).
We also extract the metadata related to each document we index.




Jens Krämer

2008-Dec-01 15:48 UTC

Hi!

On 30.11.2008, at 09:49, Lyes Amazouz wrote:
> Hi everybody!
>
> In our company, we want to use Ferret as the main index/search
> engine of our applications, and we are looking for reports on how
> well Ferret performs when deployed in production.
>
> * Has Ferret already been deployed in production at any companies?
> Are there any reports about that?
Yes, I use Ferret whenever I need some kind of search for a site or
application I'm working on. Usually these are full text searches for
product catalogs and/or html content - not really large scale, at most
around 10000 documents. Most recent example is www.fahrrad-xxl.de.

We also use Ferret + aaf in a knowledge management system I'm working
on for xscio AG (xscio.de).
> * What is the maximum number of documents we can index with Ferret?
> Does anyone have information about that?
I have no idea whether there is an upper limit on the number of
documents other than the maximum value a Ruby Fixnum instance can
have...
> * What is the best way to access a very large Ferret index? Can we
> distribute it across several machines or not?
Afair there's no way to distribute an index across multiple machines
built into Ferret. You could do the distribution yourself of course by
clustering your data and distributing across several independent
Ferret indexes. Downside is that search result scores from different
indexes aren't directly comparable.
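
One possible way to do that do-it-yourself distribution is sketched below in plain Ruby. It is only an illustration under stated assumptions, not something Ferret provides: the helpers `shard_for` and `partition_documents` are hypothetical, and documents are assumed to be hashes with a `:path` key. Each document is assigned to one of N independent indexes by hashing its path, so the same document always lands on the same index.

```ruby
require "zlib"

# Hypothetical helper: pick one of `shard_count` independent indexes for a
# document key. CRC32 is used because it gives the same value in every
# process and on every machine, unlike String#hash, which is seeded per
# Ruby process.
def shard_for(key, shard_count)
  raise ArgumentError, "shard_count must be positive" unless shard_count > 0
  Zlib.crc32(key.to_s) % shard_count
end

# Group documents by target shard so that each group can be sent to the
# machine that owns that index.
def partition_documents(docs, shard_count)
  docs.group_by { |doc| shard_for(doc.fetch(:path), shard_count) }
end

# Illustrative data only.
docs = [
  { path: "/data/site/index.html", title: "Home" },
  { path: "/data/site/about.html", title: "About" },
  { path: "/data/reports/2008.pdf", title: "Report" },
  { path: "/data/notes/readme.txt", title: "Readme" }
]

partition_documents(docs, 3).sort.each do |shard, group|
  puts "shard #{shard}: #{group.map { |d| d[:path] }.join(', ')}"
end
```

Hashing spreads documents evenly but ignores their content. Clustering by topic or by source, as suggested above, would need a different routing rule in `shard_for`.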
> By the way, can Ferret read Solr indexes as they are both clones of
> Lucene?
Ferret isn't really index compatible with Lucene anymore, it uses a
slightly different index format mostly due to differences in the
representation of UTF-8 values, but I think there were other changes,
too.

Oh, and Solr also isn't a clone of Lucene, it's a search server that
internally uses the Lucene library.


Cheers,
Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold

Lyes Amazouz

2008-Dec-04 13:03 UTC

Hello Jens!

Thank you for your contribution.

> Yes, I use Ferret whenever I need some kind of search for a site or
> application I'm working on. Usually these are full text searches for product
> catalogs and/or html content - not really large scale, at most around 10000
> documents. Most recent example is www.fahrrad-xxl.de.
>
Is 100 000 your maximum number of documents?

We have more than 100,000,000 documents to index. 2,800,000 are already
done, but the indexing machine is starting to get heavily loaded. Do you
think Ferret will be able to index all of this?


>
>> * What is the best way to access a very huge Ferret Index? May we
>> distribute it on several machines or not?
>
> Afair there's no way to distribute an index across multiple machines built
> into Ferret. You could do the distribution yourself of course by clustering
> your data and distributing across several independent ferret indexes.
> Downside is that search result scores from different indexes aren't directly
> comparable.

Yes, that is a good idea. But how would we merge the results when we get
them back after a query?
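
The thread does not answer this question. As a rough sketch of one heuristic only, assuming each independent index returns a list of (document id, score) hits: since raw scores from different indexes are not directly comparable, each shard's scores can be scaled by that shard's best score before the lists are combined and sorted. The `Hit` struct and the `normalize` and `merge_results` helpers are hypothetical names, and the result is an approximation, not a true global ranking.

```ruby
# One search hit as returned by one of the independent indexes.
Hit = Struct.new(:shard, :doc_id, :score)

# Scale a shard's scores by that shard's best score, so the top hit of
# every shard gets 1.0. This is a heuristic, not a true global score.
def normalize(hits)
  top = hits.map(&:score).max
  return hits if top.nil? || top.zero?
  hits.map { |h| Hit.new(h.shard, h.doc_id, h.score / top.to_f) }
end

# Combine per-shard result lists and keep the best `limit` hits.
# Ties are broken by shard number, then document id, to keep the order stable.
def merge_results(results_by_shard, limit)
  results_by_shard
    .flat_map { |hits| normalize(hits) }
    .sort_by { |h| [-h.score, h.shard, h.doc_id] }
    .first(limit)
end

# Illustrative data only: two shards with very different raw score ranges.
shard_a = [Hit.new(0, 12, 2.0), Hit.new(0, 7, 1.0)]
shard_b = [Hit.new(1, 3, 0.5), Hit.new(1, 9, 0.4)]

merged = merge_results([shard_a, shard_b], 3)
merged.each do |h|
  puts format("shard=%d doc=%d score=%.2f", h.shard, h.doc_id, h.score)
end
```

Without the normalization step, shard 0 would take both top places here simply because its raw scores are larger. With it, the best hit of each shard ranks first.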



