Displaying 20 results from an estimated 1000 matches similar to: "How to beat Google aka Xapian & Natural Language Processing."
2007 Oct 11
2
Xapian 1.0.3 installation issues.
I installed Xapian 1.0.3 and the search would not execute when run as
the Apache user. I could run the search fine over ssh. I rolled Xapian
back to the previous version, 1.0.2, and the search still does not work,
even when I put back the old index made by Xapian 1.0.2
... my search engine is out of service ...
Kevin Duraj
http://myhealthcare.com
2010 Dec 18
1
Xapian index size 475GB = 170 million documents (URLs)
Xapians,
I maintain two indexes for my search engines, each of
approximately the same size. I would like to share this
with you, since many of you have never seen a Xapian index of
this size. And of course you can search the indexes yourself at
- http://myhealthcare.com/
- http://find1friend.com/
I need to add 2 x 100 million more documents to each index, and I hope it
will
2007 Feb 02
1
Working demo of search engine using boolean query.
Lately I have been reading many articles about using boolean queries for search
engines, but I haven't seen any complete working demo. Therefore I put
together a very simple working demo of a search engine using boolean queries. Feel
free to suggest performance improvements or point out errors, while keeping it as
simple as possible to understand.
Thanks,
-Kevin Duraj
http://myhealthcare.com
2007 Jun 05
7
Chinese, Japanese, Korean Tokenizer.
Hi,
I am looking for a Chinese, Japanese and Korean tokenizer that can
be used to tokenize terms for CJK languages. I am not very familiar
with these languages; however, I think that they can contain one
or more words in a single symbol, which makes it more difficult to tokenize
text into searchable terms.
Lucene has a CJK Tokenizer ... and I am looking around to see if there is some
open source that we
2006 Aug 04
4
REST
I've been looking into RESTful approaches lately. Everything I know
my dog, Lelu, taught me.
REST (REpresentational State Transfer) is an architectural technique
for networked applications first described by Roy Fielding in his
dissertation at UC Irvine-- excellent work, especially considering
the tempting proximity of Newport Beach. As Lelu described it to me,
REST strives
2009 Sep 30
2
C++ parser for doc.get_data() result.
Xapians!
Has anybody written, and would be willing to share, a routine that parses the
result of doc.get_data() into key/value pairs in C++?
Code:
Xapian::Document doc = i.get_document();
string data = doc.get_data();
mymap = parse_result(data);
As you know, the data string contains all the data within the document,
delimited by the "=" sign and the "\n" newline, and needs to be parse
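A minimal sketch of the kind of routine the post asks for, assuming the data really is one `key=value` pair per line as described. Xapian treats document data as an opaque string, so this layout is a convention of whatever wrote the index, not something the library guarantees. The name `parse_result` comes from the post; its signature and return type here are assumptions.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: turns "key=value\n" lines into a map.
// Assumes one pair per line; lines without '=' are ignored.
std::map<std::string, std::string> parse_result(const std::string& data) {
    std::map<std::string, std::string> fields;
    std::istringstream lines(data);
    std::string line;
    while (std::getline(lines, line)) {
        // Split on the first '=' only, so values may themselves contain '='
        // (URLs with query strings, for example).
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}
```

With this in place, the fragment above could read `std::map<std::string, std::string> mymap = parse_result(doc.get_data());`. If a key appears twice, the later value overwrites the earlier one; a `std::multimap` would be the alternative if repeated keys matter.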
2016 Jul 12
3
Xapian 1.4.0 released
On Mon, Jul 11, 2016 at 02:02:56PM -0700, Kevin Duraj wrote:
> You are saying that when I search for "delve Xapian 1.4" on Google, a
> company worth 491 billion dollars, the top
> search result has nothing to do with Xapian.
>
> https://www.google.com/search?q=xapian+delve&ie=utf-8&oe=utf-8#q=delve+xapian+1.4
Well, I'm not
2002 Jun 09
1
S or R used in natural language processing (NLP)?
Dear All,
Does anyone use S or R for statistical natural language processing (NLP)?
All I have found so far is a package called EMU
(http://www.shlrc.mq.edu.au/emu/emu-splus.shtml) which is a speech
wave-form processing package.
What I'm looking for are routines to support text processing, text
categorization, word sense disambiguation, text understanding etc.
In particular, I would
2007 Jul 17
1
BUG IN XAPIAN_FLUSH_THRESHOLD
There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000
When trying to force Xapian to flush after 20 million
documents, Xapian ignores the setting and flushes after only 10,000
documents.
Data captured from delve at 60-second intervals with the threshold set as follows:
XAPIAN_FLUSH_THRESHOLD=20000000
perl -e ' while(1) { system("delve ."); sleep(60); } '
2007 Jul 06
1
Using nouns
Luke,
I thought about what you said about nouns, but I'm having trouble coming up
with an example of what you mean.
Would you be willing to rework my example into the words you're talking about?
I'm not asking for a solution here...this is more about how to think
about the problem being solved.
Mike B.
2007 Feb 07
2
My new record: Indexing 20 millions docs = 79m9.378s
Gentoo Linux 2.6
8 AMD Opteron 64-bit Processors
32GB Memory
--------------------------------------------------------------------------------
Environment:
------------------
XAPIAN_FLUSH_THRESHOLD=21000000
XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000
XAPIAN_PREFER_FLINT=True
Indexing 20 million documents:
--stemmer=none
-------------------------------------------
real 79m9.378s
user 77m28.696s
2016 Jul 10
3
Xapian 1.4.0 released
On Fri, Jul 08, 2016 at 06:42:23PM -0700, Kevin Duraj wrote:
> The issue is that delve was renamed to xapian-delve but documentation
> is still saying that delve is delve. Who has access to update the
> documentation?
>
> http://www.linuxfromscratch.org/blfs/view/svn/general/xapian.html
That website has nothing to do with Xapian, so you probably need to
contact whoever runs it.
2010 Aug 23
2
NetBeans and Java Bindings
Hello,
I was wondering if anyone has succeeded in getting the Java bindings to work
with NetBeans, in order to make use of NetBeans's GUI developer. I've had no
luck so far. Does anyone know how to do that?
Many thanks.
2012 Sep 19
7
Renaming Journey and avoiding libraries with common noun names
Hi all,
I know this is a long shot, but could renaming the "Journey" module please
be considered by those in a position to support it?
I've written an issue on this in the journey repo also:
https://github.com/rails/journey/issues/49
Essentially our project has a model named Journey, the same as Rails 3.2's
new routing driver. As a consequence we can no longer
2012 Nov 14
4
xapian-replicate errors
Hi,
While trying to set up Xapian replication (initially for backup
purposes), I'm encountering some errors.
Our "fresh" index starts replication, and ends up with an index size
that matches the replication master (4.5GB), but then throws:
"Getting update for fresh from fresh
xapian-replicate: NetworkError: Unable to fully synchronise: Database
changing too fast"
I
2006 May 12
2
Pluralization of non-noun names
People,
I have an insurance company client and for the last eleven years I have
wanted to completely redevelop their system from scratch. However, the
boss has never been interested in hiring a team to develop the new
system (partly because of the cost and partly because of some famous and
expensive development failures in the industry) and has always insisted
on incremental development of the
2007 Jun 17
2
Flint failed to deliver indexing performance to Quartz.
I am proposing to remove Flint as the default database and make Quartz
the default again. The catch is not that the Flint database is
smaller and faster during searches than the Quartz database, which is what
the developers were concerned with when measuring, while neglecting to measure
performance when creating large indexes.
The truth is that Flint
2006 Jun 12
4
Modelling: A table of domains
Greetings!
I'm thinking of setting up a table of "domains", consisting of the core
fields id, code, name, description, and type. Users is a domain, orders
is a domain, recipes is a domain, etc. Domain attributes other than
those covered by the core fields will go to, say, a user_other_fields
table, recipe_other_fields, etc. I see the advantage of having all
2016 Jul 24
3
Xapian 1.4.0 released
On Fri, Jul 22, 2016 at 07:19:43PM -0700, Kevin Duraj wrote:
> I would like to propose changing the following code: rather than crashing
> and aborting the entire indexing run when a term is longer than 245
> characters, we could truncate the term to 245 characters
> and continue indexing.
Kevin -- I wonder what others are currently doing when this comes up
(or if
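One way an application might do the truncation itself before handing terms to the indexer, sketched under assumptions: the 245 figure is taken from the quoted message and is treated here as a byte count, terms are assumed to be UTF-8, and `truncate_term` is a hypothetical helper, not part of the Xapian API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: cap a UTF-8 term at max_bytes without cutting a
// multi-byte character in half. The default of 245 is the limit quoted
// in the thread, assumed here to be measured in bytes.
std::string truncate_term(const std::string& term, std::size_t max_bytes = 245) {
    if (term.size() <= max_bytes) return term;
    std::size_t end = max_bytes;
    // term[end] is the first byte that would be dropped. If it is a UTF-8
    // continuation byte (10xxxxxx), the cut falls inside a character, so
    // back up to the start of that character.
    while (end > 0 && (static_cast<unsigned char>(term[end]) & 0xC0) == 0x80) {
        --end;
    }
    return term.substr(0, end);
}
```

Truncation can make two distinct long terms collide on the same prefix, so some applications may prefer to skip over-long terms entirely; which trade-off is better depends on the data.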
2007 Jul 09
7
Xapian pubmeet
Hi all,
A few of us have been discussing whether we should have a Xapian social
gathering of some kind. The current idea is meeting up in a pub in
London some time in autumn for drinks and food. However all of this
really depends on who might be able to come! It would be a chance to
meet other Xapian enthusiasts in an informal social setting and talk
about all things search-related (and