I was doing all my development on OS X, and now I am working in a "Windows only" environment. Are there precompiled Windows binaries? I really don't look forward to doing the Cygwin dance.
On Mon, Jan 14, 2008 at 03:56:30PM -0500, Jarrod Roberson wrote:
> Are there precompiled windows binaries, I really don't look forward to doing the cygwin dance?

<http://rurban.xarch.at/software/cygwin/contrib/xapian/> gives Cygwin binaries, but they're the only precompiled ones I'm aware of. If you have MSVC, you should be able to use Ulrik/Charlie's makefiles to build; I don't think anyone has made binaries built from them available.

J

--
James Aylett
xapian.org
james@tartarus.org
uncertaintydivision.org
Hi!

I'm looking for a good way to generate "snippet text" for the results page of a personal search engine based on Xapian. At the moment I'm using OTS (Open Text Summarizer); the results are good, but not perfect (I'd like them to be perfect, or nearly so). Here's an example of usage:

$ elinks "http://xapian.org/" -force-html -no-numbering -no-references 2>/dev/null | ots -r 40

=============== generated snippet ===============
The Xapian Project Welcome to the Xapian project website. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby (so far!) Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow.
=================================================

The result is OK for this site (but not for sites that use frames...). I would like to get something similar to Google's "text snippets". Any advice is welcome.

N.B.: all the HTML pages I'm indexing are converted to text with elinks (the text browser), as in the example above.

Thanks in advance.

Cheers,
Y.
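One common way to get Google-style snippets (an assumption offered for illustration, not something OTS or Xapian does for you) is a keyword-in-context window: slide a fixed-width window over the words of the page text and keep the window containing the most query-term hits. The function names `normalise` and `make_snippet` below are hypothetical. Matching is case-insensitive but not stemmed; Xapian's `Xapian::Stem` could be applied to both the query terms and the page words if inflected forms should match.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Lower-case a word and strip leading/trailing punctuation so that
// "Xapian's," and "xapian's" compare equal. ASCII-only (assumption).
std::string normalise(const std::string& word) {
    std::string out;
    for (unsigned char c : word) out += static_cast<char>(std::tolower(c));
    size_t b = 0, e = out.size();
    while (b < e && std::ispunct(static_cast<unsigned char>(out[b]))) ++b;
    while (e > b && std::ispunct(static_cast<unsigned char>(out[e - 1]))) --e;
    return out.substr(b, e - b);
}

// Return the window of `width` consecutive words with the most query-term
// hits (earliest window wins ties), with "..." marking truncated ends.
// `terms` must already be lower-case.
std::string make_snippet(const std::string& text,
                         const std::set<std::string>& terms,
                         size_t width) {
    std::vector<std::string> words;
    std::istringstream in(text);
    for (std::string w; in >> w;) words.push_back(w);
    if (words.empty() || width == 0) return "";
    width = std::min(width, words.size());

    // hit[i] is 1 if word i matches a query term, else 0.
    std::vector<int> hit(words.size());
    for (size_t i = 0; i < words.size(); ++i)
        hit[i] = terms.count(normalise(words[i])) ? 1 : 0;

    // Sliding window: running count of hits in words[start, start + width).
    int count = 0;
    for (size_t i = 0; i < width; ++i) count += hit[i];
    int best = count;
    size_t best_start = 0;
    for (size_t start = 1; start + width <= words.size(); ++start) {
        count += hit[start + width - 1] - hit[start - 1];
        if (count > best) { best = count; best_start = start; }
    }

    std::string out = best_start > 0 ? "... " : "";
    for (size_t i = best_start; i < best_start + width; ++i) {
        if (i > best_start) out += ' ';
        out += words[i];
    }
    if (best_start + width < words.size()) out += " ...";
    return out;
}
```

For example, running `make_snippet` with terms `{"indexing", "search"}` and a width of 5 over the "Xapian is a highly adaptable toolkit..." sentence from the OTS output above returns `"... add advanced indexing and search ..."`. Highlighting the matched words would be a small extra step inside the final loop.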
Hi, I just compiled Xapian with MinGW. I could send you the files. Regards, Adi.
The Open Text Summarizer looks pretty good. Perhaps it could be used to fight spamdexing and keyword stuffing. I am wondering how it works. Is it based on natural language processing?

Kevin Duraj
http://UncensoredWebSearch.com

On Jan 25, 2008 10:26 PM, Bogdan M. Maryniuk <bogdan.maryniuk at gmail.com> wrote:
> On Jan 26, 2008 3:04 AM, Peter Karman <peter at peknet.com> wrote:
> > http://search.cpan.org/dist/Search-Tools/
> > The HiLiter and Snipper can be used with any text.
>
> Oh, sorry... I read as Hitler and Sniper... :-)
>
> --
> bm
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
On 1/15/08, Adi Oanca <adioanca at gmail.com> wrote:
> I just compiled xapian with MinGW.
> I could send you the files.

Thanks, but I am using the Windows build of Python, so I need to compile with VS.
On 08-02-28 at 16:02, Jarrod Roberson wrote:
> Thanks but I am using the windows build of python so I need to compile with VS.

If all you want are the Python bindings compiled with MSVC, you might want to use the ones I routinely compile and publish:

http://www.raptorized.com/xapian-python-win32/

They are available for Python 2.4 and 2.5, and they are built with MSVC.

--
Alexandre Gauthier
Network Analyst
Services Informatiques (IT Services)
Hi All,

Following on from a discussion a while back about document snippets (summaries), I have knocked together some proof-of-concept code (C++) that uses Xapian's stemming and sentence extraction (see http://en.wikipedia.org/wiki/Sentence_extraction). I also used the Open Text Summarizer project as inspiration.

It works quite well, but has some caveats, which are explained in the code comments. It can summarise, highlight sentences and highlight words. It can also produce context summaries: if you supply it with terms, it will summarise the text in the context of those terms.

I am new to C++ programming, so while you're laughing out loud at the poor coding, please keep that in mind. The code was put together on Ubuntu Linux and comes with a Makefile. I have also supplied my stopper class. For some reason the stopper still fails to stop some of the words in the stop list (like "the"); if anyone knows why, please let me know.

Feedback, comments, changes and improvements are more than welcome; bring it on. I really hope this sparks some interest.

Regards

Colin

On 9 Feb 2008, at 01:24, Kevin Duraj wrote:
> I am wondering how it works? Is it based on natural language processing ?
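Colin's stopper class isn't reproduced in the thread, so the cause can't be confirmed here. Two common reasons a stop word such as "the" slips through are (a) case: "The" at the start of a sentence doesn't match "the", and (b) a stop-word file with Windows CRLF line endings, which leaves entries like "the\r". Xapian's own `Xapian::SimpleStopper` matches terms exactly as given, so the same normalisation would be needed with it. The sketch below, with a hypothetical class name, normalises both when loading and when looking up:

```cpp
#include <cctype>
#include <istream>
#include <set>
#include <sstream>  // std::istringstream, used in the usage example below
#include <string>

// Hypothetical case-insensitive stop-word list (not Colin's stopper class).
// Entries are trimmed of surrounding whitespace, including the '\r' that
// Windows CRLF line endings leave behind, and lower-cased when loaded.
class CaseInsensitiveStopper {
public:
    // Read one stop word per line from any input stream.
    void load(std::istream& in) {
        for (std::string line; std::getline(in, line);) {
            std::string w = lower(trim(line));
            if (!w.empty()) words_.insert(w);
        }
    }

    // Candidate words get the same normalisation before lookup,
    // so "The" at the start of a sentence matches "the".
    bool is_stop_word(const std::string& word) const {
        return words_.count(lower(trim(word))) > 0;
    }

private:
    static std::string trim(const std::string& s) {
        size_t b = 0, e = s.size();
        while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
        while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
        return s.substr(b, e - b);
    }

    static std::string lower(std::string s) {
        for (char& c : s)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    }

    std::set<std::string> words_;
};
```

For example, loading `"the\r\nand\r\n"` from a `std::istringstream` and then calling `is_stop_word("The")` returns true, whereas an exact-match lookup against the raw lines would not.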
As per my previous message about summarisation, you can download the code here:

http://www.cbell.info/files/XapSum.zip

Regards

Colin

On 9 Feb 2008, at 01:24, Kevin Duraj wrote:
> The Open Text Summarizer looks pretty good.
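As a rough illustration of frequency-based sentence extraction, one simple variant of the technique in the Wikipedia article Colin links (this is a sketch under stated assumptions, not the XapSum code): split the text into sentences, count how often each non-stop word occurs in the whole document, score each sentence by the summed frequencies of its words, and return the top `k` sentences in their original order. The optional `context` set mimics Colin's "context summaries" by doubling the weight of supplied terms; that weighting, and all function names here, are hypothetical. No stemming is applied; `Xapian::Stem` could be slotted into `tokens`.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Split at '.', '!' or '?' (naive: "e.g." will split early).
std::vector<std::string> split_sentences(const std::string& text) {
    std::vector<std::string> out;
    std::string cur;
    for (char c : text) {
        cur += c;
        if (c == '.' || c == '!' || c == '?') {
            size_t b = cur.find_first_not_of(" \t\r\n");
            if (b != std::string::npos) out.push_back(cur.substr(b));
            cur.clear();
        }
    }
    size_t b = cur.find_first_not_of(" \t\r\n");
    if (b != std::string::npos) out.push_back(cur.substr(b));
    return out;
}

// Lower-cased alphanumeric tokens (a stemmer could be applied here).
std::vector<std::string> tokens(const std::string& s) {
    std::vector<std::string> out;
    std::string w;
    for (unsigned char c : s) {
        if (std::isalnum(c)) w += static_cast<char>(std::tolower(c));
        else if (!w.empty()) { out.push_back(w); w.clear(); }
    }
    if (!w.empty()) out.push_back(w);
    return out;
}

// Score sentences by document frequency of their non-stop words
// (context terms count double) and keep the k best, in original order.
std::vector<std::string> summarise(const std::string& text, size_t k,
                                   const std::set<std::string>& stop,
                                   const std::set<std::string>& context = {}) {
    std::vector<std::string> sents = split_sentences(text);
    std::map<std::string, int> freq;
    for (const auto& s : sents)
        for (const auto& w : tokens(s))
            if (!stop.count(w)) ++freq[w];

    std::vector<std::pair<double, size_t>> scored;  // (score, sentence index)
    for (size_t i = 0; i < sents.size(); ++i) {
        double score = 0;
        for (const auto& w : tokens(sents[i]))
            if (!stop.count(w)) score += freq[w] * (context.count(w) ? 2.0 : 1.0);
        scored.push_back({score, i});
    }
    // Highest score first; stable_sort lets earlier sentences win ties.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<double, size_t>& a,
                        const std::pair<double, size_t>& b) { return a.first > b.first; });
    std::vector<size_t> keep;
    for (size_t i = 0; i < std::min(k, scored.size()); ++i) keep.push_back(scored[i].second);
    std::sort(keep.begin(), keep.end());
    std::vector<std::string> out;
    for (size_t i : keep) out.push_back(sents[i]);
    return out;
}
```

With stop words `{"is", "a", "and"}`, summarising "Xapian is a search library. Cats sleep. Xapian supports search and stemming." to two sentences drops "Cats sleep.", since it shares no frequent words with the rest. Summing raw frequencies favours long sentences; dividing by sentence length is a common alternative.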
On Tue, Mar 18, 2008 at 7:15 AM, Colin Bell <colinabell at gmail.com> wrote:
> Feedback / comments / changes / improvements are more than welcome - bring it on. I really hope this sparks an interest.

Colin!

Great job, it definitely sparks interest. Can you share the code with us, or send a link where we can download it? I will run it against the myhealthcare.com search engine (73 million documents) using the sentence summarizer, and we will see what kind of results end up on top. Hopefully, we will get rid of websites that use excessive keyword stuffing and spamdexing techniques.

Did you have a chance to look at the Flesch-Kincaid readability test, which is designed to measure how difficult English text is to comprehend?
http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test

Kevin Duraj
http://myhealthcare.com
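For reference, the two Flesch formulas as they are usually stated, followed by a worked example. The counts in the example (100 words, 5 sentences, 150 syllables) are hypothetical, chosen only to show the arithmetic:

```latex
\text{Reading Ease} = 206.835 - 1.015\left(\frac{\text{words}}{\text{sentences}}\right) - 84.6\left(\frac{\text{syllables}}{\text{words}}\right)

\text{Grade Level} = 0.39\left(\frac{\text{words}}{\text{sentences}}\right) + 11.8\left(\frac{\text{syllables}}{\text{words}}\right) - 15.59

% Hypothetical page: 100 words, 5 sentences, 150 syllables
\text{Reading Ease} = 206.835 - 1.015(20) - 84.6(1.5) = 59.635
\text{Grade Level} = 0.39(20) + 11.8(1.5) - 15.59 = 9.91
```

A higher Reading Ease means easier text, while the Grade Level approximates a US school grade. In practice the hard part is counting syllables automatically, which tools do with heuristics or pronunciation dictionaries.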
Hi Kevin,

I did attach the source code to the original posting, but it seems not to have made it through the mailing list. You can download it here:

http://www.cbell.info/XapSum.zip

I am using it on our company search, and it's doing a good job and is pretty fast. It needs a bit of tidying up, and my C++ knowledge is very weak, so I could do with some help.

I will do some reading on the link you sent, thanks.

Regards
Colin

On 19 Mar 2008, at 18:29, Kevin Duraj wrote:
> Great job, it definitely sparks an interest. Can you share the code with us, or send the link where we can download it .
Colin,

Your code does not compile on Linux; I think it was written on Windows, and I do not have much time to fix it. Even so, here is another great algorithm: the Gunning fog index.

http://en.wikipedia.org/wiki/Gunning_fog_index

The Gunning fog index is designed to measure the readability of English writing. The resulting number indicates the years of formal education a person needs to understand the text easily on first reading. With the Gunning fog index we could potentially measure the intelligence of a web page, assign a boost value to it, and get some great page ranking like Google does. :-)

Kevin Duraj
http://myhealthcare.com

On Wed, Mar 19, 2008 at 12:37 PM, Colin Bell <colinabell at gmail.com> wrote:
> I did attach the source code to the original posting but it seems to not made it through the mailing list. You can download it here.
> http://www.cbell.info/XapSum.zip
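The Gunning fog formula, 0.4 × (words/sentences + 100 × complex words/words), where a "complex" word has three or more syllables, is simple enough to compute per document at index time. A minimal sketch, assuming ASCII text, a crude vowel-group syllable counter, and a sentence boundary at any token ending in '.', '!' or '?' (all simplifications; the function names are hypothetical):

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Crude English syllable estimate (a heuristic, not part of the index's
// definition): count runs of vowels, treating a final consonant + 'e' as
// silent ("make") except in consonant + "le" endings ("table").
// Returns 0 for tokens with no letters.
int count_syllables(const std::string& token) {
    std::string w;
    for (unsigned char c : token)
        if (std::isalpha(c)) w += static_cast<char>(std::tolower(c));
    if (w.empty()) return 0;

    auto vowel = [](char c) {
        return std::string("aeiouy").find(c) != std::string::npos;
    };
    int n = 0;
    bool prev_vowel = false;
    for (char c : w) {
        bool v = vowel(c);
        if (v && !prev_vowel) ++n;  // start of a new vowel group
        prev_vowel = v;
    }
    size_t len = w.size();
    bool silent_e = len > 2 && w[len - 1] == 'e' && !vowel(w[len - 2]) &&
                    !(w[len - 2] == 'l' && !vowel(w[len - 3]));
    if (silent_e && n > 1) --n;
    return n > 0 ? n : 1;
}

// Gunning fog index = 0.4 * (words / sentences + 100 * complex / words).
// The full definition also excludes proper nouns, familiar jargon and
// compound words, and does not count suffixes such as -es, -ed or -ing as
// syllables; this sketch does none of that.
double gunning_fog(const std::string& text) {
    std::istringstream in(text);
    int words = 0, complex_words = 0, sentences = 0;
    for (std::string tok; in >> tok;) {
        int syllables = count_syllables(tok);
        if (syllables == 0) continue;  // punctuation or numbers only
        ++words;
        if (syllables >= 3) ++complex_words;
        char last = tok.back();
        if (last == '.' || last == '!' || last == '?') ++sentences;
    }
    if (words == 0) return 0.0;
    if (sentences == 0) sentences = 1;  // unterminated text: one sentence
    return 0.4 * (static_cast<double>(words) / sentences +
                  100.0 * complex_words / words);
}
```

For "The cat sat on the mat. It was happy." this gives 1.8 (nine words, two sentences, no complex words). How such a score would be mapped to a boost value for ranking, as Kevin suggests, is left open here.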