Abhishek Singh Kushwah
2014-Nov-29 15:34 UTC
[Xapian-devel] Adding Support for Krovetz Stemmer Algo in Xapian
Hello,

As mentioned on the project ideas page (adding support for more stemming algorithms), I found an implementation of the Krovetz stemmer algorithm in C++. Before working on merging it into Xapian, I need help identifying the license that applies to the source code.

To avoid licensing issues later, could someone kindly check this link:

http://sourceforge.net/p/lemur/wiki/KrovetzStemmer/

-Abhishek
James Aylett
2014-Nov-30 15:28 UTC
[Xapian-devel] Adding Support for Krovetz Stemmer Algo in Xapian
On 29 Nov 2014, at 15:34, Abhishek Singh Kushwah <abhishek18kushwah at gmail.com> wrote:

> As mentioned on the project ideas page, Adding more support for stemmer algorithm,
>
> i found an implementation of Krovetz Stemmer Algo in C++ but before working on it to merge it into xapian, i needed help in recognizing the license information associated with the source code.
>
> To avoid further licensing issues kindly someone check the link
>
> http://sourceforge.net/p/lemur/wiki/KrovetzStemmer/

The project suggests it's under a BSD license, but the code itself is a mire of confusion, citing that license (by URL), and an "all rights reserved" copyright statement underneath. Since the BSD license grant is for "use of the Lemur Toolkit..." I don't think we can reasonably assume that the individual code is actually licensed appropriately for us to use.

Our general attitude, based on past experiences with "foreign" code, is that unless you've written it, it's difficult for us to accept it.

Best,
James

-- 
James Aylett, occasional trouble-maker
xapian.org
James Aylett
2014-Nov-30 18:33 UTC
[Xapian-devel] Adding Support for Krovetz Stemmer Algo in Xapian
[Please keep replies on the mailing list so that everyone can help and benefit.]

On 30 Nov 2014, at 17:51, Abhishek Singh Kushwah <abhishek18kushwah at gmail.com> wrote:

> Two of the implementation of algorithms has already been rejected previously due to licenses both being the implementation of porter but our xapian use implementation in snowball which i assume is under GPL.

Snowball (and the stemmer implementations shipped with it) is under a BSD license.

> Tell me how can a stemmer algo possible so lengthy be incorporated in a bit-size project if we have to code it from the scratch.

Krovetz isn't actually particularly lengthy for a hand-coded algorithm; it's about 1000 lines (and then almost another 6000 lines of dictionaries). I think the problem here is that Krovetz doesn't seem amenable to implementing directly in Snowball, which means more work.

The original paper <http://people.scs.carleton.ca/~armyunis/projects/KAPI/Krovetz.pdf> doesn't describe the algorithm particularly concisely, but it doesn't seem hugely difficult or time-consuming to implement, although there are always concerns about efficiency in stemming algorithms. We'd need a dictionary or dictionaries from somewhere; I'm not clear from a quick skim of the paper what we'd need to do to construct useful ones.

Also note from <http://www.comp.lancs.ac.uk/computing/research/stemming/general/krovetz.htm> that Krovetz, in IR, is often combined with other stemmers; at the moment we don't provide a way of "chaining" stemmers together. (This could separately be a bite-sized project, however, as it doesn't sound terribly complex.)

If you can get an explicit license grant from the copyright holder of the Krovetz stemmer (which seems to be either the University of Massachusetts or the Applied Computer Systems Institute of Massachusetts, Inc.; it's unclear from the Krovetz source code), then (providing it's compatible) we could accept a derived version directly into Xapian.

The problem is the ambiguity about licensing, which is made worse by pointing to <http://www.lemurproject.org/license.html> which asserts yet another copyright holder (albeit also asserting BSD, so if the other two claims are taken care of cleanly then it'll work out).

J

-- 
James Aylett, occasional trouble-maker
xapian.org
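
[Editorial illustration, not part of the original message.] The idea of "chaining" stemmers mentioned above amounts to feeding the output of one stemmer into the next. The following is a minimal Python sketch of that idea only; it does not use Xapian's API, and the names chain_stemmers, strip_plural_s and strip_ing are hypothetical toy stand-ins, not Krovetz, Porter or any real stemmer.

```python
from typing import Callable, Sequence

# A stemmer is modelled here as any function from a word to its stem.
# This is an assumption made for illustration, not Xapian's interface.
Stemmer = Callable[[str], str]


def chain_stemmers(stemmers: Sequence[Stemmer]) -> Stemmer:
    """Return a stemmer that applies each given stemmer in order."""
    def chained(word: str) -> str:
        for stem in stemmers:
            word = stem(word)
        return word
    return chained


def strip_plural_s(word: str) -> str:
    """Toy rule: drop a trailing 's' from longer words not ending in 'ss'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def strip_ing(word: str) -> str:
    """Toy rule: drop a trailing 'ing' from longer words."""
    if len(word) > 5 and word.endswith("ing"):
        return word[:-3]
    return word


# Hypothetical chained stemmer: plural removal first, then 'ing' removal.
stem = chain_stemmers([strip_plural_s, strip_ing])
```

With these toy rules, stem("meetings") first becomes "meeting" and then "meet", while a word that neither rule matches passes through unchanged. Order matters: reversing the list would leave "meetings" as "meeting", because the 'ing' rule would run before the plural is removed.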
Olly Betts
2014-Dec-01 04:27 UTC
[Xapian-devel] Adding Support for Krovetz Stemmer Algo in Xapian
On Sun, Nov 30, 2014 at 03:28:35PM +0000, James Aylett wrote:
> On 29 Nov 2014, at 15:34, Abhishek Singh Kushwah <abhishek18kushwah at gmail.com> wrote:
> > i found an implementation of Krovetz Stemmer Algo in C++ but before
> > working on it to merge it into xapian, i needed help in recognizing
> > the license information associated with the source code.
> >
> > To avoid further licensing issues kindly someone check the link
> >
> > http://sourceforge.net/p/lemur/wiki/KrovetzStemmer/
>
> The project suggests it's under a BSD license, but the code itself is
> a mire of confusion, citing that license (by URL), and an "all rights
> reserved" copyright statement underneath. Since the BSD license grant
> is for "use of the Lemur Toolkit..." I don't think we can reasonably
> assume that the individual code is actually licensed appropriately for
> us to use.

I think a bigger issue is that this licence is actually a variant of the BSD licence which is probably GPL-incompatible.

Here are the two licences to compare:

http://www.lemurproject.org/license.html
http://directory.fsf.org/wiki/License:BSD_3Clause

Clauses 1 and 2 are the same, the capitalised disclaiming of warranties is the same except for "AUTHOR" being substituted, and 3 is similar but with "author" substituted and some rewording.

But the added clause 4 is problematic as it renders the licence incompatible with the GPL. It's essentially the same as clause 4 of the PHP licence:

http://php.net/license/3_01.txt

And that clause apparently is a strong restriction on naming derived products:

http://www.gnu.org/licenses/license-list.html#PHP-3.01

Cheers,
Olly
Olly Betts
2014-Dec-01 04:41 UTC
[Xapian-devel] Adding Support for Krovetz Stemmer Algo in Xapian
> On 30 Nov 2014, at 17:51, Abhishek Singh Kushwah <abhishek18kushwah at gmail.com> wrote:
>
> > Two of the implementation of algorithms has already been rejected
> > previously due to licenses both being the implementation of porter
> > but our xapian use implementation in snowball which i assume is
> > under GPL.

The only cases I can think of that you might be referring to are two different submissions of patches based on Andy Stark's implementation of the Paice/Husk stemmer (not the Porter stemmer).

The problem there is that this implementation has no explicit licence, which means we simply can't use it:

http://www.gnu.org/licenses/license-list.html#NoLicense

Cheers,
Olly