Han Jiang
2012-Mar-03  16:08 UTC
[Xapian-devel] GSoC 2012: Backend for Lucene format indexes
Hi All,

I'm Billy, a senior undergraduate student at Peking University, working in the area of Information Retrieval and Web Mining. While going through the idea list, I became quite interested in the "Backend for Lucene format indexes" project. I have been using Java Lucene for about a year, but my subsequent work favors C++ code, so this project would be very useful for smoothing that transition.

As far as I know, the handling of index files (e.g. IndexReader) has changed quite a bit (Lucene 3.5 File Format <http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html#Index%20File%20Formats>), while the idea page itself still links to the old 3.0 version <http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/fileformats.html>. Since coping with all the versions does not seem to be a simple task, shall we just support the old 3.0 format, or a more stable version?

Thank you!

-- 
Han Jiang
EECS, Peking University, China
Every Effort Creates Smile
Senior Student
Olly Betts
2012-Mar-04  04:31 UTC
[Xapian-devel] GSoC 2012: Backend for Lucene format indexes
On Sun, Mar 04, 2012 at 12:08:47AM +0800, Han Jiang wrote:
> As far as I know, the operation of index file, e.g. IndexReader, has
> changed quite some (Lucene3.5 File
> Format<http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html#Index%20File%20Formats>)
> , while the idea page itself still linked to an old 3.0
> version<http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/fileformats.html>.
> Since it doesn't seem a simple work to cope with all the versions, shall we
> just implement to support the old 3.0 format, or a more stable version?

Thanks for noticing this - this idea was carried over from last year's list, and that's why the link points to an old version. I've updated the link on the wiki to the newer one you gave above.

I think it makes sense to support the latest version as the priority, with support for older versions possibly useful if there's time.

Cheers,
Olly