Displaying 20 results from an estimated 10000 matches similar to: "Draft Application for GSoC 11 - Text extraction libraries - please review"
2012 Mar 03
1
GSoC 2012: Backend for Lucene format indexes
Hi All,
I'm Billy, a senior undergraduate student at Peking University. I'm working
in the area of Information Retrieval and Web Mining. When going through the
idea list, I felt quite interested in the "Backend for Lucene format
indexes" project. I have been using java-lucene for about one year, but my
subsequent work prefers C++ code. This project is very meaningful to
smooth
2019 Mar 23
2
[GSoC] Questions about project Text-Extraction Libraries
Thanks!
That was really useful!
I wanted to share my approach to this project with the hope that you can
give me some feedback.
I think that applying a design that foresees the incorporation of new
file formats is the most suitable way to solve the problem.
In the attached sketch we can see:
* Bug_Box: It is responsible for encapsulating and handling errors.
* File_extrator: It presents an
2011 Mar 22
1
Gsoc- Text Extraction Libraries
Hello,
My name is Zongwei, and I'm a 2nd year computer science major at UCLA. I am interested in the text extraction library project, since I have almost 2 years of experience with C++ and half a year with Linux/Unix. As I look at the formats that Omega already supports, I see that there are a lot of formats that only work if a certain program is included. What would be the most important formats to
2019 Mar 21
2
[GSoC] Questions about project Text-Extraction Libraries
Hello!
I have a few questions related to the Text-Extraction Libraries project.
Firstly, I think that isolating library bugs in subprocesses could
work, but I am not sure how to handle deadlocks or infinite
loops. I feel that using a timer is the only way to deal with them, but I
would like to know what you think about it.
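
The timer idea above can be sketched as follows. This is only an illustration in Python, not Omega's actual design: the helper name `run_isolated` and its return values are hypothetical, and a small Python snippet stands in for a real extraction library. The worker runs in a child process, so a crash cannot take down the parent, and a timeout ends a child that deadlocks or loops forever.

```python
import subprocess
import sys


def run_isolated(code, timeout_s):
    """Run Python source in a child process and report how it ended.

    Hypothetical helper for illustration only. Returns a (status, output)
    pair where status is "ok", "crashed" or "timeout".
    """
    try:
        # subprocess.run kills the child and waits for it when the
        # timeout expires, then raises TimeoutExpired.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Deadlock or infinite loop: the child has already been killed.
        return ("timeout", "")
    if result.returncode != 0:
        # Non-zero exit status or death by signal: treat as a library crash.
        return ("crashed", result.stderr)
    return ("ok", result.stdout)
```

With this shape, the parent only needs to decide what to do with a file whose extraction timed out or crashed, for example skipping it and recording the failure.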
Secondly, I have been reading the source code of
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all:
I have written a demo patch for Backend for Lucene format indexes; the Lucene
version is 3.6.2.
http://lucene.apache.org/core/3_6_2/fileformats.html
For now, this demo patch supports only the basic features of Lucene. Compound
files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf), and
deleted documents (.del) are not supported, and the skip list in .fdx is not
supported either.
example/quest.cc is used to test this demo.
2014 Feb 14
2
GSoC 2014
Hi,
I am Nikhar Agrawal, currently studying in my third year at IIIT-H,
pursuing Computer Science and Engineering. I am fairly proficient in C++. I
have been a GSoC 2013 participant for Boost C++ libraries and managed to
successfully merge my project into Boost trunk.
As a part of my course on Information Retrieval and Extraction, I did a
project on searching for queries on the latest 40 gb
2012 Mar 19
1
I want to volunteer Xapian in GSoC 2012.
Hi,
I am an undergraduate student of Computer Science and Engineering. I want
to volunteer for Xapian in Google Summer of Code 2012, but I have no experience
of working on Open Source projects. I am really interested in the "Test
Extraction Library" project. I meet some of its requirements: I'm
quite familiar with C++ (and am now learning more advanced C++
programming) and have keen
2016 Mar 10
2
Introduction and Doubts
I did not share it on the mailing list because I thought that someone could use
all the ideas I proposed in their GSoC proposal.
I will certainly contribute to the Xapian project.
Sorry if that was against the rules.
I did not develop the algorithm myself, but after researching
various clustering techniques,
I found that there is a new algorithm called CLUBS (Clustering Using Binary
Splitting) which gives
2014 Mar 06
2
Regarding GSOC 2014
Sir,
I am a 4th year undergraduate student pursuing my BTech in CSE at IIIT
Hyderabad, India.
I am interested in applying to Xapian for GSoC 2014. I have gone through
this year's ideas page and am interested in applying for the 'posting list encoding
improvements' project.
I am good at C/C++ and Python, which is one of the requirements. I have gone
through the information Retrieval and
2019 Jun 14
2
Text-Extraction Libraries for Omindex
This is a list with some libraries that I have been looking at.
The idea is to discuss the advantages and disadvantages of adding some of
these libraries to Xapian.
If anyone knows another library that could be added to the list, it would be
great!
Libfreexl:
* For Excel (.xls)
* Last release: 2018-02
* Info: gaia-gis.it/fossil/freexl/index
* License: MPL tri-license
2019 Mar 13
2
[GSoC] Bug tracker access
Hi!
My name is Bruno Baruffaldi, I am a Computer Science student from Argentina.
I am interested in working for Xapian for GSoC and I have been reading the
developers guide.
I tried to take a look at the bug tracker, but it seems that I need a
username and a password.
Is that correct?
--
Atte. Bruno Baruffaldi
2019 Mar 02
2
A Greeting for Xapian community
Dear mentors and friends working on Xapian:
Sorry for bothering you here, please excuse my rudeness. In order to
represent my thoughts clearly, I think my message has become a bit verbose,
so it is unsuitable for the chat room, where it would be
hell for the readers.
This email consists of 3 parts: my self-introduction (I'm new here)
and two questions I ran into while building Xapian from git.
2012 Dec 23
1
Fwd: Re: Another ue for Recoll/Xapian? - AI/Eliza
People,
I sent this note to JF at Recoll and he suggested asking here (his
response below) - any suggestions?
Thanks,
Phil.
-------- Original Message --------
Subject: Re: Another ue for Recoll? - AI/Eliza
Date: 2012-12-23 19:22
From: jf at dockes.org
To: <phil at pricom.com.au>
Philip Rhoades writes:
> Jean,
>
> I have been using Recoll happily for some time now but I
2012 Mar 24
1
Regarding OMEGA project, GsoC
Hi,
This is Rahul Singhal, a student in the Computer Science and Engineering
Department of the Indian Institute of Technology, Bombay.
My interest lies in coding and algorithm development. I love being wired all
night. I have a lot of experience with C/C++ and
took a course aimed at deep knowledge of C++ at IIT Bombay itself.
As a part of this course I made an SQL compiler in C++
in my
2016 Mar 10
2
Introduction and Doubts
Tf-idf is the most widely used weighting scheme; it is easy to understand and has
been used in other frameworks like Lucene and in many other places.
Okapi BM25 (implemented in Xapian) is theoretically a better, improved measure
than tf-idf, and
I am looking into various other weighting schemes which are in Xapian
or can be implemented, like TF-ICF (term frequency inverse corpus
frequency), TF-RF (term
2013 Mar 03
0
Sent a pull request for testing TradWeight using an Rset.
Hello guys. As discussed on IRC, I have sent a pull request for a test for
testing TradWeight with an Rset.
On Fri, Mar 1, 2013 at 5:30 PM, <xapian-devel-request at lists.xapian.org> wrote:
> Send Xapian-devel mailing list submissions to
> xapian-devel at lists.xapian.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>
2017 Jan 16
0
GSoC 2017
Hello,
I am Abado Jacob Mtulla, a 3rd year Computer Science student. I would
love to implement the Go bindings for Xapian.
On 01/16/2017 05:55 AM, Olly Betts wrote:
> Google are running their Summer of Code again this year. If you're not
> familiar with it, see:
>
> https://developers.google.com/open-source/gsoc/
>
> Last year was very successful and I think we probably
2011 Mar 29
0
GSoC Project: Support Erlang Language
"Support Erlang Language" By Vladimir Zaytsev, Xapian, 2011
*About me*
Name: Vladimir Zaytsev
E-mail address: vladimir at zvm.me
WWW: zvm.me/, facebook.com/vladimir.zaytsev<http://www.facebook.com/vladimir.zaytsev>
Emergency contact phone number: +79028195844
Short biography:
I was born on 5th February, 1991 in Donetsk, USSR; now live in
Khanty-Mansiysk, Russia. In 2008
2014 Mar 09
2
[GSOC 2014] Some questions about Letor module
Thanks for your reply! For the third question: in https://inex.mmci.uni-saarland.de/data/documentcollection.jsp, I can find inex2010-article.qrels in the 2010 assessment, but can't find the query files. Could you send me the link? I have registered on the INEX website. And I also need to download ``INEX 2009 collection without annotation tags: (unofficial)`` on
2019 Jan 22
0
GSoC 2019
Google are running their Summer of Code again this year. If you're not
familiar with it, see:
https://summerofcode.withgoogle.com/
Interested orgs can apply already up until February 6th (about two weeks
away as I write).
We've taken part many times before, and it's resulted in both new
contributors and interesting new features - I think it's well worth
applying again.
If