Similar to: Draft Application for GSoC 11 - Text extraction libraries - please review

2012 Mar 03
1
GSoC 2012: Backend for Lucene format indexes
Hi All, I'm Billy, a senior undergraduate student at Peking University. I'm working in the area of Information Retrieval and Web Mining. When going through the ideas list, I was quite interested in the "Backend for Lucene format indexes" project. I have been using java-lucene for about one year, but my subsequent work will favour C++ code. This project is very meaningful to smooth
2019 Mar 23
2
[GSoC] Questions about project Text-Extraction Libraries
Thanks! That was really useful! I wanted to share my approach to this project in the hope that you can give me some feedback. I think that a design that anticipates the addition of new file formats is the most suitable way to solve the problem. In the attached sketch we can see: * Bug_Box: It is responsible for encapsulating and handling errors. * File_extrator: It presents an
2011 Mar 22
1
Gsoc- Text Extraction Libraries
Hello, My name is Zongwei, and I'm a 2nd-year computer science major at UCLA. I was interested in the text extraction library project, since I have almost 2 years of experience with C++ and half a year with Linux/Unix. Looking at the formats that Omega already supports, I see that there are a lot of formats that only work if a certain program is installed. What would be the most important formats to
2019 Mar 21
2
[GSoC] Questions about project Text-Extraction Libraries
Hello! I have a few questions about the Text-Extraction Libraries project. Firstly, I think that isolating library bugs in subprocesses could work, but I am not sure how to handle deadlocks or infinite loops. I feel that using a timer is the only way to deal with them, but I would like to know what you think. Secondly, I have been reading the source code of
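The subprocess-plus-timer idea raised above can be sketched in C++ with POSIX fork(), alarm() and waitpid(). This is a minimal illustration under stated assumptions, not Omega's actual code: run_isolated and ExtractStatus are hypothetical names, the work callable stands in for a call into an extraction library, and only the outcome (not the extracted text) is reported back to the parent.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical outcome of one isolated extraction attempt.
enum class ExtractStatus { Ok, Failed, TimedOut, Crashed };

// Hypothetical helper: runs `work` in a forked child so that a crash, deadlock
// or infinite loop inside an extraction library cannot take down the caller.
// `work` is assumed to return 0 on success and non-zero on failure.
template <typename F>
ExtractStatus run_isolated(F work, unsigned timeout_secs) {
    // alarm(0) would cancel the timer rather than arm it, so insist on > 0.
    assert(timeout_secs > 0);
    pid_t pid = fork();
    if (pid < 0) return ExtractStatus::Failed;  // fork() itself failed
    if (pid == 0) {
        // Child: arm a wall-clock timer. SIGALRM's default action terminates
        // the process, which also covers hangs that burn no CPU (deadlocks).
        alarm(timeout_secs);
        int rc = work();
        // _exit() skips stdio flushing, so buffered parent output isn't duplicated.
        _exit(rc == 0 ? 0 : 1);
    }
    // Parent: block until the child finishes, retrying if interrupted.
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return ExtractStatus::Failed;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? ExtractStatus::Ok : ExtractStatus::Failed;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
        return ExtractStatus::TimedOut;
    return ExtractStatus::Crashed;  // killed by some other signal, e.g. SIGSEGV
}
```

A real extractor would also need to send the extracted text back to the parent, for example over a pipe. alarm() is used here because it measures wall-clock time, so it also catches deadlocks that a CPU-time limit would miss.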
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all: I have written a demo patch for the Backend for Lucene format indexes; the Lucene version is 3.6.2. http://lucene.apache.org/core/3_6_2/fileformats.html For now, this demo patch supports only the basic features in Lucene. Compound files (.cfs/.cfe), term vectors (.tvx/.tvd/.tvf) and deleted documents (.del) are not supported, and neither is the skip list in .fdx. example/quest.cc is used to test this demo.
2014 Feb 14
2
GSoC 2014
Hi, I am Nikhar Agrawal, currently in my third year at IIIT-H, pursuing Computer Science and Engineering. I am fairly proficient in C++. I was a GSoC 2013 participant with the Boost C++ Libraries and successfully merged my project into Boost trunk. As part of my course on Information Retrieval and Extraction, I did a project on searching for queries on the latest 40 GB
2012 Mar 19
1
I want to volunteer Xapian in GSoC 2012.
Hi, I am an undergraduate student of Computer Science and Engineering. I want to volunteer for Xapian in Google Summer of Code 2012, but I have no experience working on Open Source projects. I am really interested in the "Text Extraction Library" project. I meet some of its requirements: I'm quite familiar with C++ (and am now learning advanced C++ programming) and have keen
2016 Mar 10
2
Introduction and Doubts
I was not sharing it on the mailing list because I thought that someone could use all the ideas I proposed in their GSoC proposal. I will certainly contribute to the Xapian project; sorry if that was against the rules. The algorithm was not developed by me; after much research on various clustering techniques, I found a new algorithm called CLUBS (Clustering Using Binary Splitting), which gives
2014 Mar 06
2
Regarding GSOC 2014
Sir, I am a 4th-year undergraduate student pursuing my BTech in CSE at IIIT Hyderabad, India. I am interested in applying for Xapian in GSoC 2014. I have gone through this year's ideas page and am interested in applying for the 'posting list encoding improvements' project. I am good at C/C++ and Python, which is one of the requirements. I have gone through the Information Retrieval and
2019 Jun 14
2
Text-Extraction Libraries for Omindex
This is a list of some libraries that I have been looking at. The idea is to discuss the advantages and disadvantages of adding some of these libraries to Xapian. If anyone knows of another library that could be added to the list, that would be great!
Libfreexl:
* For Excel (.xls)
* Last release: 2018-02
* Info: gaia-gis.it/fossil/freexl/index
* License: MPL tri-license
2019 Mar 13
2
[GSoC] Bug tracker access
Hi! My name is Bruno Baruffaldi, and I am a Computer Science student from Argentina. I am interested in working with Xapian for GSoC and I have been reading the developers' guide. I tried to look at the bug tracker, but it seems that I need a username and a password. Is that correct? -- Best regards, Bruno Baruffaldi
2019 Mar 02
2
A Greeting for Xapian community
Dear mentors and friends working on Xapian: Sorry to bother you here; please excuse my rudeness. To express my thoughts clearly I need to be a bit verbose, which makes the chat room unsuitable: it would be hell for the readers. This email has three parts: my self-introduction (I'm new here) and two questions I ran into while building Xapian from git.
2012 Dec 23
1
Fwd: Re: Another ue for Recoll/Xapian? - AI/Eliza
People, I sent this note to JF at Recoll and he suggested asking here (his response below) - any suggestions? Thanks, Phil. -------- Original Message -------- Subject: Re: Another ue for Recoll? - AI/Eliza Date: 2012-12-23 19:22 From: jf at dockes.org To: <phil at pricom.com.au> Philip Rhoades writes: > Jean, > > I have been using Recoll happily for some time now but I
2012 Mar 24
1
Regarding OMEGA project, GsoC
Hi, This is Rahul Singhal, a student in the Computer Science and Engineering Department of the Indian Institute of Technology Bombay. My interests lie in coding and algorithm development. I love being wired all night. I have a lot of experience with C/C++, and took a course on in-depth C++ at IIT Bombay itself. As part of this course I made an SQL compiler in C++ in my
2016 Mar 10
2
Introduction and Doubts
Tf-idf is the most widely used weighting scheme; it is easy to understand and has been used in other frameworks like Lucene and in many other places. Okapi BM25 (implemented in Xapian) is a theoretically better measure than tf-idf, and I am looking into various other weighting schemes that exist in Xapian or could be implemented, like TF-ICF (term frequency inverse corpus frequency), TF-RF (term
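For reference, one common textbook form of the tf-idf weight mentioned above, with a small worked number. This is an illustrative sketch only: the base-10 logarithm is an assumption, and other variants use the natural log or a smoothed idf. Here N is the number of documents, df_t the number of documents containing term t, and tf_{t,d} the count of t in document d.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log_{10}\frac{N}{\mathrm{df}_t}
\qquad\text{e.g. } N = 1000,\ \mathrm{df}_t = 10,\ \mathrm{tf}_{t,d} = 3
\;\Rightarrow\; w_{t,d} = 3 \cdot \log_{10} 100 = 3 \cdot 2 = 6
```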
2013 Mar 03
0
Sent a pull request for testing TradWeight using an Rset.
Hello guys. As discussed on IRC, I have sent a pull request with a test for TradWeight using an Rset.
2017 Jan 16
0
GSoC 2017
Hello, I am Abado Jacob Mtulla, a 3rd-year Computer Science student. I would love to implement the Go bindings for Xapian. On 01/16/2017 05:55 AM, Olly Betts wrote: > Google are running their Summer of Code again this year. If you're not > familiar with it, see: > > https://developers.google.com/open-source/gsoc/ > > Last year was very successful and I think we probably
2011 Mar 29
0
GSoC Project: Support Erlang Language
"Support Erlang Language" By Vladimir Zaytsev, Xapian, 2011 *About me* Name: Vladimir Zaytsev E-mail address: vladimir at zvm.me WWW: zvm.me/, facebook.com/vladimir.zaytsev Emergency contact phone number: +79028195844 Short biography: I was born on 5 February 1991 in Donetsk, USSR; I now live in Khanty-Mansiysk, Russia. In 2008
2014 Mar 09
2
[GSOC 2014] Some questions about Letor module
Thanks for your reply! For the third question: In https://inex.mmci.uni-saarland.de/data/documentcollection.jsp, I can find inex2010-article.qrels in the 2010 assessments, but can't find the query files. Could you send me the link? I have registered on the INEX website. I also need to download ``INEX 2009 collection without annotation tags: (unofficial)`` on
2019 Jan 22
0
GSoC 2019
Google are running their Summer of Code again this year. If you're not familiar with it, see: https://summerofcode.withgoogle.com/ Interested orgs can already apply, up until February 6th (about two weeks away as I write). We've taken part many times before, and it's resulted in both new contributors and interesting new features - I think it's well worth applying again. If