similar to: Tika 0.8 failure rates

Displaying 20 results from an estimated 1000 matches similar to: "Tika 0.8 failure rates"

2009 Feb 03
1
PowerPoint 2007 filter
Hi, I'm trying to write the PowerPoint2007 filter in the same manner that I did for *.docx and *.xlsx but I'm getting the following error when I try an index. The document is called: Indexing "/Frisk in Power Point.pptx" as application/vnd.openxmlformats-officedocument.presentationml.presentation ... caution: filename not matched: ppt/notesSlides/notesSlide*.xml caution:
2011 Oct 25
0
ACL with ActiveDirectory@Groups **UP**
************ I added : acl check permissions = False veto oplock files = /*.doc/*.docx/*.xls/*.xlsx/*.pptx/*.ppsx/*.ppt/*.pps but still doesn't work. ************* > Dear All, > > I have problem with this smb.conf share section > (I'm not samba admin, but I know this configuration) > > smb.conf 3.5.8 > > ################### > [AD-test-acl] > comment
2011 Sep 21
1
File permissions 0070 with Office 2010 after saving
I think this is a recurrence of an old bug. Running Samba 3.5.4 with CTDB on GPFS 3.4.0.6 with the vfs_gpfs module using CentOS 5.6. It is the vanilla CentOS RPMs with the vfs_gpfs module as a self-compiled add-on. Running with NFSv4 ACLs. Basically, what happens is that when a user saves a file in Office 2010 (no Office 2007 to test with) on Windows 7, on the Unix side the permissions on the
2022 Oct 22
1
Anyone using odpdown?
On 10/21/2022 12:45 PM, Leon Fauster via CentOS wrote: > Am 21.10.22 um 17:42 schrieb H: >> On 10/20/2022 02:52 PM, H wrote: >>> Is anyone using odpdown to convert markdown files to OpenOffice Impress slide presentations under CentOS 7? >>> >>> It is not available in the CentOS repositories I have searched. >>> >>>
2014 Aug 05
4
[LLVMdev] Publication: Languages Used in LLVM During Compilation
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of David Chisnall > Subject: Re: [LLVMdev] Publication: Languages Used in LLVM During Compilation > How do I download it? The link sends me to a site with ads, then when I > find the little English icon, there's a download link that takes me to a > page with a picture telling me an
2011 Oct 18
2
patch proposal: omindex library or daemon
Olly (looking at commit logs, I think this is your dept :-) For apps which re/index files frequently and need format conversion, I'd like to propose a patch for one of... Omindex library (thread safe): Omindex::init(options) // struct Omindex::options { ... } initialize mime_map, store default options session = new Omindex::Session(db_pathname) user threads use different sessions
2012 Dec 13
1
omindex one file at a time?
Hi, all -- I want to do Plain Old Omindex'ing *but* the mapping between my documents' filenames and the URLs where I hope search users to find them is, uh..., strange. The simplest thing (to me) would be to run omindex for each document, e.g. omindex --no-delete -U /cool-url-1 /funky/doc/file-blah.pdf omindex --no-delete -U /cool-url-7 /doc/funky/ohmy/blah-file.txt ... and so on...
2016 Sep 27
1
omega issues/notes
All, I've run into a couple of things using omega/omindex under cygwin. I don't think I'd attribute them to xapian, omega or omindex, but wanted to get them out to the list so that if anyone else should run into these things down the road, hopefully someone will remember and be able to help. 1) after compiling and building omega, and doing make install, I get a set violation when
2009 May 19
4
omindex options
Hi. I am writing a Python equivalent of omindex (we are using scriptindex currently, but I wanted to use omindex instead and extend it to work with our internal file format, BUT did not want to compile code if possible... so anyway). I have tried to keep the code as close as possible to the omindex native code, but am facing a bit of confusion: what exactly is the reason for omindex to take
2024 Dec 20
1
Plain text files without extension
On Thu, Dec 19, 2024 at 03:17:13PM -0600, Wilbert van Bakel wrote: > I have many plain text files that don't have an extension. > I notice that omindex is skipping them. > Is there a way to include these files? Are you using a build of omega with libmagic support enabled (it's optional in 1.4.x, but will be a hard requirement in the next release series)? If not, I'd try
2024 Apr 22
2
How to use Xapian Omega directly (i.e., without using `recoll` and `xapiandb`) ... Full Set Of Questions Below:
Dear senior ML members and developers of Xapian Omega, Mr. Olly has helped me cross the bump of the initial learning curve. (ref: https://lists.xapian.org/pipermail/xapian-discuss/2024-April/010034.html) How can I use Xapian Omega directly (i.e., without using `recoll` and `xapiandb`) to index a directory of text files with all strings greater than 3 characters, to create an index text file
2009 Feb 02
2
Ticket #282: omindex-assorted-enhancements.patch woes
I would really like to try out the features in the patch above. But I can't ever seem to get the resulting omindex.cc to "make". I tried updating to rev 10801 from SVN and then running /bootstrap, but then I seem to get errors compiling everything when I try to do "make" (I'm using Ubuntu 8.10). So I thought I'd try and apply the patch to the latest stable version
2013 May 15
1
How to omindex some sub-directories?
Given a directory tree like ... /foo | +-- A | +-- B | +-- C ... what is the best way to index A and C into a single Xapian database? AFAIK the alternatives are: omindex --db /my_db --no-delete /foo /foo/A omindex --db /my_db --no-delete /foo /foo/B or omindex --db /my_A_db /foo /foo/A omindex --db /my_B_db /foo /foo/B xapian-compact /my_A_db /my_B_db /my_db The first alternative does not
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
On Tue, Mar 11, 2014 at 12:02:15PM +0100, Parth Gupta wrote: > During the indexing with omindex, the only thing you need to make sure of is indexing > with prefix 'S' for title as explained here in Letor documentation: > xapian-letor/docs/letor.rst > > Previously when I edited omindex.cc it was modified as can be seen >
2004 Dec 17
2
Omega changes
I propose making a few changes to the way omega (and omindex) operate. I'm posting these to the list before doing so to check if they'll cause obvious problems for anyone. 1) Configuration handling for omega. Omega has a configuration file, which specifies where databases, templates and logfiles are to be found. It currently looks for this configuration file in its current working
2013 Nov 26
1
Oplock break failed for file
Hi, I am running 2DCs and 1 member server. All are running samba 4.1.2 The member server hosts the file for access and it is full of log like: [2013/11/26 14:57:46.970108, 0] ../source3/smbd/oplock.c:333(oplock_timeout_handler) Oplock break failed for file Putonghua/aaa.pptx -- replying anyway [2013/11/26 14:57:50.069924, 0] ../source3/smbd/oplock.c:333(oplock_timeout_handler) Oplock
2007 Jul 12
1
omega: omindex behaviour with duplicate files
Hi all, I need a little clarification with regard to Omega's behaviour with 'duplicate' files when running 'omindex'. How is a duplicate recognised? Is it simply by file path? How is an unmodified file detected, if at all? I would like to set up a Subversion post-commit hook to update my index. If possible I would like to just update the index with the newly committed files.
2017 Apr 20
2
Question about the ticket #743 omindex: delay libmagic checks
Hi, I'm working on ticket #743 omindex: delay libmagic checks <https://trac.xapian.org/ticket/743>. As the ticket's description mentions, the call to libmagic is more expensive than the call to stat, so we can check the size by calling stat before calling libmagic to get a MIME type. But how about the timestamp check? Since the timestamp check needs to iterate the DB to check if
2005 Mar 31
1
omindex and scriptindex question
Hi, I was researching indexing of text in omindex and scriptindex. While indexing text with omindex.cc, the position of terms is saved with a gap. This is not happening with scriptindex.cc. Why is this happening? Another question is why in omindex.cc the term position starts at 0 while in scriptindex it starts from 1? Code snippet from omindex.cc // Add postings for terms to the document
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
Hi Parth, I've implemented the SVMRanker class and also sorted out most of the current Letor APIs. Now I'm trying to use the INEX dataset to verify my implementation. But I'm stuck in the indexing part. You said in the documentation that we have to add a prefix when indexing. Also I notice that you set some metadata in omindex.cc of your version. But omindex.cc has changed since 2011. I think that's why my result