thr3ads.net - similar to: "Reading a password-protected PDF"

Displaying 20 results from an estimated 500 matches similar to: "Reading a password-protected PDF"

Need Beginner Guide for Matcher Optimisations Project

2013 Mar 04

Need Beginner Guide for Matcher Optimisations Project

Hi, While searching for a project which matches my interest and skill level, I found this project named Matcher Optimization. This project is really challenging and exciting from my viewpoint and I would like to be a part of it. The optimization techniques mentioned in the reference links provided will take some time for me to understand well. But I am trying to get my

Getting Started

2013 Mar 02

Getting Started

Hello all, I am Mohd Azeem. I want to contribute to Xapian but I am a newbie here. I wonder if anyone could help me get started with Xapian. I have some basic knowledge of IR and have implemented TF*IDF and PageRank schemes, as well as an inverted index and a web crawler. Regards, Azeem

How to add a custom weight to the relevancy value and sort it.

2013 Mar 02

How to add a custom weight to the relevancy value and sort it.

Hello guys, I have a weight value which is calculated from some factor, and I need to add that weight to the relevancy value of a result and sort by the combined value. Is that possible in Xapian? Thanks, VishnuKumar

Dealing with image PDF's

2008 Jul 30

Dealing with image PDF's

Guys, I was just playing around and added a bit of code to omindex.cc so I could OCR tiff and tif files with gocr, which seems to work. Here's what it looks like:

    // Tiff:
    } else if (startswith(mimetype, "image/tif")) {
        // Inspired by http://mjr.towers.org.uk/comp/sxw2text
        string safefile = shell_protect(file);
        string cmd = "tifftopnm " + safefile + "

Proposed changes to omindex

2006 Aug 11

Proposed changes to omindex

Proposed changes to omindex

Currently Available Items
=========================

1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during indexing.
2) Add the document's last modified time to the value table (ID 0). This would allow incremental indexing based on the timestamp and also sorting by date in omega (SORT=0)
   a. Currently I store the timestamp
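The two items in this proposal (a lookup key derived from the MD5 of the full file name, and a stored last-modified time used to decide whether to re-index) can be illustrated with a minimal sketch. This is not omindex code and does not use the Xapian API: the function names and the in-memory `stored_mtimes` dictionary are hypothetical stand-ins for the Q-prefixed term and value slot 0, and the MD5 is written in hex rather than as 16 raw bytes purely for readability.

```python
import hashlib
import os


def q_term(path: str) -> str:
    """Build a lookup key: 'Q' plus the MD5 of the full file name.

    Assumption: hex encoding (32 characters) is used here only so the
    key is printable; the proposal speaks of the 16-byte digest.
    """
    return "Q" + hashlib.md5(path.encode("utf-8")).hexdigest()


def needs_reindex(path: str, stored_mtimes: dict) -> bool:
    """Return True if the file is new or was modified after it was indexed."""
    current = int(os.path.getmtime(path))
    previous = stored_mtimes.get(q_term(path))
    return previous is None or current > previous


def record_indexed(path: str, stored_mtimes: dict) -> None:
    """Remember the modification time seen when the file was indexed."""
    stored_mtimes[q_term(path)] = int(os.path.getmtime(path))
```

A caller would check `needs_reindex(path, stored_mtimes)` for each file, index it if the result is true, and then call `record_indexed`. Keying on a fixed-length digest rather than the raw path keeps the key size bounded no matter how long the file name is.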

RC7: BUG! and patch [Was: Re: rc7 bug? [Was: deliver LDA and INBOX location] (fwd)] (fwd)

2006 Sep 15

RC7: BUG! and patch [Was: Re: rc7 bug? [Was: deliver LDA and INBOX location] (fwd)] (fwd)

Could someone confirm, please, that this bug report and its proposed fix are being checked?

1. Is my analysis (message below) about right?
2. Is my proposed patch (attached) about right?
3. Is this being addressed for "rc8" (or whatever) and its successors?

Many thanks. -- : David Lee I.T. Service : : Senior Systems Programmer

pdftotext latest version for CentOS 7

2019 Dec 15

pdftotext latest version for CentOS 7

I have pdftotext 0.26.5, the current version for CentOS 7 and the Mate desktop as far as I can ascertain. The page https://www.xpdfreader.com/pdftotext-man.html seems to suggest that the latest version is 4.02 which seems a gigantic leap ahead. Since I have a Chinese text PDF which I am unable to extract any text from using pdftotext, instead I end up with a collection of garbage Latin

omindex => Unknown extension

2009 Apr 06

omindex => Unknown extension

Hi all, I'm having a recurrent problem with Omega's indexing. When I run omindex, it sometimes fails to recognize the extension of some files (.doc, .pdf) and skips them. In the same run, omindex is otherwise perfectly able to index other files with the same extensions. The reason is not clear, but it should occur before it selects a content converter since, for example, if I manually run

[GSoC] Questions about project Text-Extraction Libraries

2019 Mar 21

[GSoC] Questions about project Text-Extraction Libraries

Hello! I have a few questions related to the project Text-Extraction Libraries. Firstly, I think that trying to isolate library bugs in subprocesses could work, but I am not sure how to handle deadlocks or infinite loops. I feel that using a timer is the only way to deal with them, but I would like to know what you think. Secondly, I have been reading the source code of
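The idea raised in this post, running an extractor in a subprocess and using a timer to recover from deadlocks or infinite loops, might look roughly like the sketch below. It is written in Python only for illustration and says nothing about how Xapian itself implements this; `run_isolated` and its parameters are hypothetical names.

```python
import subprocess
import sys
from typing import List, Optional


def run_isolated(cmd: List[str], timeout_s: float) -> Optional[str]:
    """Run an extractor command in a child process with a time limit.

    Returns the child's stdout, or None if it could not be started,
    exited with a non-zero status, or ran past the time limit.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
            check=True,         # a non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None
    except (subprocess.CalledProcessError, OSError):
        return None
    return result.stdout
```

For example, `run_isolated([sys.executable, "-c", "print('ok')"], 10.0)` returns the child's output, while a child that sleeps longer than the limit yields `None`. Because the faulty code runs in a separate process, a crash or hang there does not take the indexer down with it.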

rc7 bug? [Was: deliver LDA and INBOX location] (fwd)

2006 Sep 05

rc7 bug? [Was: deliver LDA and INBOX location] (fwd)

Anyone had any thoughts on the item below? If the problem is with my config, I'd like to be guided towards how I might resolve it. If it is a bug in rc7, it would be good to fix it, and I'd be happy to beta-test. -- : David Lee I.T. Service : : Senior Systems Programmer Computer Centre : :

"Complex?" import of pdf files (criminal records) into R table

2009 Oct 15

"Complex?" import of pdf files (criminal records) into R table

Hi there, I'm trying to decide whether it is possible to transform several more or less complex PDF files into an R table format, or whether it has to be done manually. It would be impudent to expect a complete solution, but I would be grateful if anyone could give me advice on what the structure of such an R program could look like, and whether it's possible in general. Here

Reading PDF files

2012 Dec 02

Reading PDF files

I need to do text mining on PDF files. I understand there is a readPDF command in tm that can be used. Have read the 2008 posts on converting PDF files to text by Tony Breyal and others. Wondering if the procedure has been standardized in any tutorial or otherwise? Being new to R, I was able to follow only part of the discussion. Any way to get a set of step by step instructions

reading data from a pdf

2005 Oct 22

reading data from a pdf

> Hi, I'm trying to read data from a PDF file. Is it possible to do it
> with R? Thanks, Marco

If cut and paste to a text file fails, try this: pdftotext (from the xpdf project) or http://pdftohtml.sourceforge.net. pdftohtml is a utility which converts PDF files into HTML and XML formats. In addition, pdftk, the command line pdf toolkit, may be useful: http://www.accesspdf.com/pdftk/
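As a rough illustration of the pdftotext suggestion in this reply, the sketch below builds a pdftotext command line and captures the extracted text from standard output. It assumes pdftotext is installed and on the PATH. The helper names are hypothetical, and the optional `-upw` user-password argument is included only because this results page is about password-protected PDFs; the reply itself does not mention it.

```python
import subprocess
from typing import List, Optional


def pdftotext_command(pdf_path: str, layout: bool = False,
                      user_password: Optional[str] = None) -> List[str]:
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")            # keep the physical page layout
    if user_password is not None:
        cmd += ["-upw", user_password]   # user password for an encrypted PDF
    cmd += [pdf_path, "-"]               # "-" sends the text to stdout
    return cmd


def extract_text(pdf_path: str, **options) -> str:
    """Run pdftotext and return the extracted text (requires pdftotext)."""
    result = subprocess.run(
        pdftotext_command(pdf_path, **options),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

The resulting text can be written to a file and read from R with `readLines()`, or processed further before import.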

Windows PC PostScript printer driver -> CUPS data import fails

2018 Apr 12

Windows PC PostScript printer driver -> CUPS data import fails

Yan Li wrote:
> On 04/12/2018 03:08 AM, Gary Stainburn wrote:
>> The PDF contains:
>>
>> ERROR: invalidfileaccess
>> OFFENDING COMMAND: .findfont
>> OPERAND STACK:
>> r
>> /usr/share/X11/fonts/Type1/UTBI____.pfa
>> --nostringval--
>> true
>> NimbusMonL-Regu
>> Courier
>> --nostringval--
>> Courier
>> 4544317

parsing pdf files

2010 Jan 09

parsing pdf files

I have a pdf file that I would like to parse into R: http://www.williams.edu/Registrar/geninfo/faculty.pdf For now, I open the file in Acrobat by hand, then save it "as text" and then use readLines(). That works fine but a) I am concerned that some information may be lost and b) I may be doing this a lot, so I would rather have R grab the information from the pdf file directly. So: is

Windows PC PostScript printer driver -> CUPS data import fails

2018 Apr 12

Windows PC PostScript printer driver -> CUPS data import fails

Hi all, For some years now I have been using a simple system I found online which allows me to easily import data from Windows programs. Hopefully others out there are using the system and have already found the answer to my problem. I have installed on my CentOS server a virtual CUPS printer which receives a PS file, and then runs 'ps2pdf' and 'pdftotext -layout' to end up

PATCH: sftp-server logging.

2002 Mar 15

PATCH: sftp-server logging.

This is another take on logging for sftp-server. Given the number of private email requests I've received for this patch, I assume there is significant enough interest to request it be reviewed for inclusion into the release. The patch is against 3.1p1, and is completely disabled by default. To enable logging, one must use compile time directives (-DSFTP_LOGGING). This was done due to prior

how to index the result of any instance method

2006 May 22

how to index the result of any instance method

Hi, One of the AAF features is to be able to index results of methods, but I haven't seen anywhere how to do this. I have a method that returns the full text of a file and I'd like for this to be indexed. Can anyone out there help me out on this one? Tom -- Posted via http://www.ruby-forum.com/.

Authentication problems Still

2002 Feb 14

Authentication problems Still

Hi All, I've tried everything I can think of, and spent many many hours trying to get this Sun Solaris 8 machine to authenticate with a Linux 7.1 machine acting as the PDC. In summary: Both machines are running 2.2.3a compiled from source without additional arguments (just ran configure on each platform) No problems with Win95, Win98, WinNT authenticating with the Linux PDC. The Sun machine
