Hello respected developers,

I was wondering whether it is possible for Xapian to read a password-protected PDF. Searching the archives and Google yielded no results. I also looked through the source code, but I could not find the part related to this issue.

The set of PDFs has the following characteristics:

1. All of the documents are password protected.
2. All of the PDFs are set with the same password.
3. Only the content of each PDF is encrypted, not the metadata.

If it is possible, could you point me in the right direction?

Thanks and warm regards,
Zaim Zuhuri
Hello,

Because the content is encrypted, the crawler may not be able to see it. It may still be possible to search for the documents on the basis of their metadata (name or other metadata fields).

Regards,
Azeem

________________________________
From: Zaim Zuhuri <mzaimz at gmail.com>
To: xapian-devel at lists.xapian.org
Sent: Wednesday, 27 February 2013 12:36 PM
Subject: [Xapian-devel] Reading a password-protected PDF

[...]
On Wed, Feb 27, 2013 at 03:06:29PM +0800, Zaim Zuhuri wrote:
> I was wondering if it is possible for xapian to read a password-protected
> PDF.
[...]
> 2. all PDF is set with the same password.
> 3. only the content of the PDF is encrypted, not the metadata.
>
> If it is possible could you guys point me in the right direction.

Xapian runs pdftotext to extract text from PDF files, so the question really is "can pdftotext read a password-protected PDF?"

Looking at pdftotext --help, I see:

    -opw <string> : owner password (for encrypted files)
    -upw <string> : user password (for encrypted files)

Not sure what the difference is, but I'd try both and see which works.

So I'd try creating a simple wrapper script so when omindex runs pdftotext it runs your wrapper instead, which runs pdftotext with extra command line arguments:

    #!/bin/sh
    exec /usr/bin/pdftotext -upw 'secret-password' "$@"

Save that as (say) /home/zaim/pdftotext-wrapper/pdftotext, then make it executable and add that directory to PATH before you run omindex:

    chmod a+x /home/zaim/pdftotext-wrapper/pdftotext
    env PATH="/home/zaim/pdftotext-wrapper:$PATH" omindex [...]

Cheers,
    Olly
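
A possible variant of the wrapper idea above, offered as a sketch rather than a tested recipe: it takes the password from the environment instead of hard-coding it in the script, and lets you switch between -upw and -opw while finding out which one works. The names PDF_PASSWORD, PDF_PW_FLAG, REAL_PDFTOTEXT and run_pdftotext are made up for this example; they are not something omindex or pdftotext defines. Only the -upw and -opw options come from the pdftotext --help output quoted above.

```shell
#!/bin/sh
# Sketch of a pdftotext wrapper. PDF_PASSWORD, PDF_PW_FLAG and
# REAL_PDFTOTEXT are illustrative variable names (assumptions of this
# example), not part of omindex or pdftotext.

run_pdftotext() {
    # Path to the real pdftotext binary; adjust for your system.
    real="${REAL_PDFTOTEXT:-/usr/bin/pdftotext}"
    # Which password option to pass: -upw (user) or -opw (owner).
    flag="${PDF_PW_FLAG:--upw}"

    if [ -n "${PDF_PASSWORD:-}" ]; then
        # Put the password option first, then the caller's own arguments.
        "$real" "$flag" "$PDF_PASSWORD" "$@"
    else
        # No password configured: behave like plain pdftotext.
        "$real" "$@"
    fi
}

# Only act as the wrapper when this file is invoked under the name
# "pdftotext", which is the name the wrapper is saved under.
case "${0##*/}" in
    pdftotext) run_pdftotext "$@" ;;
esac
```

Saved as /home/zaim/pdftotext-wrapper/pdftotext and made executable as described above, this would be used by also setting the password when starting omindex, for example `env PDF_PASSWORD='secret-password' PATH="/home/zaim/pdftotext-wrapper:$PATH" omindex [...]`. If -upw turns out not to work, adding `PDF_PW_FLAG=-opw` to the same command line switches to the owner password option without editing the script.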