Displaying 20 results from an estimated 500 matches similar to: "Reading a password-protected PDF"
2013 Mar 04
2
Need Beginner Guide for Matcher Optimisations Project
Hi,
While searching for a project which matches my interest and skill level, I
found this project named Matcher Optimization. This project is really
challenging and exciting from my view point and I would like to be a part of
this project.
Optimization techniques mentioned in the reference links provided will take
some time for me to have a good understanding about them. But I am trying
to get my
2013 Mar 02
2
Getting Started
Hello all,
I am Mohd Azeem. I want to contribute to Xapian but I am a newbie here. I wonder if anyone could help me get started with Xapian. I have some basic knowledge of IR and have implemented TF*IDF and PageRank schemes, as well as an inverted index and a web crawler.
regards,
Azeem
2013 Mar 02
3
How to add an custom weight to the relevancy value and sort it.
Hello guys,
I have a weight value which is calculated from some factor, and I need to add
that weight to the relevancy value of a result and sort by the combined value.
Is that possible in Xapian?
Thanks,
VishnuKumar
2008 Jul 30
3
Dealing with image PDF's
Guys,
I was just playing around and added a bit of code to omindex.cc so I
could ocr tiff and tif with gocr which seems to work. Here's what it
looks like:
// Tiff:
} else if (startswith(mimetype, "image/tif"))
{
// Inspired by http://mjr.towers.org.uk/comp/sxw2text
string safefile = shell_protect(file);
string cmd = "tifftopnm " + safefile + "
2006 Aug 11
3
Proposed changes to omindex
Proposed changes to omindex
Currently Available Items
=========================
1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during
indexing.
2) Add the document's last modified time to the value table (ID 0). This would allow incremental
indexing based on the timestamp and also sorting by date in omega (SORT=0)
a. Currently I store the timestamp
2006 Sep 15
3
RC7: BUG! and patch [Was: Re: rc7 bug? [Was: deliver LDA and INBOX location] (fwd)] (fwd)
Could someone confirm, please, that this bug report and its proposed fix
are being checked?
1. Is my analysis (message below) about right?
2. Is my proposed patch (attached) about right?
3. Is this being addressed for "rc8" (or whatever) and its successors?
Many thanks.
--
: David Lee I.T. Service :
: Senior Systems Programmer
2019 Dec 15
1
pdftotext latest version for CentOS 7
I have pdftotext 0.26.5, the current version for CentOS 7 and the Mate desktop as far as I can ascertain. The page https://www.xpdfreader.com/pdftotext-man.html seems to suggest that the latest version is 4.02 which seems a gigantic leap ahead.
Since I have a Chinese text PDF which I am unable to extract any text from using pdftotext, instead I end up with a collection of garbage Latin
2009 Apr 06
2
omindex => Unknown extension
Hi all,
I'm having a recurrent problem with Omega's indexing.
When I run omindex, it sometimes fails to recognize the extension of
some files (.doc, .pdf) and skips them. In the same run, omindex is
otherwise perfectly able to index other files with the same extensions. The
reason is not clear but it should occur before it selects a content
converter since for example, if I manually run
2019 Mar 21
2
[GSoC] Questions about project Text-Extraction Libraries
Hello!
I have a few questions related to the project Text-Extraction Libraries.
Firstly, I think that trying to isolate library bugs in subprocesses could
work, but I am not sure about how to handle deadlocks or infinite
loops. I feel that using a timer is the only way to deal with it but I
would like to know what you think about it.
Secondly, I have been reading the source code of
2006 Sep 05
2
rc7 bug? [Was: deliver LDA and INBOX location] (fwd)
Anyone had any thoughts on the item below?
If the problem is with my config, I'd like to be guided towards how I
might resolve it.
If it is a bug in rc7, it would be good to fix it, and I'd be happy to
beta-test.
--
: David Lee I.T. Service :
: Senior Systems Programmer Computer Centre :
:
2009 Oct 15
1
"Complex?" import of pdf files (criminal records) into R table
Hi there,
I'm trying to decide whether it would be possible to transform several
more or less complex pdf files into an R table format or whether it has to be
done manually. I think it would be impudent to expect a complete
solution, but I would be grateful if anyone could give me advice on
what the structure of such an R program could look like, and whether it's
possible in general.
Here
2012 Dec 02
1
Reading PDF files
I need to do text mining on PDF files. I understand there is a readPDF
command in tm that can be used. Have read the 2008 posts on converting
PDF files to text by Tony Breyal and others.
Wondering if the procedure has been standardized in any tutorial or
otherwise? Being new to R, I was able to follow only part of the
discussion.
Any way to get a set of step by step instructions
2005 Oct 22
1
reading data from a pdf
> Hi, I'm trying to read data from a PDF file. Is it possible to do it
> with R? Thanks, Marco
If cut and paste to a text file fails, try this:
pdftotext (from the xpdf project)
or
http://pdftohtml.sourceforge.net
pdftohtml is a utility which converts PDF files into HTML and
XML formats
In addition, pdftk, the command line pdf toolkit may be useful
http://www.accesspdf.com/pdftk/
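As a rough illustration of the pdftotext suggestion above, here is a minimal Python sketch that builds a pdftotext command line and runs it. It assumes pdftotext is installed and on the PATH. The helper names `build_pdftotext_cmd` and `extract_text` are hypothetical and not part of any tool mentioned in the thread. The `-upw` option passes the user password for a password-protected PDF, `-layout` keeps the physical layout, and `-` as the output file sends the text to standard output.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftotext_cmd(pdf_path: str,
                        user_password: Optional[str] = None,
                        layout: bool = False) -> List[str]:
    """Build an argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        # Keep the original physical layout of the page as far as possible.
        cmd.append("-layout")
    if user_password is not None:
        # User password for an encrypted PDF.
        cmd += ["-upw", user_password]
    # "-" as the output file makes pdftotext write to standard output.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path: str,
                 user_password: Optional[str] = None,
                 layout: bool = False) -> str:
    """Run pdftotext and return the extracted text (hypothetical helper)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, user_password, layout),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,   # collect stdout and stderr as bytes
    )
    # Assumes UTF-8 output; undecodable bytes are replaced, not fatal.
    return result.stdout.decode("utf-8", errors="replace")
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with file names that contain spaces. Note that a password given on the command line may be visible to other users in the process list.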
2018 Apr 12
2
Windows PC PostScript printer driver -> CUPS data import fails
Yan Li wrote:
> On 04/12/2018 03:08 AM, Gary Stainburn wrote:
>> The PDF contains:
>>
>> ERROR: invalidfileaccess
>> OFFENDING COMMAND: .findfont
>> OPERAND STACK:
>> r
>> /usr/share/X11/fonts/Type1/UTBI____.pfa
>> --nostringval--
>> true
>> NimbusMonL-Regu
>> Courier
>> --nostringval--
>> Courier
>> 4544317
2010 Jan 09
4
parsing pdf files
I have a pdf file that I would like to parse into R:
http://www.williams.edu/Registrar/geninfo/faculty.pdf
For now, I open the file in Acrobat by hand, then save it "as text"
and then use readLines(). That works fine but a) I am concerned that
some information may be lost and b) I may be doing this a lot, so I
would rather have R grab the information from the pdf file directly.
So: is
2018 Apr 12
2
Windows PC PostScript printer driver -> CUPS data import fails
Hi all,
For some years now I have been using a simple system I found online which
allows me to easily import data from Windows Programs.
Hopefully others out there are using the system and already have found the
answer to my problem.
I have installed on my Centos server a virtual CUPS printer which receives a
PS file, and then runs 'ps2pdf' and 'pdftotext -layout' to end up
2002 Mar 15
4
PATCH: sftp-server logging.
This is another take on logging for sftp-server. Given the number
of private email requests I've received for this patch, I assume
there is significant enough interest to request it be reviewed for
inclusion into the release.
The patch is against 3.1p1, and is completely disabled by default.
To enable logging, one must use compile time directives
(-DSFTP_LOGGING). This was done due to prior
2006 May 22
7
how to index the result of any instance method
Hi,
One of the AAF features is to be able to index results of methods, but I
haven't seen anywhere how to do this. I have a method that returns the
full text of a file and I'd like for this to be indexed. Can anyone out
there help me out on this one?
Tom
--
Posted via http://www.ruby-forum.com/.
2002 Feb 14
4
Authentication problems Still
Hi All,
I've tried everything I can think of, and spent many many
hours trying to get this Sun Solaris 8 machine to authenticate
with a Linux 7.1 machine acting as the PDC.
In summary:
Both machines are running 2.2.3a compiled from source
without additional arguments (just ran configure on each platform)
No problems with Win95, Win98, WinNT authenticating with the Linux
PDC.
The Sun machine