On 12/14/19 7:28 PM, H wrote:
> I have pdftotext 0.26.5, the current version for CentOS 7 and the Mate
> desktop as far as I can ascertain. The page
> https://www.xpdfreader.com/pdftotext-man.html seems to suggest that the latest
> version is 4.02 which seems a gigantic leap ahead.
>
> Since I have a Chinese text PDF which I am unable to extract any text from
> using pdftotext, instead I end up with a collection of garbage Latin characters,
> I am curious how to get a later version? Copying and pasting from Atril 1.16.1
> (seems to be part of the Mate desktop I am running) also makes me end up with
> garbage... Not surprising since it also seems to use pdftotext 0.26.5...
>
> Any suggestions? Later version of pdftotext? If so, wherefrom? Another
> PDF-viewer?
pdftotext is distributed as part of the poppler package, which, as you
suggest, is at 0.26.5 on EL7. The latest version of poppler, however, is
0.83.0. The page you found documents the pdftotext from Xpdf, the
separate project that poppler was forked from, so its 4.02 is not
directly comparable. The man page for pdftotext on EL7 suggests it is at
version 3.03, which is not quite so dramatic a difference.

In any case, welcome to the joys of running an enterprise distribution.
You'll find newer versions in EL8 or Fedora. Poppler is an integral core
component of the system, so it is generally not updated lightly.
--
Orion Poplawski
Manager of NWRA Technical Systems 720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane orion at nwra.com
Boulder, CO 80301 https://www.nwra.com/