2008 Jul 30
3
Dealing with image PDFs
Guys,
I was just playing around and added a bit of code to omindex.cc so I
could OCR .tiff and .tif files with gocr, which seems to work. Here's what it
looks like:
// Tiff:
} else if (startswith(mimetype, "image/tif"))
{
    // Inspired by http://mjr.towers.org.uk/comp/sxw2text
    string safefile = shell_protect(file);
    string cmd = "tifftopnm " + safefile + " | gocr -f UTF8 -";
    try {
        dump = stdout_to_string(cmd);
    } catch (ReadError) {
        cout << "\"" << cmd << "\" failed - skipping\n...
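For anyone wanting to try the same tifftopnm | gocr pipeline outside omindex, here is a minimal standalone sketch, assuming a POSIX system with popen()/pclose(). shell_quote() and run_capture() are hypothetical stand-ins written for illustration; they are not omindex's shell_protect() and stdout_to_string(), whose exact behaviour isn't shown in the snippet above.

```cpp
// Minimal sketch, assuming POSIX popen()/pclose().
// shell_quote() and run_capture() are hypothetical helpers, not omindex API.
#include <cstdio>
#include <stdexcept>
#include <string>

// Wrap an argument in single quotes so /bin/sh treats it literally.
// Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
std::string shell_quote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'')
            out += "'\\''";
        else
            out += c;
    }
    out += "'";
    return out;
}

// Run cmd via the shell and return everything it writes to stdout.
// Throws std::runtime_error if the command can't be started or exits non-zero.
std::string run_capture(const std::string& cmd) {
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + cmd);
    std::string result;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0) result.append(buf, n);
    if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + cmd);
    return result;
}
```

A caller would build the command as `"tifftopnm " + shell_quote(path) + " | gocr -f UTF8 -"` and pass it to run_capture(), catching std::runtime_error to skip the file. For a pipeline, the shell reports the exit status of the last command (gocr), so a tifftopnm failure may go unnoticed.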
2009 Feb 03
1
PowerPoint 2007 filter
...see the other, commented-out command, which also extracts notes
and comments from the PowerPoint file.
// Start: PowerPoint 2007 .pptx
} else if (startswith(mimetype,
           "application/vnd.openxmlformats-officedocument.presentationml."))
{
    // Inspired by http://mjr.towers.org.uk/comp/sxw2text
    string safefile = shell_protect(file);
    /* string cmd = "unzip -p " + safefile + " ppt/slides/slide*.xml ppt/notesSlides/notesSlide*.xml ppt/comments/comment*.xml"; */
    string cmd = "unzip -p " + safefile + " ppt/slides/slide*.xml";
    try {...
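The slide XML that unzip writes out still has to be reduced to plain text. As a rough sketch (not necessarily what omindex does with this output), the visible slide text in a .pptx sits in DrawingML `<a:t>` elements, so a crude extractor can just collect those. extract_slide_text() and decode_entities() are hypothetical helpers; the sketch assumes `<a:t>` carries no attributes and ignores CDATA and numeric character references, so a real filter should use a proper XML parser.

```cpp
// Minimal sketch: collect the contents of <a:t> text runs from slide XML,
// e.g. the output of "unzip -p file.pptx ppt/slides/slide*.xml".
// Hypothetical helpers; assumes <a:t> has no attributes.
#include <string>

// Decode the five predefined XML entities; anything else is copied as-is.
std::string decode_entities(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size();) {
        if (s[i] == '&') {
            if (s.compare(i, 5, "&amp;") == 0) { out += '&'; i += 5; continue; }
            if (s.compare(i, 4, "&lt;") == 0) { out += '<'; i += 4; continue; }
            if (s.compare(i, 4, "&gt;") == 0) { out += '>'; i += 4; continue; }
            if (s.compare(i, 6, "&quot;") == 0) { out += '"'; i += 6; continue; }
            if (s.compare(i, 6, "&apos;") == 0) { out += '\''; i += 6; continue; }
        }
        out += s[i++];
    }
    return out;
}

// Join the text of every <a:t>...</a:t> run, separated by single spaces.
std::string extract_slide_text(const std::string& xml) {
    const std::string open = "<a:t>", close = "</a:t>";
    std::string text;
    std::string::size_type pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        pos += open.size();
        std::string::size_type end = xml.find(close, pos);
        if (end == std::string::npos) break;  // unterminated run: stop
        if (!text.empty()) text += ' ';
        text += decode_entities(xml.substr(pos, end - pos));
        pos = end + close.size();
    }
    return text;
}
```

Separating runs with a space keeps words from different shapes apart, at the cost of splitting a word that PowerPoint happened to store as several runs; for indexing purposes that trade-off is usually tolerable.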