On Tue, Sep 18, 2012 at 12:19:08PM +1000, linbloke wrote:
> For a given keynote file called testxyz.key:
>
> cp testxyz.key testxyz.key.zip
> mkdir testxyz.key.tmp
> cd testxyz.key.tmp
> unzip ../testxyz.key.zip
>
> All text within the keynote file is stored in an xml file called
> index.apxl. The following adds newlines after xml tag closures and
> then filters xml tags, filters some &gt; garbage, leaving only the text
> from the keynote file.
>
> cat index.apxl | perl -pe 's/>/>\n/g' | perl -pe 's/<(.*?)>//g' |
>   strings | grep -v '\&gt;' > testxyz.key.txt

You can unpack just that file on the fly from the .key file using
unzip -p, simplifying all the commands above to a single pipeline:

unzip -p testxyz.key index.apxl | perl -pe 's/>/>\n/g' |
  perl -pe 's/<(.*?)>//g' | strings | grep -v '\&gt;' > testxyz.key.txt

> Probably a better way to do it would be with an xml parser but that's
> beyond me. Please CC me with comments.

Yeah, the &gt; is a ">" character escaped in XML, and you'll see other
characters escaped like this too.

Do you have any example keynote files with a liberal licence?
Cheers,
Olly