thr3ads.net - webgen users - [webgen-users] Pipeline for LaTeX files [Feb 2011]

Vincent Beffara

2011-Feb-15 13:38 UTC

[webgen-users] Pipeline for LaTeX files

Hi,

I have a bunch of LaTeX files which I would like to include into my web
site as individual pages. Thanks to MathJax, this can be done almost
automatically, or at least just by essentially copy-and-pasting the main
{document} environment into the body of a .html file. So far, so good.

I would like to teach webgen to do all that automatically: if I drop a
.tex file into the src/ directory, it should parse it for metadata like
title, date ..., transform it into the proper .html file, with the same
template as for the .page files, include it in the menu and so
on. Trying to figure out how to do that, I got lost in the
documentation.

Where should I start ? Any pointers ?

Thanks for your help,

       /vincent

-- 
|                 |   UMPA - ENS Lyon   | Mél: vbeffara at ens-lyon.fr |
| Vincent Beffara |  46 allée d'Italie  | Tél: (+33) 4 72 72 85 25     |
|                 | 69364 Lyon Cedex 07 | Fax: (+33) 4 72 72 84 80     |

Thomas Leitner

2011-Feb-16 07:20 UTC

[webgen-users] Pipeline for LaTeX files

On 2011-02-15 14:38 +0100 Vincent Beffara wrote:
> I would like to teach webgen to do all that automatically: if I drop a
> .tex file into the src/ directory, it should parse it for metadata
> like title, date ..., transform it into the proper .html file, with
> the same template as for the .page files, include it in the menu and
> so on. Trying to figure out how to do that, I got lost in the
> documentation.
> 
> Where should I start ? Any pointers ?

* First you need to provide a content processor for your tex files that
  parses the .tex files and converts them to HTML. webgen has no
  built-in content processor to do that - you will need to code it
  yourself.

  Have a look at the API documentation for
  [Webgen::ContentProcessor][1] which provides general information and
  an example.

* Then you need to tell the source handler 'page' to use the .tex files
  together with your new content processor (I'll call it `texproc`).
  This can be done by adding the following to your config.yaml:

      patterns:
        Page:
          add: ["**/*.tex"]

  After that you need to create a src/texfiles.metainfo file with the
  following content:

      --- name:paths
      /**/*.tex:
        blocks:
          default:
            pipeline: texproc

This allows you to:

* Drop in a .tex anywhere in your src/ directory and have it processed
  via the source handler 'page' like any .page file.

* Set the `template` meta information like with normal .page files. Or,
  for that matter, any other meta information you need (like `in_menu`).

* And, since webgen assumes that .tex files are in Webgen Page Format,
  you can add a meta information block at the beginning with needed
  meta information. Or just use a meta information file for this
  purpose so that the .tex files don't need to be edited.

The default processing pipeline used for .tex files is set to your
newly created tex processor via the meta information file.

BUT be aware that, since we are using the source handler 'page' for
handling your .tex files, it assumes that .tex files are in the Webgen
Page Format (like .page files). Therefore you would need to escape
three dashes at the beginning of a line. However I don't think that
this happens in .tex files anyway...
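
Whether any existing .tex file actually contains such a line can be checked before dropping it into src/. The following is only a minimal sketch in plain Ruby; the helper name `dash_lines` is made up for this illustration and is not part of webgen:

```ruby
# Hypothetical helper (not part of webgen): returns the 1-based numbers of
# all lines that start with three dashes, i.e. the lines that a parser for
# the Webgen Page Format would treat as block separators.
def dash_lines(tex_source)
  tex_source.each_line.with_index(1)
            .select { |line, _number| line.start_with?('---') }
            .map { |_line, number| number }
end

sample = "\\documentclass{article}\n" \
         "--- an unlucky line\n" \
         "Some text --- with dashes inside.\n"

p dash_lines(sample)   # prints [2]
```

Only dashes at the very beginning of a line are reported; dashes in the middle of a line, as in the third sample line, are harmless.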

Let me know if you need more help/information/pointers!

Best regards,
  Thomas

[1]: http://webgen.rubyforge.org/documentation/rdoc/Webgen/ContentProcessor.html

Vincent Beffara

2011-Feb-16 14:50 UTC

head link

[webgen-users] Pipeline for LaTeX files

Hi Thomas, and thanks for the advice !

It already gives quite acceptable results after just a few tries. I'm
glad I chose webgen for my website.

TL> * First you need to provide a content processor for your tex files
TL>   that parses the .tex files and converts them to HTML. webgen has
TL>   no built-in content processor to do that - you will need to code
TL>   it yourself.

Mkay, that step will be the longest because of all the tweaking
necessary, but at least I can manage. Transforming LaTeX into markdown
is not too difficult to a first approximation, and then kramdown will do
the rest.
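
As a rough illustration of what such a processor could look like, here is a sketch that only keeps the body of the {document} environment. It assumes that a webgen content processor is an object with a `call(context)` method that rewrites `context.content` and returns the context; that contract, and how the class is registered under the name `texproc`, should be checked against the Webgen::ContentProcessor API documentation. `TexProc` and `FakeContext` are names invented for this sketch, and `FakeContext` merely stands in for webgen's real context object so that the code runs on its own:

```ruby
# Stand-in for webgen's context object (assumption: only `content` is used).
FakeContext = Struct.new(:content)

# Hypothetical processor: keeps only the body of the document environment.
class TexProc
  BODY = /\\begin\{document\}(.*?)\\end\{document\}/m

  # Replaces the content with the document body (if one is found)
  # and returns the context.
  def call(context)
    match = BODY.match(context.content)
    context.content = match[1].strip if match
    context
  end
end

source = "\\documentclass{article}\n" \
         "\\begin{document}\n" \
         "Hello $x^2$.\n" \
         "\\end{document}\n"

puts TexProc.new.call(FakeContext.new(source)).content   # prints "Hello $x^2$."
```

A real processor would go on to convert the extracted body to markdown or HTML; math such as `$x^2$` can be left as it is for MathJax.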

TL> * Then you need to tell the source handler 'page' to use the .tex
TL>   files together with your new content processor (I'll call it
TL>   `texproc`).

Great. This works as expected.

TL> * Drop in a .tex anywhere in your src/ directory and have it
TL>   processed via the source handler 'page' like any .page file.

That works very well. Only thing is, the generated page is still named
whatever.tex, how can I have it renamed whatever.html instead ?

TL> * Set the `template` meta information like with normal .page
TL>   files. Or, for that matter, any other meta information you need
TL>   (like `in_menu`).

This I could not figure out how to do. I know how to parse the file to
extract the title, but how do I tell webgen that this is it? Probably
something like context['title']="Whatever" in the content processor, but
I couldn't figure it out.

TL> * And, since webgen assumes that .tex files are in Webgen Page
TL>   Format, you can add a meta information block at the beginning with
TL>   needed meta information. Or just use a meta information file for
TL>   this purpose so that the .tex files don't need to be edited.

I would rather not touch the .tex file in that way, and have webgen
extract the relevant info automatically instead.

Thanks again !

Cheers,

        /vincent

-- 
|                 |   UMPA - ENS Lyon   | Mél: vbeffara at ens-lyon.fr |
| Vincent Beffara |  46 allée d'Italie  | Tél: (+33) 4 72 72 85 25     |
|                 | 69364 Lyon Cedex 07 | Fax: (+33) 4 72 72 84 80     |

Vincent Beffara

2011-Feb-16 20:56 UTC

[webgen-users] Pipeline for LaTeX files

Hi again,

TL> * Drop in a .tex anywhere in your src/ directory and have it
TL> processed via the source handler 'page' like any .page file.

VB> That works very well. Only thing is, the generated page is still
VB> named whatever.tex, how can I have it renamed whatever.html instead
VB> ?

OK, I got this one by tweaking the texfiles.metainfo file. Any way to
achieve the same thing (declaring that .tex files should be treated,
setting the output file name) by editing config.yaml instead of creating
such a metainfo file ? Maybe it's just me but I like having the whole
configuration in a single file ...

TL> * Set the `template` meta information like with normal .page
TL> files. Or, for that matter, any other meta information you need
TL> (like `in_menu`).

VB> This I could not figure out how to do. I know how to parse the file
VB> to extract the title, but how do I tell webgen that this is it?
VB> Probably something like context['title']="Whatever" in the content
VB> processor, but I couldn't figure it out.

On this I am still looking ... But I am already quite happy with the
results.

Thanks again !

       /v

-- 
|                 |   UMPA - ENS Lyon   | Mél: vbeffara at ens-lyon.fr |
| Vincent Beffara |  46 allée d'Italie  | Tél: (+33) 4 72 72 85 25     |
|                 | 69364 Lyon Cedex 07 | Fax: (+33) 4 72 72 84 80     |

Thomas Leitner

2011-Feb-17 08:34 UTC

[webgen-users] Pipeline for LaTeX files

On 2011-02-16 21:56 +0100 Vincent Beffara wrote:
> OK, I got this one by tweaking the texfiles.metainfo file. Any way to
> achieve the same thing (declaring that .tex files should be treated,
> setting the output file name) by editing config.yaml instead of
> creating such a metainfo file ? Maybe it's just me but I like having
> the whole configuration in a single file ...

I don't think so. Most of the "configuration" is done by setting some
meta information on the files. There are some configuration options for
setting default meta information but that's it. The `output_path_style`
meta information determines the output path and, unless it is changed
for all files handled by a source handler, it has to be changed via a
metainfo file.

> VB> This I could not figure out how to do. I know how to parse the
> VB> file to extract the title, but how do I tell webgen that this is
> VB> it? Probably something like context['title']="Whatever" in the
> VB> content processor, but I couldn't figure it out.

Hmmm... that won't work: The meta information for a file is set when
creating the internal node representation for the file and should not
be changed afterwards (for consistency). Since the content processor is
run when writing the file to the destination, it is too late to extract
meta information from the file because other files may have already
been written out and may already have used, for example, the 'title'
meta information.

You can work around this problem by writing your own source handler.
Just copy the implementation of the page handler and, in #create_node,
parse your .tex file, extract and set the needed meta information and
then create the node (i.e. the internal representation). This would also
allow you to work around the problem with setting the output path since
you could use the default meta information to set it.
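
The parsing step itself does not depend on webgen. As a rough sketch, a helper like the following could collect the values that #create_node would then store as meta information; the name `extract_tex_meta` and the choice of the `\title` and `\date` commands are assumptions made for this example:

```ruby
# Hypothetical helper (not part of webgen): collects the arguments of
# \title{...} and \date{...} from LaTeX source into a hash.
# Limitation: nested braces inside the arguments are not handled.
def extract_tex_meta(source)
  source.scan(/\\(title|date)\{([^}]*)\}/)
        .each_with_object({}) { |(key, value), meta| meta[key] = value.strip }
end

tex = "\\title{An Easy Proof}\n" \
      "\\date{2011-02-24}\n" \
      "\\begin{document}\n"

meta = extract_tex_meta(tex)
puts meta['title']   # prints "An Easy Proof"
puts meta['date']    # prints "2011-02-24"
```

The point of doing this in the source handler rather than in the content processor is that the hash is available before the node is created.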

-- Thomas

Vincent Beffara

2011-Feb-17 12:41 UTC

[webgen-users] Pipeline for LaTeX files

>> Maybe it's just me but I like having the whole configuration in a
>> single file ...

TL> I don't think so. Most of the "configuration" is done by setting
TL> some meta information on the files. There are some configuration
TL> options for setting default meta information but that's it. The
TL> `output_path_style` meta information determines the output path and
TL> it, if not changed for all files handled by a source handler, is
TL> changed via a metainfo file.

Fair enough, that makes sense. As long as usage is convenient - which it
is, most definitely.

TL> Since the content processor is run when writing the file to the
TL> destination, it is too late to extract meta information from the
TL> file because other files may have already been written out and may
TL> already have used, for example, the 'title' meta information.

Ah, good point, I hadn't thought of that. Even if I had managed to do
what I was looking for, the result would have been wrong anyway.

TL> You can work around this problem by writing your own source handler.

Thanks for the advice, I will try to do that. Somehow it feels a bit
more involved than a content processor, but also more like the right way
to do it.

Cheers,

        /v

-- 
|                 |   UMPA - ENS Lyon   | Mél: vbeffara at ens-lyon.fr |
| Vincent Beffara |  46 allée d'Italie  | Tél: (+33) 4 72 72 85 25     |
|                 | 69364 Lyon Cedex 07 | Fax: (+33) 4 72 72 84 80     |

Vincent Beffara

2011-Feb-24 11:19 UTC

[webgen-users] Pipeline for LaTeX files

Hello,

    TL> You can work around this problem by writing your own source
    TL> handler.

    VB> Thanks for the advice, I will try to do that. Somehow it feels a
    VB> bit more involved than a content processor, but also more like
    VB> the right way to do it.

I finally managed to do just that, and it works beautifully. Thanks
again ! (Finally I ended up generating html directly rather than
markdown, because escaping the _ and * from latex files was a bit
hit-and-miss ...) It looks like this,

http://www.umpa.ens-lyon.fr/~vbeffara/articles/Cardy_Easy.html

and actually mathjax is doing most of the work for me. I did have to
tweak the .tex file a little bit to make parsing it easier, but I can
live with that.

I have a ruby question though, to see if I got everything right. Page
derives from Base, and I wanted to derive LaTeX from Page because it is
mostly a tweak on it, just to extract one piece of metadata. But
extracting it before calling super is overridden, and extracting it
afterwards is too late ... so I ended up doing this (found by googling,
I don't really get what happens here):

| module Webgen::SourceHandler
|   class LaTeX < Page
|     def create_node(path)
|       page = page_from_path(path)
|       path.meta_info['lang'] ||= website.config['website.lang']
|       path.ext = 'html' if path.ext == 'tex'
|
|       # Use the argument of the last \title{...} line as the page title.
|       page.blocks[1].content.each_line do |l|
|         path.meta_info['title'] = $1 if l =~ /\\title\{(.*)\}/
|       end
|
|       # Replace LaTeX control words (e.g. \emph) in the title by a space.
|       path.meta_info['title'].gsub!(/\\[A-Za-z]* */, ' ')
|
|       # Call Base#create_node directly, bypassing Page#create_node.
|       Base.instance_method(:create_node).bind(self).call(path) do |node|
|         node.node_info[:sh_page_node_mi] =
|           Webgen::Page.meta_info_from_data(path.io.data)
|         node.node_info[:page] = page
|       end
|     end
|   end
| end

Is there a better way to do the last bit ?

Thanks again for everything !

       /v

-- 
|                 |   UMPA - ENS Lyon   | Mél: vbeffara at ens-lyon.fr |
| Vincent Beffara |  46 allée d'Italie  | Tél: (+33) 4 72 72 85 25     |
|                 | 69364 Lyon Cedex 07 | Fax: (+33) 4 72 72 84 80     |

Thomas Leitner

2011-Feb-24 14:47 UTC

[webgen-users] Pipeline for LaTeX files

On 2011-02-24 12:19 +0100 Vincent Beffara wrote:
> I have a ruby question though, to see if I got everything right. Page
> derives from Base, and I wanted to derive LaTeX from Page because it
> is mostly a tweak on it, just to extract one piece of metadata. But
> extracting it before calling super is overridden, and extracting it
> afterwards is too late ... so I ended up doing this (found by
> googling, I don't really get what happens here):
> 
> | module Webgen::SourceHandler
> |   class LaTeX < Page
> |     def create_node(path)
> |       page = page_from_path(path)
> |       path.meta_info['lang'] ||= website.config['website.lang']
> |       path.ext = 'html' if path.ext == 'tex'
> |
> |       # Use the argument of the last \title{...} line as the page title.
> |       page.blocks[1].content.each_line do |l|
> |         path.meta_info['title'] = $1 if l =~ /\\title\{(.*)\}/
> |       end
> |
> |       # Replace LaTeX control words (e.g. \emph) in the title by a space.
> |       path.meta_info['title'].gsub!(/\\[A-Za-z]* */, ' ')
> |
> |       # Call Base#create_node directly, bypassing Page#create_node.
> |       Base.instance_method(:create_node).bind(self).call(path) do |node|
> |         node.node_info[:sh_page_node_mi] =
> |           Webgen::Page.meta_info_from_data(path.io.data)
> |         node.node_info[:page] = page
> |       end
> |     end
> |   end
> | end
> 
> Is there a better way to do the last bit ?

I don't think that there is a better way to do this. I will add a TODO
item to restructure the code to make such tweaks easier!

What `Base.instance_method...` does is that it retrieves the
`create_node` method from `Webgen::SourceHandler::Base`, i.e. the actual
implementation. This returns an UnboundMethod object. Such an object
can be bound to an instance (done with `bind(self)`), which gives a
Method object that can then be called. This allows you to directly
invoke the `Webgen::SourceHandler::Base#create_node` method, which would
not be possible with `super(...)` because that would call
`Page#create_node` instead.
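
The mechanism can be tried out in isolation. The following sketch uses three toy classes that only mimic the inheritance chain discussed here; they are not webgen's classes, and their method bodies are made up for the illustration:

```ruby
# Toy classes (not webgen's) mimicking the chain Base <- Page <- LaTeX.
class Base
  def create_node(path)
    "Base handled #{path}"
  end
end

class Page < Base
  def create_node(path)
    "Page wrapped (#{super})"
  end
end

class LaTeX < Page
  def create_node(path)
    # `super` would run Page#create_node here; fetch Base's implementation
    # as an UnboundMethod, bind it to this object and call it instead.
    Base.instance_method(:create_node).bind(self).call(path)
  end
end

puts LaTeX.new.create_node('a.tex')    # prints "Base handled a.tex"
puts Page.new.create_node('a.page')    # prints "Page wrapped (Base handled a.page)"
```

`bind` only accepts objects that are instances of the class (or include the module) the method was taken from, which is the case here because LaTeX descends from Base.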

-- Thomas
