Hi,

I have a bunch of LaTeX files which I would like to include into my web site as individual pages. Thanks to MathJax, this can be done almost automatically, or at least just by essentially copy-and-pasting the main {document} environment into the body of a .html file. So far, so good.

I would like to teach webgen to do all that automatically: if I drop a .tex file into the src/ directory, it should parse it for metadata like title, date ..., transform it into the proper .html file, with the same template as for the .page files, include it in the menu and so on. Trying to figure out how to do that, I got lost in the documentation.

Where should I start? Any pointers?

Thanks for your help,

    /vincent

-- 
| | UMPA - ENS Lyon | Mél: vbeffara at ens-lyon.fr |
| Vincent Beffara | 46 allée d'Italie | Tél: (+33) 4 72 72 85 25 |
| | 69364 Lyon Cedex 07 | Fax: (+33) 4 72 72 84 80 |
On 2011-02-15 14:38 +0100 Vincent Beffara wrote:
> I would like to teach webgen to do all that automatically: if I drop a
> .tex file into the src/ directory, it should parse it for metadata
> like title, date ..., transform it into the proper .html file, with
> the same template as for the .page files, include it in the menu and
> so on. Trying to figure out how to do that, I got lost in the
> documentation.
>
> Where should I start ? Any pointers ?

* First you need to provide a content processor for your tex files that parses the .tex files and converts them to HTML. webgen has no built-in content processor to do that - you will need to code it yourself. Have a look at the API documentation for [Webgen::ContentProcessor][1] which provides general information and an example.

* Then you need to tell the source handler 'page' to use the .tex files together with your new content processor (I'll call it `texproc`). This can be done by adding the following to your config.yaml:

      patterns:
        Page:
          add: [**/*.tex]

  After that you need to create a src/texfiles.metainfo file with the following content:

      --- name:paths
      /**/*.tex:
        blocks:
          default:
            pipeline: texproc

This allows you to:

* Drop in a .tex anywhere in your src/ directory and have it processed via the source handler 'page' like any .page file.

* Set the `template` meta information like with normal .page files. Or, for that matter, any other meta information you need (like `in_menu`).

* And, since webgen assumes that .tex files are in Webgen Page Format, you can add a meta information block at the beginning with the needed meta information. Or just use a meta information file for this purpose so that the .tex files don't need to be edited.

The default processing pipeline used for .tex files is set to your newly created tex processor via the meta information file.

BUT be aware that, since we are using the source handler 'page' for handling your .tex files, it assumes that .tex files are in the Webgen Page Format (like .page files). Therefore you would need to escape three dashes at the beginning of a line. However, I don't think that this happens in .tex files anyway...

Let me know if you need more help/information/pointers!

Best regards,
  Thomas

[1]: http://webgen.rubyforge.org/documentation/rdoc/Webgen/ContentProcessor.html
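A minimal sketch of the first step, for illustration only. The class name `TexProc`, its helper methods and the conversion rules (blank-line separated chunks become paragraphs, math is left untouched for MathJax) are assumptions, not part of webgen. The only webgen convention relied on is the one described in the Webgen::ContentProcessor documentation: a content processor responds to `call(context)`, rewrites `context.content` and returns the context. A `Struct` stands in for the webgen context so that the snippet runs on its own; registering the class under the short name `texproc` is done as described in the API documentation and is not shown here.

```ruby
require 'cgi'

# Hypothetical content processor for .tex sources (name and rules are assumptions).
class TexProc
  DOCUMENT_BODY = /\\begin\{document\}(.*?)\\end\{document\}/m

  # Return the text between \begin{document} and \end{document},
  # or the whole input if no document environment is found.
  def self.document_body(tex)
    match = tex.match(DOCUMENT_BODY)
    (match ? match[1] : tex).strip
  end

  # Naive conversion: blank-line separated chunks become <p> elements.
  # HTML special characters are escaped; math markup is left for MathJax.
  def self.to_html(tex)
    document_body(tex)
      .split(/\n\s*\n/)
      .map(&:strip)
      .reject(&:empty?)
      .map { |para| "<p>#{CGI.escapeHTML(para)}</p>" }
      .join("\n")
  end

  # Entry point in the shape webgen expects from a content processor.
  def call(context)
    context.content = self.class.to_html(context.content)
    context
  end
end

# Stand-in for the webgen context object, used only for this demo.
FakeContext = Struct.new(:content)

sample = <<~'TEX'
  \documentclass{article}
  \begin{document}
  First $a<b$.

  Second.
  \end{document}
TEX

puts TexProc.new.call(FakeContext.new(sample)).content
```

A real converter needs many more rules (sections, lists, environments); the point here is only the shape of the processor.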
Hi Thomas, and thanks for the advice! It already gives quite acceptable results after just a few tries. I'm glad I chose webgen for my website.

TL> * First you need to provide a content processor for your tex files that parses the .tex files and converts them to HTML. webgen has no built-in content processor to do that - you will need to code it yourself.

Mkay, that step will be the longest because of all the tweaking necessary, but at least I can manage. Transforming LaTeX into markdown is not too difficult to a first approximation, and then kramdown will do the rest.

TL> * Then you need to tell the source handler 'page' to use the .tex files together with your new content processor (I'll call it `texproc`).

Great. This works as expected.

TL> * Drop in a .tex anywhere in your src/ directory and have it processed via the source handler 'page' like any .page file.

That works very well. The only thing is that the generated page is still named whatever.tex; how can I have it renamed to whatever.html instead?

TL> * Set the `template` meta information like with normal .page files. Or, for that matter, any other meta information you need (like `in_menu`).

This I could not figure out how to do. I know how to parse the file to extract the title, but how do I tell webgen that this is it? Probably something like context['title'] = "Whatever" in the content processor, but I couldn't figure it out.

TL> * And, since webgen assumes that .tex files are in Webgen Page Format, you can add a meta information block at the beginning with needed meta information. Or just use a meta information file for this purpose so that the .tex files don't need to be edited.

I would rather not touch the .tex file in that way, and have webgen extract the relevant info automatically instead.

Thanks again!
Cheers,

    /vincent
Hi again,

TL> * Drop in a .tex anywhere in your src/ directory and have it processed via the source handler 'page' like any .page file.

VB> That works very well. Only thing is, the generated page is still named whatever.tex, how can I have it renamed whatever.html instead ?

OK, I got this one by tweaking the texfiles.metainfo file. Any way to achieve the same thing (declaring that .tex files should be treated, setting the output file name) by editing config.yaml instead of creating such a metainfo file? Maybe it's just me but I like having the whole configuration in a single file ...

TL> * Set the `template` meta information like with normal .page files. Or, for that matter, any other meta information you need (like `in_menu`).

VB> This I could not figure out how to do. I know how to parse the file to extract the title, but how do I tell webgen that this is it? Probably something like context['title'] = "Whatever" in the content processor, but I couldn't figure it out.

On this I am still looking ... But I am already quite happy with the results.

Thanks again!

    /v
On 2011-02-16 21:56 +0100 Vincent Beffara wrote:
> OK, I got this one by tweaking the texfiles.metainfo file. Any way to
> achieve the same thing (declaring that .tex files should be treated,
> setting the output file name) by editing config.yaml instead of
> creating such a metainfo file ? Maybe it's just me but I like having
> the whole configuration in a single file ...

I don't think so. Most of the "configuration" is done by setting some meta information on the files. There are some configuration options for setting default meta information but that's it. The `output_path_style` meta information determines the output path, and unless it is changed for all files handled by a source handler, it has to be changed via a metainfo file.

> VB> This I could not figure out how to do. I know how to parse the
> VB> file to extract the title, but how do I tell webgen that this is
> VB> it? Probably something like context['title']="Whatever" in the
> VB> content processor, but I couldn't figure it out.

Hmmm... that won't work: the meta information for a file is set when creating the internal node representation for the file and should not be changed afterwards (for consistency). Since the content processor is run when writing the file to the destination, it is too late to extract meta information from the file because other files may have already been written out and may already have used, for example, the 'title' meta information.

You can work around this problem by writing your own source handler. Just copy the implementation of the page handler and, in #create_node, parse your .tex file, extract and set the needed meta information and then create the node (i.e. the internal representation). This would also allow you to work around the problem with setting the output path since you could use the default meta information to set it.

-- Thomas
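The extraction step described above could look roughly like the following sketch. `latex_title` is a hypothetical helper, not part of webgen. It assumes that `\title{...}` sits on a single line and that dropping macro names and grouping braces is an acceptable cleanup. Inside a custom source handler, its result would be stored in the path's meta information before the node is created, so that every other page sees the final title.

```ruby
# Hypothetical helper: pull a plain-text title out of LaTeX source.
# Returns nil when no \title{...} line is present.
def latex_title(source)
  match = source.match(/\\title\{(.*)\}/)
  return nil unless match

  match[1]
    .gsub(/\\[A-Za-z]+\s*/, ' ') # replace macro names such as \emph by a space
    .delete('{}')                # drop grouping braces
    .squeeze(' ')                # collapse runs of spaces
    .strip
end

puts latex_title('\title{Percolation on \emph{planar} lattices}')
```

Returning nil instead of raising lets the caller fall back to whatever title webgen would otherwise derive.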
>> Maybe it's just me but I like having the whole configuration in a single file ...

TL> I don't think so. Most of the "configuration" is done by setting some meta information on the files. There are some configuration options for setting default meta information but that's it. The `output_path_style` meta information determines the output path and it, if not changed for all files handled by a source handler, is changed via a metainfo file.

Fair enough, that makes sense. As long as usage is convenient - which it is, most definitely.

TL> Since the content processor is run when writing the file to the destination, it is too late to extract meta information from the file because other files may have already been written out and may already have used, for example, the 'title' meta information.

Ah, good point, I hadn't thought of that. Even if I had managed to do what I was looking for, the result would have been wrong anyway.

TL> You can work around this problem by writing your own source handler.

Thanks for the advice, I will try to do that. Somehow it feels a bit more involved than a content processor, but also more like the right way to do it.

Cheers,

    /v
Hello,

TL> You can work around this problem by writing your own source handler.

VB> Thanks for the advice, I will try to do that. Somehow it feels a bit more involved than a content processor, but also more like the right way to do it.

I finally managed to do just that, and it works beautifully. Thanks again! (In the end I generated HTML directly rather than markdown, because escaping the _ and * from LaTeX files was a bit hit-and-miss ...) It looks like this,

    http://www.umpa.ens-lyon.fr/~vbeffara/articles/Cardy_Easy.html

and actually MathJax is doing most of the work for me. I did have to tweak the .tex file a little bit to make parsing it easier, but I can live with that.

I have a Ruby question though, to see if I got everything right. Page derives from Base, and I wanted to derive LaTeX from Page because it is mostly a tweak on it, just to extract one piece of metadata. But if I extract it before calling super it gets overridden, and extracting it afterwards is too late ... so I ended up doing this (found by googling, I don't really get what happens here):

| module Webgen::SourceHandler
|   class LaTeX < Page
|     def create_node(path)
|       page = page_from_path(path)
|       path.meta_info['lang'] ||= website.config['website.lang']
|       path.ext = 'html' if path.ext == 'tex'
|
|       # Take the title from the \title{...} line of the first block.
|       page.blocks[1].content.each_line do |l|
|         path.meta_info['title'] = $1 if l =~ /\\title\{(.*)\}/
|       end
|
|       # Replace LaTeX macro names in the title by spaces (if a title was found).
|       if path.meta_info['title']
|         path.meta_info['title'] = path.meta_info['title'].gsub(/\\[A-Za-z]* */, ' ')
|       end
|
|       # Call Base#create_node directly, bypassing Page#create_node.
|       Base.instance_method(:create_node).bind(self).call(path) do |node|
|         node.node_info[:sh_page_node_mi] =
|           Webgen::Page.meta_info_from_data(path.io.data)
|         node.node_info[:page] = page
|       end
|     end
|   end
| end

Is there a better way to do the last bit?

Thanks again for everything!

    /v
On 2011-02-24 12:19 +0100 Vincent Beffara wrote:
> I have a ruby question though, to see if I got everything right. Page
> derives from Base, and I wanted to derive LaTeX from Page because it
> is mostly a tweak on it, just to extract one piece of metadata. But
> extracting it before calling super is overridden, and extracting it
> afterwards is too late ... so I ended up doing this (found by
> googling, I don't really get what happens here):
>
> | module Webgen::SourceHandler
> |   class LaTeX < Page
> |     def create_node(path)
> |       page = page_from_path(path)
> |       path.meta_info['lang'] ||= website.config['website.lang']
> |       path.ext = 'html' if path.ext == 'tex'
> |
> |       # Take the title from the \title{...} line of the first block.
> |       page.blocks[1].content.each_line do |l|
> |         path.meta_info['title'] = $1 if l =~ /\\title\{(.*)\}/
> |       end
> |
> |       # Replace LaTeX macro names in the title by spaces (if a title was found).
> |       if path.meta_info['title']
> |         path.meta_info['title'] = path.meta_info['title'].gsub(/\\[A-Za-z]* */, ' ')
> |       end
> |
> |       # Call Base#create_node directly, bypassing Page#create_node.
> |       Base.instance_method(:create_node).bind(self).call(path) do |node|
> |         node.node_info[:sh_page_node_mi] =
> |           Webgen::Page.meta_info_from_data(path.io.data)
> |         node.node_info[:page] = page
> |       end
> |     end
> |   end
> | end
>
> Is there a better way to do the last bit ?

I don't think that there is a better way to do this. I will add a TODO item to restructure the code to make such tweaks easier!

What `Base.instance_method...` does is retrieve the `create_node` method from Webgen::SourceHandler::Base, i.e. the actual implementation. This returns an UnboundMethod object. Such an object can be bound to an instance (done with `bind(self)`), which yields a Method object that can then be called. This allows you to directly invoke Base#create_node, bypassing Page#create_node, which would not be possible with `super(...)`.

-- Thomas
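The mechanism can be reproduced outside webgen. The sketch below uses made-up classes (`Demo::Base`, `Demo::Page`, `Demo::LaTeX`) that only mimic the inheritance chain discussed in this thread; nothing in it is webgen code. It shows that `super` always goes through the direct parent, while `instance_method(...).bind(self).call(...)` runs the ancestor's implementation and skips the parent.

```ruby
module Demo
  class Base
    # Builds a "node" and lets the caller customise it via a block.
    def create_node(path)
      node = { path: path, created_by: 'Base' }
      yield node if block_given?
      node
    end
  end

  class Page < Base
    # Adds its own setup on top of Base#create_node.
    def create_node(path)
      super(path) { |node| node[:page_setup] = true }
    end
  end

  class LaTeX < Page
    def create_node(path)
      # `super` here would run Page#create_node. Fetch Base's
      # implementation as an UnboundMethod, bind it to this
      # instance and call it with our own block instead.
      base_impl = Base.instance_method(:create_node)
      base_impl.bind(self).call(path) do |node|
        node[:latex_setup] = true
      end
    end
  end
end

p Demo::Page.new.create_node('b.page')
p Demo::LaTeX.new.create_node('a.tex')
```

Binding only succeeds because a `Demo::LaTeX` instance is also a `Demo::Base`; binding to an unrelated object raises a TypeError.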