Hi,

I have several hundred pages of text, all carefully marked up with Textile, which I use RedCloth to convert to HTML for display. Now I find I need all these pages to alternatively output plain text - i.e. A-Za-z0-9 and simple punctuation only. Any suggestions on the best way to do this? My only thought is to let RedCloth do its stuff, and then strip out the HTML tags - but it feels wrong.

Thanks,
-- Tony
On 5/27/07, Tony White <tony.mzungu at gmail.com> wrote:
> I have several hundred pages of text, all carefully marked up with
> textile, which I use redcloth to convert to html for display. Now I
> find I need all these pages to alternatively output plain text - ie.
> A-Za-z0-9 and simple punctuation only. Any suggestions on the best
> way to do this? My only thought is to let redcloth do its stuff, and
> then strip out the html tags - but it feels wrong.

That's how I do it in acts_as_textiled.

-- Chris Wanstrath
http://errfree.com // http://errtheblog.com
On 28/05/07, Chris Wanstrath <chris at ozmm.org> wrote:
> On 5/27/07, Tony White <tony.mzungu at gmail.com> wrote:
>
> > I have several hundred pages of text, all carefully marked up with
> > textile, which I use redcloth to convert to html for display. Now I
> > find I need all these pages to alternatively output plain text - ie.
> > A-Za-z0-9 and simple punctuation only. Any suggestions on the best
> > way to do this? My only thought is to let redcloth do its stuff, and
> > then strip out the html tags - but it feels wrong.
>
> That's how I do it in acts_as_textiled.

Thanks Chris - I'll have a look at acts_as_textiled tomorrow - it's 3am here, and the power's gone off (for the 3rd time today :-( ) and my ups is about to die!

-- Tony
Chris Wanstrath wrote:
> That's how I do it in acts_as_textiled.

My initial thoughts were that this is a shame, because (at least with a dumb "strip out the tags" conversion) you'd lose the parts of the Textile formatting that work equally well in plain text, such as bullet point lists.

But then I remembered that Textile can include HTML tags anyway, so converting from Textile to HTML, then from HTML to text, especially with an intelligent converter, is probably the most sane approach. There's lots of prior art for the latter, especially given that you could (probably) assume the HTML output from the Textile engine to be clean. If I were to be adding PDF export to a Textile-based application I'd probably go via HTML for the same reasons.

-- 
TTFN, Andrew Hodgkinson
Find some electronic music at: http://pond.org.uk/music.html
Photos, wallpaper, software and more: http://pond.org.uk/
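
For what it's worth, here is a rough sketch of the "Textile to HTML, then HTML to text" route with a slightly smarter second step that keeps bullet lists. It is not how acts_as_textiled does it (I have not checked); `html_to_plain_text` and the two lookup tables are names made up for this example, and it assumes the HTML is as clean as a Textile engine normally produces. In a real application the input would come from `RedCloth.new(textile).to_html`; the sketch starts from an HTML string so it runs without the gem.

```ruby
# Typographic characters mapped to plain ASCII, since the goal is
# "A-Za-z0-9 and simple punctuation only". (Illustrative, not exhaustive.)
ASCII_EQUIVALENTS = {
  "\u2018" => "'", "\u2019" => "'",   # curly single quotes
  "\u201C" => '"', "\u201D" => '"',   # curly double quotes
  "\u2013" => "-", "\u2014" => "-"    # en and em dashes
}.freeze

# Named entities decoded last so "&amp;lt;" becomes "&lt;", not "<".
NAMED_ENTITIES = {
  "&lt;" => "<", "&gt;" => ">", "&quot;" => '"', "&amp;" => "&"
}.freeze

# Hypothetical helper: converts clean, Textile-generated HTML to plain text.
def html_to_plain_text(html)
  text = html.dup
  # Remove the newlines and indentation the generator puts between tags.
  text.gsub!(/>[ \t]*\n\s*</, "><")
  # Keep list structure: every <li> starts a "* " bullet on its own line.
  text.gsub!(/<li[^>]*>/i, "\n* ")
  # <br> becomes a line break; closing block tags end a paragraph.
  text.gsub!(%r{<br\s*/?>}i, "\n")
  text.gsub!(%r{</(p|h[1-6]|ul|ol|blockquote|pre|div)>}i, "\n\n")
  # Strip every remaining tag.
  text.gsub!(/<[^>]+>/, "")
  # Decode numeric entities (e.g. &#8217;), then the common named ones.
  text.gsub!(/&#(\d+);/) { [$1.to_i].pack("U") }
  NAMED_ENTITIES.each { |entity, char| text.gsub!(entity, char) }
  # Fold typographic punctuation down to ASCII.
  text.gsub!(Regexp.union(ASCII_EQUIVALENTS.keys), ASCII_EQUIVALENTS)
  # Tidy up trailing spaces and runs of blank lines.
  text.gsub!(/[ \t]+\n/, "\n")
  text.gsub!(/\n{3,}/, "\n\n")
  text.strip
end

# With the gem installed: html = RedCloth.new(textile).to_html
html = "<p>Shopping <strong>list</strong>:</p>\n" \
       "<ul>\n\t<li>Eggs</li>\n\t<li>Milk &amp; honey</li>\n</ul>"
puts html_to_plain_text(html)
```

Regex-based stripping like this is only reasonable because the input is machine-generated; if authors embed arbitrary HTML in their Textile, a real HTML parser would be the safer choice.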