Am I doing something wrong, or have I hit a bug in the JRuby version of RedCloth?

RedCloth version 4.2.2

Program:

    require 'rubygems'
    require 'redcloth'

    str = 'blåbærgrød'
    puts 'String : ' + str
    puts 'HTML : ' + RedCloth.new(str).to_html()

In "ruby 1.8.6 (2008-08-11 patchlevel 287) [i386-mswin32]" the output is

    String : blåbærgrød
    HTML : <p>blåbærgrød</p>

In "jruby 1.4.0 (ruby 1.8.7 patchlevel 174) (2009-11-02 69fbfa3) (Java HotSpot(TM) Client VM 1.6.0_17) [x86-java]" the output is

    String : blåbærgrød
    HTML : <p>bl</p>

As you can see, when running in JRuby, parsing of the string seems to stop at the first Danish national character, "å".

Claus
-- 
Posted via http://www.ruby-forum.com/.
Claus Folke Brobak wrote:
> Am I doing something wrong or have I hit a bug in the JRuby version of
> RedCloth?
>
> RedCloth version 4.2.2

I should add that I am on Windows XP.

Claus
Hi Claus,

I hit the same issue (Linux, RedCloth 4.2.3).

My workaround was to HTML-escape non-standard characters before passing them to RedCloth. It works, but my solution is a little fragile. Did you find a proper solution?

Best, Georg
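A minimal sketch of the kind of workaround Georg describes, assuming "non-standard characters" means anything outside ASCII and that numeric character references are acceptable in the output. Georg did not post his code, so the helper name `escape_non_ascii` is hypothetical, and the sketch assumes Ruby 1.9 or later with UTF-8 strings:

```ruby
# Hypothetical helper, not part of RedCloth: replace every non-ASCII
# character with a numeric character reference such as &#229;.
def escape_non_ascii(text)
  text.gsub(/[^\x00-\x7F]/) { |char| format('&#%d;', char.ord) }
end

# The Danish sample word from the first post, written with \u escapes.
escaped = escape_non_ascii("bl\u00e5b\u00e6rgr\u00f8d")
puts escaped # prints bl&#229;b&#230;rgr&#248;d
```

The escaped string could then be handed to `RedCloth.new(escaped).to_html`, as in Claus's program, so that the parser only ever sees ASCII input.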
Hi Georg,

I went the same way. Eventually I managed to more or less solve the problem: I generated a jar that handles UTF characters properly. Basically, what needs to be done is to use the char (16-bit) datatype in the Ragel code instead of byte (8-bit). You can take a look at my work here:
http://github.com/kowalski/redcloth

The problem is that when you run rspec on the new code, a number of tests fail. The difference is extra whitespace added to the resulting HTML. For me that does no harm, so I have had this jar running in production for a few months now.

Cheers,
Marek Kowalski

2010/10/18 Georg M. <lists at ruby-forum.com>:
> Hi Claus,
>
> I hit the same issue (Linux, Redcloth 4.2.3).
>
> My workaround was to html-escape non-standard characters before passing
> them to redcloth. It works but my solution is a little bit fragile. Did
> you find a proper solution?
>
> Best, Georg
>
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards
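To illustrate the byte versus char distinction Marek mentions, here is a small sketch in plain Ruby (1.9 or later assumed). It does not reproduce the RedCloth scanner and does not establish why parsing stopped; it only shows how the sample word looks as 8-bit UTF-8 bytes, as the same bytes read as signed values (Java's `byte` type is signed), and as 16-bit UTF-16 code units (what a Java `char` holds). The variable names are made up for the example.

```ruby
word = "bl\u00e5b\u00e6rgr\u00f8d" # the Danish sample word from the first post

utf8_bytes   = word.unpack('C*')                     # unsigned 8-bit values
signed_bytes = word.unpack('c*')                     # same bytes read as signed, like a Java byte
utf16_units  = word.encode('UTF-16BE').unpack('n*')  # 16-bit code units, like Java chars

puts "characters:   #{word.length}"
puts "UTF-8 bytes:  #{utf8_bytes.length}"
puts "UTF-16 units: #{utf16_units.length}"

# The third character occupies two bytes in UTF-8 but one 16-bit unit.
puts "third character as UTF-8 bytes: #{utf8_bytes[2, 2].inspect}"
puts "same bytes read as signed:      #{signed_bytes[2, 2].inspect}"
puts "third character as UTF-16 unit: #{utf16_units[2].inspect}"
```

A scanner that walks the input one 8-bit value at a time sees 13 symbols for this 10-character word, while one that walks 16-bit units sees one symbol per character here.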
Great work, Marek! I pulled your work into a branch: jruby-mbc

The problem, as you pointed out, is extra whitespace. I'm hoping you or someone else can help me get it figured out so I can release it!

It seems to happen only when there's HTML in the input. (At least that's all I've found so far.) When it's a standalone HTML tag (just a block tag on a line), it puts two BRs after it. When it's an HTML block (start tag, contents, end tag), it puts the BR inside the beginning of the next block. When just one newline ends the document, it puts a BR inside the end of the last block; two newlines before EOF behave fine, though.

This input:

> <div>html_block</div>
>
> This is a paragraph with
> a line break
>
> standalone_html coming up.
>
> <div>
>
> Another p
>
> <div>test</div>
>
> Another p.

Results in:

> <div>html_block</div>
> <p><br />
> This is a paragraph with<br />
> a line break</p>
> <p>standalone_html coming up.</p>
> <div><br />
> <br />
>
> <p>Another p</p>
> <div>test</div>
> <p><br />
> Another p.<br />
> </p>

Weird, huh? I'd greatly appreciate help from anyone who can assist this Java dunce (me). Here's the fast way to get it checked out and set up:

> git clone git@github.com:jgarber/redcloth.git
> cd redcloth
> git checkout jruby-mbc
> rvm use jruby-1.5.3@redcloth  # assuming you're using rvm and you've done 'rvm install jruby'
> bundle
> rake compile

Thanks!
Jason

On Oct 18, 2010, at 6:00 AM, Marek Kowalski wrote:
> Problem is, that when you run rspec now on the new code, there is a
> number of tests that fails. The difference is the extra whitespaces
> added to the resulting html code.
Hey,

I'm glad I could contribute a little :) I spent a lot of time trying to figure out the reason for the extra whitespace and failed. I came to the conclusion that the newline handling in the Ragel code had to be updated so that it matches the 16-bit characters. I know nothing about Ragel, so I gave up. Nowadays I'm doing Python, so I cannot help any further. Still, I guess the whitespace problem should be far easier to spot and fix. Last but not least, it could simply be ignored. That is what I did: I just deployed the code as it is to the webapp I was working on.

Regards and good luck!
Marek Kowalski

2010/11/12 Jason Garber <jg at jasongarber.com>:
> Great work, Marek! I pulled your work into a branch: jruby-mbc
> The problem, as you pointed out, is extra whitespace. I'm hoping you or
> someone else can help me get it figured out so I can release it!
I've posted this task to oDesk with a $100 budget. Let's hope someone takes the job!
http://www.odesk.com/jobs/JRuby-fix-for-RedCloth_%7E%7E273ff7a15a938782

On Fri, Nov 12, 2010 at 3:57 AM, Jason Garber <jg at jasongarber.com> wrote:
> Great work, Marek! I pulled your work into a branch: jruby-mbc
> <https://github.com/jgarber/redcloth/tree/jruby-mbc>
>
> The problem, as you pointed out, is extra whitespace. I'm hoping you or
> someone else can help me get it figured out so I can release it!
Mhm, thanks for keeping me informed! I don't agree with the task description, though. As far as I remember, the problem was in the Ragel code, not in the html_esc function. It should be easy to figure out for someone with Ragel experience. You might want to update the description to attract people with the right profile. Well, we will see; I hope someone figures it out.

Cheers!
MK

2010/11/27 Jason Garber <jg at jasongarber.com>:
> I've posted this task to oDesk with a $100 budget. Let's hope someone takes
> the job!
> http://www.odesk.com/jobs/JRuby-fix-for-RedCloth_%7E%7E273ff7a15a938782