Robbie Allen
2009-Jan-07 19:44 UTC
[Mongrel] HTTP parse error due to an extra percent sign
If you append an extra percent sign to a URL that gets passed to mongrel, it will return a Bad Request error. Kind of odd that "http://localhost/%" causes a "Bad Request" instead of a "Not Found" error.

Here is the error from the mongrel log:

  HTTP parse error, malformed request (127.0.0.1):
  #<Mongrel::HttpParserError: Invalid HTTP format, parsing fails.>

I'm using Nginx in front of mongrel. I understand this is a bad URL, but is there any way to have mongrel ignore lone percent signs? Or perhaps an Nginx rewrite rule that will encode extraneous percent signs?
Stephan Wehner
2009-Jan-07 20:00 UTC
[Mongrel] HTTP parse error due to an extra percent sign
On Wed, Jan 7, 2009 at 11:44 AM, Robbie Allen <lists at ruby-forum.com> wrote:

> If you append an extra percent sign to a URL that gets passed to mongrel,
> it will return a Bad Request error. Kind of odd that "http://localhost/%"
> causes a "Bad Request" instead of a "Not Found" error.
>
> I'm using Nginx in front of mongrel. I understand this is a bad URL, but
> is there any way to have mongrel ignore lone percent signs? Or perhaps an
> Nginx rewrite rule that will encode extraneous percent signs?

Out of curiosity, why does mongrel's handling of this case bother you? It looks like entirely standard behaviour; see:

  http://groklaw.net/%
  http://slashdot.org/%
  http://w3c.org/%

(All produce status 400.)

Stephan

--
Stephan Wehner
-> http://stephan.sugarmotor.org
-> http://www.thrackle.org
-> http://www.buckmaster.ca
-> http://www.trafficlife.com
-> http://stephansmap.org
-- blog.stephansmap.org
Jonathan Rochkind
2009-Jan-07 20:06 UTC
[Mongrel] HTTP parse error due to an extra percent sign
Yes, I have run into this before. Mongrel will error on an invalid HTTP URI, one common case being characters that are not properly escaped, which is what your example is. When one of the developers of my app brought this up before, he was told by the Mongrel developer that this was intentional and would not be changed.

I didn't like this then, and I don't like it now, for a variety of reasons, including that my app needs to respond to URLs sent by third parties that are not under my control. Perhaps the current mongrel developers (IS there even any active development on mongrel?) have a different opinion, and this could be changed, or made configurable.

In the meantime, I have gotten around it with some mod_rewrite rules in Apache on top of mongrel, to take illegal URLs and escape/rewrite them into legal ones. Except that, due to some weird behavior (bugs?) in Apache and mod_rewrite around escaping, and the difficulty of controlling escaping in the Apache conf, I actually had to use an external Perl file too. Here's what I do.

Apache conf, applying to mongrel URLs (which in my setup are all URLs on a given Apache virtual host):

  RewriteEngine on
  RewriteMap query_escape prg:/data/web/findit/Umlaut/distribution/script/rewrite_map.pl
  #RewriteLock /var/lock/subsys/apache.rewrite.lock
  RewriteCond %{query_string} ^(.*[\>\<].*)$
  RewriteRule ^(.*)$ $1?${query_escape:%1} [R,L,NE]

The rewrite_map.pl file:

  #!/usr/bin/perl
  $| = 1;  # Turn off buffering
  # Rewrite problem characters to their percent-encoded forms
  while (<STDIN>) {
      s/>/%3E/g;
      s/</%3C/g;
      s/\//%2F/g;
      s/\\/%5C/g;
      s/ /\+/g;
      print $_;
  }

Note that I'm not actually escaping bare '%' chars, since I hadn't run into those before in the URLs I need to handle. It would be a bit trickier to add a regexp for that, since you need to distinguish an improper % from a % that's actually the start of a valid percent-encoded escape. Maybe something like:

  s/%(?![0-9A-Fa-f]{2})/%25/g;  # escape any % that is not followed by two hex digits

'/%25' would be a valid URI path representing the % char; '/%' is not.

Hope this helps,

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Robbie Allen
2009-Jan-07 20:28 UTC
[Mongrel] HTTP parse error due to an extra percent sign
So how do you catch it? All of those errors are not very friendly and completely bypass the site look and feel. See these:

  http://www.google.com/%
  http://www.yahoo.com/%

Robbie

Stephan Wehner wrote:
> Out of curiosity, why does mongrel's handling of this case bother you? It
> looks like entirely standard behaviour; see:
>
>   http://groklaw.net/%
>   http://slashdot.org/%
>   http://w3c.org/%
>
> (All produce status 400.)
Stephan Wehner
2009-Jan-07 21:09 UTC
[Mongrel] HTTP parse error due to an extra percent sign
On Wed, Jan 7, 2009 at 12:06 PM, Jonathan Rochkind <rochkind at jhu.edu> wrote:

> In the meantime, I have gotten around it with some mod_rewrite rules in
> Apache on top of mongrel, to take illegal URLs and escape/rewrite them
> into legal ones. [...] Here's what I do.
>
> The rewrite_map.pl file:
>
>   #!/usr/bin/perl
>   $| = 1;  # Turn off buffering
>   # Rewrite problem characters to their percent-encoded forms
>   while (<STDIN>) {
>       s/>/%3E/g;
>       s/</%3C/g;
>       s/\//%2F/g;
>       s/\\/%5C/g;
>       s/ /\+/g;
>       print $_;
>   }

It strikes me as a good thing that Apache weeds out bad URLs. Less parsing for mongrel, less work, and one less point of failure to worry about. (When I see code like the above right after "Turn off buffering" - with all respect - I get worried.)

On the other hand, does Apache not allow configuring the page returned for 400 Bad Request? That would then also address the issue that "All of those errors are not very friendly and completely bypass the site look and feel." (Robbie)

Stephan

--
Stephan Wehner
-> http://stephan.sugarmotor.org
-> http://www.thrackle.org
-> http://www.buckmaster.ca
-> http://www.trafficlife.com
-> http://stephansmap.org
-- blog.stephansmap.org
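Apache does allow that, at least in principle. A minimal sketch of the Apache side, assuming Apache 2.x with mod_proxy in front of mongrel; the /errors/400.html path is just a placeholder, and ProxyErrorOverride only matters if the 400 is coming back from the mongrel backend rather than from Apache itself:

  # Serve a branded page instead of the built-in one for 400 responses
  ErrorDocument 400 /errors/400.html

  # Let Apache substitute its own ErrorDocument for error responses
  # returned by the proxied backend (mongrel, in this setup)
  ProxyErrorOverride On

For requests so malformed that Apache cannot parse them at all, it may still fall back to its built-in error page.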
Robbie Allen
2009-Jan-07 21:25 UTC
[Mongrel] HTTP parse error due to an extra percent sign
> On the other hand, does Apache not allow configuring the page returned
> for 400 Bad Request? That would then also address the issue that...

I'm using Nginx, and it does allow you to set error_page, but that doesn't seem to work in this case (at least for me).

Has anyone gotten Nginx to use an actual error page instead of the default when it encounters a Bad Request?

Robbie
Jonathan Rochkind
2009-Jan-07 21:31 UTC
[Mongrel] HTTP parse error due to an extra percent sign
This particular case actually doesn't bother me much. It may be fine for "/%" to be a 400 rather than a 404.

My particular case involved needing to process malformed query strings sent by third parties. I had no control over those third parties, and I _needed_ to be able to process query strings that included unescaped ampersands and such. Yes, the third party sending me this information in a query string was doing it in a way that was illegal and violated standards, but they are more powerful than I am, I cannot make them change their behavior, and I need to handle those URLs anyway.

I took care of it with a rewrite on the Apache end before it reached mongrel, though. That ended up being somewhat more complicated than I had hoped because of Apache's weird and unpredictable behavior when it comes to escaping, but now that I have it working, it works out.

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
On 1/7/09, Jonathan Rochkind <rochkind at jhu.edu> wrote:

> Yes, I have run into this before. Mongrel will error on an invalid HTTP
> URI, one common case being characters that are not properly escaped,
> which is what your example is. When one of the developers of my app
> brought this up before, he was told by the Mongrel developer that this
> was intentional and would not be changed.

Mongrel's HTTP parser grammar was written by Zed to be very RFC conformant.

> I didn't like this then, and I don't like it now, for a variety of
> reasons, including that my app needs to respond to URLs sent by third
> parties that are not under my control. Perhaps the current mongrel
> developers (IS there even any active development on mongrel?) have a
> different opinion, and this could be changed, or made configurable.

The mongrel HTTP parser is very stable, and it is in use by multiple projects. I can't speak for the other mongrel devs, but if it were up to me alone, I'd keep mongrel's HTTP parser RFC compliant.

Since you have a special case, I would suggest that you take a look at the grammar for the parser and consider compiling your own parser. You could probably just remove '%' from the unsafe type and see if that works for you. It looks like this:

  unsafe = (CTL | " " | "\"" | "#" | "%" | "<" | ">");

Kirk Haines
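To make that suggestion concrete: the change would presumably be as small as dropping "%" from that production in mongrel's Ragel grammar and regenerating the parser. An untested sketch:

  # relaxed variant: stop treating a bare '%' as unsafe
  unsafe = (CTL | " " | "\"" | "#" | "<" | ">");

The trade-off is that unescaped percent signs would then be passed through to your application as-is, so the app has to decide for itself how to interpret something like "/%", and the parser is no longer strictly RFC conformant.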
Jonathan Rochkind
2009-Jan-07 21:38 UTC
[Mongrel] HTTP parse error due to an extra percent sign
Stephan Wehner wrote:
> It strikes me as a good thing that Apache weeds out bad URLs. Less
> parsing for mongrel, less work, and one less point of failure to worry
> about. (When I see code like the above right after "Turn off buffering"
> - with all respect - I get worried.)

Um, that code that worries you is the code that was necessary to get Apache to 'fix' these bad URLs to be good URLs. If you have a better way to do it, let me know and I'm happy to use it!

That actually took me several solid days of work, because Apache is _weird_ when it comes to escaping and mod_rewrite. Without using the external Perl rewrite map, I could only get it to end up double-escaped or not properly escaped at all; I could NOT get mod_rewrite alone, without Perl, to rewrite > to %3E and so on by itself. I kept ending up with things like %253E instead, because Apache would go ahead and apply another round of escaping when I didn't want it to. I could get Apache to do no escaping, or double escaping, but couldn't get it to do the kind of escaping I needed, until I figured out I had to resort to an external Perl rewrite map. Which, yes, resulted in code that I don't like that much either, but it was all I could come up with to solve my unavoidable business problem.

So you like solving it in Apache rather than Mongrel, but don't like the best way I came up with to solve it in Apache after nearly a week of hacking? Heh, I'm not sure what you're suggesting. Now that I've got it done, it works, but it was kind of a frustrating four days of hacking mod_rewrite and Apache conf when that's not what I wanted to be doing.

Oddly, Googling turned up hardly anyone who had had to deal with this problem before. I guess the circumstance of having to deal with long, complicated, possibly ill-formed query strings sent by third parties is rare, and dealing with it at the Apache layer is not the choice anyone else made when they did have to deal with it.

(In general, doing complicated things in Apache conf reminds me of trying to do complicated things in sendmail. It gets unpredictable and turns into 'twist this knob and see what happens' pretty quickly. I'd much rather be writing Ruby than hacking Apache confs.)

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Cody Caughlan
2009-Jan-07 22:09 UTC
[Mongrel] HTTP parse error due to an extra percent sign
> Has anyone gotten Nginx to use an actual error page instead of the
> default when it encounters a Bad Request?

This is pretty easy to do: use the "error_page" directive, list all of the status codes you want to capture, and point them at your error page. For example, we capture all 5xx errors using these nginx config rules:

  error_page 500 502 503 504 /500.html;
  location = /500.html {
    root /u/apps/project/public;
  }

Just add 400 to the list of 5xx codes. You will also want to have a "500.html" in your document root, or modify the path to the file accordingly.

/Cody
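One caveat, and a possible reason error_page appeared not to work earlier in the thread: if the 400 is being generated by mongrel (the upstream) rather than by nginx itself, nginx passes the backend's error response through untouched unless proxy_intercept_errors is enabled. A sketch, reusing the paths from the example above:

  proxy_intercept_errors on;

  error_page 400 500 502 503 504 /500.html;
  location = /500.html {
    root /u/apps/project/public;
  }

Whether the 400 for a bare "%" is produced by nginx itself or by mongrel depends on which of them rejects the URI first; with both error_page and proxy_intercept_errors in place, either case should end up at the custom page.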
James Tucker
2009-Jan-08 00:39 UTC
[Mongrel] HTTP parse error due to an extra percent sign
On 7 Jan 2009, at 21:31, Jonathan Rochkind wrote:

> Yes, the third party sending me this information in a query string was
> doing it in a way that was illegal and violated standards, but they are
> more powerful than I am, I cannot make them change their behavior, and I
> need to handle those URLs anyway.

This is a sad day; they won.

Next time, scream from the rooftops, unless you already signed your free speech away, that is.

Printing an RFC and highlighting the relevant clauses and handing that to a level higher than disgruntled developers is often relatively effective, too.
Jonathan Rochkind
2009-Jan-08 17:51 UTC
[Mongrel] HTTP parse error due to an extra percent sign
I can scream from the rooftops all I want. This particular product I am working on is an OpenURL resolver, which handles OpenURLs representing scholarly citations sent from the third-party licensed search providers that the library (where I work) pays for. We have contracts with literally hundreds of such providers.

This one provider that sends the bad URLs is a particularly large company (EBSCO), with billions of dollars in revenue and thousands of customers, of which we are just one. Most of their other customers are not using mongrel-fronted (or Rails at all) solutions for OpenURL link resolving; the solutions they are using manage to deal with the faulty URLs. Now ours does too, with some ugly Apache/Perl hacking in front of mongrel.

I complained both privately and publicly about this. But the world is not always as we would like it, and sometimes our software needs to deal with incoming data that is not standards compliant; that's just the way it is.

Jonathan

James Tucker wrote:
> This is a sad day; they won.
>
> Next time, scream from the rooftops, unless you already signed your free
> speech away, that is.
>
> Printing an RFC and highlighting the relevant clauses and handing that to
> a level higher than disgruntled developers is often relatively effective,
> too.

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
David Vrensk
2009-Jan-08 19:03 UTC
[Mongrel] HTTP parse error due to an extra percent sign
On Thu, Jan 8, 2009 at 18:51, Jonathan Rochkind <rochkind at jhu.edu> wrote:

> I can scream from the rooftops all I want. [...]
>
> I complained both privately and publicly about this. But the world is not
> always as we would like it, and sometimes our software needs to deal with
> incoming data that is not standards compliant; that's just the way it is.

Absolutely true. And I think the solution you devised is a good one: take a piece of software that (for reasons unknown to me) accepts malformed input, have it clean up the input, and pass it on. No reason to disable checks in a tool that actually does what it should.

Out of curiosity: what did this company respond when you asked them to provide protocol-compliant data? I'd like to think that they at least apologised profusely for being unable to keep the tubes clean, as it were, instead of saying "standards shmandards".

BR,
/David
James Tucker
2009-Jan-09 12:29 UTC
[Mongrel] HTTP parse error due to an extra percent sign
On 8 Jan 2009, at 17:51, Jonathan Rochkind wrote:

> We have contracts with literally hundreds of such providers.

Most of which do the right thing.

> This one provider that sends the bad URLs is a particularly large company
> (EBSCO), with billions of dollars in revenue and thousands of customers,
> of which we are just one.

It should generally take one, or a handful of, lines of code to properly escape URLs. It sounds like they can definitely afford to do this.

> Most of their other customers are not using mongrel-fronted (or Rails at
> all) solutions for OpenURL link resolving; the solutions they are using
> manage to deal with the faulty URLs.

There are plenty of stories of how mongrel's conformance to the RFC has raised similar concerns. Indeed, this happened with Google too; some of the stories can be found on Zed's blog. AFAIK, Google fixed their stuff.

> Now ours does too, with some ugly Apache/Perl hacking in front of mongrel.

As David said, your solution is a valid one. Changing mongrel is not.

> I complained both privately and publicly about this. But the world is not
> always as we would like it, and sometimes our software needs to deal with
> incoming data that is not standards compliant; that's just the way it is.

Understood. Nonetheless, you should keep trying, as we all should, to improve this world.