I just ran into a stupid client that put the username in the http_URL field, making the first line of the HTTP request look like this: GET http://username at localhost:8080/mojombo/grit HTTP/1.1 Unicorn 500s on this, saying it can''t parse the headers. I''m including a unit test that will die on this, but my question is should Unicorn handle this gracefully by just stripping off the username - parsing it as a ''server'' instead of a ''host''? It seems that most other webservers do, even though it doesn''t appear to be the spec. Thanks, Scott --- ?test/unit/test_http_parser.rb | ? 18 ++++++++++++++++++ ?1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/test/unit/test_http_parser.rb b/test/unit/test_http_parser.rb index 1b3faaf..7d32e5e 100644 --- a/test/unit/test_http_parser.rb +++ b/test/unit/test_http_parser.rb @@ -298,6 +298,24 @@ class HttpParserTest < Test::Unit::TestCase ? ? assert ! parser.keepalive? ? end + ?# some dumb clients add users because they''re stupid + ?def test_absolute_uri_w_user + ? ?parser = HttpParser.new + ? ?req = {} + ? ?http = "GET http://user at example.com/foo?q=bar HTTP/1.0\r\n\r\n" + ? ?assert_equal req, parser.headers(req, http) + ? ?assert_equal ''http'', req[''rack.url_scheme''] + ? ?assert_equal ''/foo?q=bar'', req[''REQUEST_URI''] + ? ?assert_equal ''/foo'', req[''REQUEST_PATH''] + ? ?assert_equal ''q=bar'', req[''QUERY_STRING''] + + ? ?assert_equal ''example.com'', req[''HTTP_HOST''] + ? ?assert_equal ''example.com'', req[''SERVER_NAME''] + ? ?assert_equal ''80'', req[''SERVER_PORT''] + ? ?assert_equal "", http + ? ?assert ! parser.keepalive? + ?end + ? def test_absolute_uri ? ? parser = HttpParser.new ? ? req = {} -- 1.6.6.rc1
Scott Chacon <schacon at gmail.com> wrote:> I just ran into a stupid client that put the username in the http_URL > field, making the first line of the HTTP request look like this: > > GET http://username at localhost:8080/mojombo/grit HTTP/1.1 > > Unicorn 500s on this, saying it can''t parse the headers. I''m > including a unit test that will die on this, but my question is should > Unicorn handle this gracefully by just stripping off the username - > parsing it as a ''server'' instead of a ''host''? It seems that most > other webservers do, even though it doesn''t appear to be the spec.Hi Scott, Other servers (Mongrel) fell back to URI.parse which allowed this. Since Mongrel allowed it (possibly on accident), Unicorn should probably allow it, too... The following change should fix things for you, but I''m not sure about the list of allowed characters for the user and don''t have time to check the RFCs right now. Which client is doing this? Any hope of fixing it there? But yeah, definitely not in rfc2616 from what I remember. Also scp-ed the C source up to http://unicorn.bogomips.org/unicorn_parser.c in case you don''t have Ragel. diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl index 041dfec..4842972 100644 --- a/ext/unicorn_http/unicorn_http_common.rl +++ b/ext/unicorn_http/unicorn_http_common.rl @@ -28,6 +28,7 @@ scheme = ( "http"i ("s"i)? ) $downcase_char >mark %scheme; hostname = (alnum | "-" | "." | "_")+; host_with_port = (hostname (":" digit*)?) >mark %host; + user = ((alnum | "_" | ".")+ "@")*; path = ( pchar+ ( "/" pchar* )* ) ; query = ( uchar | reserved )* %query_string ; @@ -36,7 +37,7 @@ rel_path = (path? (";" params)? %request_path) ("?" %start_query query)?; absolute_path = ( "/"+ rel_path ); path_uri = absolute_path > mark %request_uri; - Absolute_URI = (scheme "://" host_with_port path_uri); + Absolute_URI = (scheme "://" user host_with_port path_uri); Request_URI = ((absolute_path | "*") >mark %request_uri) | Absolute_URI; Fragment = ( uchar | reserved )* >mark %fragment; -- Eric Wong
Eric Wong <normalperson at yhbt.net> wrote:> Also scp-ed the C source up to > http://unicorn.bogomips.org/unicorn_parser.c in case you don''t have > Ragel.Err, http://unicorn.bogomips.org/unicorn_http.c that is
Hey, On Thu, Dec 17, 2009 at 5:23 PM, Eric Wong <normalperson at yhbt.net> wrote:> The following change should fix things for you, but I''m not sure about > the list of allowed characters for the user and don''t have time to check > the RFCs right now. ?Which client is doing this? ?Any hope of fixing it > there? ?But yeah, definitely not in rfc2616 from what I remember.Thanks a lot. I do have Ragel, I''ll test this out tomorrow. It is the Versions SVN client in DAV mode that is doing this, and I hate them a little for it. However, I figured other clients must be doing it too. Thanks a bunch for the patch. Scott
Hey guys, I think the <resource_type>://<username>:<password>@<host>/<path> scheme is not "illegal". There are examples of this in the URL RFC, just no explicit HTTP example. This probably a vague area. Its not in the http rfc and its not explicitly mentioned in the http auth rfc either but in combination with the URL RFC there is at least room for it. I haven''t found the paragraph yet which says: no username:password stuff allowed in http urls. But I was just searching through these things ? there are good chances I missed it. http://en.wikipedia.org/wiki/URI_scheme http://tools.ietf.org/html/rfc2617 http://www.ietf.org/rfc/rfc1738.txt Anyway, I came across such urls a lot. Often I use them for giving people easy access to an otherwise basic authed resource - in a chat conversation for example. I know apache and nginx support this - IIS does not. Hrm - tough call ;) Kind regards, John
John-Paul Bader <hukl at berlin.ccc.de> wrote:> Hey guys, > > I think the <resource_type>://<username>:<password>@<host>/<path> > scheme is not "illegal". There are examples of this in the URL RFC, > just no explicit HTTP example. > > This probably a vague area. Its not in the http rfc and its not > explicitly mentioned in the http auth rfc either but in combination > with the URL RFC there is at least room for it. I haven''t found the > paragraph yet which says: no username:password stuff allowed in http > urls. But I was just searching through these things ? there are good > chances I missed it.Hi, Yes, I''ve come to the same conclusion. rfc2616 just seems to defer to rfc2396 (which superceded rfc1738 and is superceded by rfc3986).> http://en.wikipedia.org/wiki/URI_scheme > http://tools.ietf.org/html/rfc2617 > http://www.ietf.org/rfc/rfc1738.txt > > Anyway, I came across such urls a lot. Often I use them for giving > people easy access to an otherwise basic authed resource - in a chat > conversation for example. I know apache and nginx support this - IIS > does not. > > Hrm - tough call ;)Yup, definitely precedence for supporting it (along with Mongrel). I''ve updated the Ragel parser with everything URI.parse("http://..") supports and pushed out the change. I''ve been meaning to make a few more small documentation updates and do a 0.95.3 release tomorrow when I''m more awake. -- Eric Wong