Jimmy Soho
2012-Oct-09 00:39 UTC
Is a client uploading a file a slow client from unicorn's point of view?
Hi All,

I was wondering what would happen when large files were uploaded to our system in parallel to endpoints that don't process file uploads. In particular I was wondering if we're vulnerable to a simple DoS attack.

The setup I tested with was nginx v1.2.4 with the upload module (v2.2.0) configured only for location /uploads, with 2 unicorn (v4.3.1) workers with a timeout of 30 secs, all running on 1 small unix box.

In a few terminals I started this command 3 times in parallel:

$ curl -i -F importer_input=@/Users/admin/largefile.tar.gz https://mywebserver.com/doesnotexist

In a browser I then tried to go to a page that would be served by a unicorn worker.

My expectation was that I would not get to see the web page, as all unicorn workers would be busy receiving / saving the uploads. As discussed, for example, in this article: http://stackoverflow.com/questions/9592664/unicorn-rails-large-uploads. Or as https://github.com/dwilkie/carrierwave_direct describes it:

"Processing and saving file uploads are typically long running tasks and should be done in a background process."

But I don't see this. The page is served just fine in my setup. The requests for the file uploads appear in the nginx access log at the same time the curl upload command eventually finishes minutes later on the client side, and only then is each handed off to a unicorn/rack worker process, which quickly returns a 404 page not found. Response times of less than 50ms.

What am I missing here? I'm starting to wonder what's the use of the nginx upload module. My understanding was that its purpose was to keep unicorn workers available as long as a file upload was in progress, but it seems that without that module the stack does the same thing.
Another question (more an nginx question though, I guess): is there a way to kill an upload request as early as possible if the request is not made against known / accepted URI locations, instead of waiting for it to be completely uploaded to our system and/or waiting for it to reach the unicorn workers?

Cheers,
Jim
Eric Wong
2012-Oct-09 01:58 UTC
Is a client uploading a file a slow client from unicorn's point of view?
Jimmy Soho <jimmy.soho at gmail.com> wrote:
> Hi All,
>
> I was wondering what would happen when large files were uploaded to
> our system in parallel to endpoints that don't process file uploads.
> In particular I was wondering if we're vulnerable to a simple DoS
> attack.

nginx will protect you by buffering large requests to disk, so slow requests are taken care of (of course you may still run out of disk space).

> The setup I tested with was nginx v1.2.4 with upload module (v2.2.0)
> configured only for location /uploads with 2 unicorn (v4.3.1) workers
> with timeout 30 secs, all running on 1 small unix box.
>
> In a few terminals I started this command 3 times in parallel:
>
> $ curl -i -F importer_input=@/Users/admin/largefile.tar.gz
> https://mywebserver.com/doesnotexist
>
> In a browser I then tried to go to a page that would be served by a
> unicorn worker.
>
> My expectation was that I would not get to see the web page as all
> unicorn workers would be busy with receiving / saving the upload. As
> discussed in for example this article:
> http://stackoverflow.com/questions/9592664/unicorn-rails-large-uploads.
> Or as https://github.com/dwilkie/carrierwave_direct describes it:
>
> "Processing and saving file uploads are typically long running tasks
> and should be done in a background process."

That is true. It's good to move slow jobs to background processes if possible, if the bottleneck is either:

  a) your application processing
  b) the storage destination of your app (e.g. cloud storage)

However, if your only bottleneck is client <-> your app, then nginx will take care of that part for you.

> But I don't see this. The page is served just fine in my setup. The
> requests for the file uploads appear in the nginx access log at the
> same time the curl upload command eventually finishes minutes later
> client side, and then it's handed off to a unicorn/rack worker
> process, which quickly returns a 404 page not found. Response times of
> less than 50ms.
>
> What am I missing here? I'm starting to wonder what's the use of the
> nginx upload module? My understanding was that its use was to keep
> unicorn workers available as long as a file upload was in progress,
> but it seems that without that module it does the same thing.

I'm not familiar with the nginx upload module, but stock nginx will already do full request buffering for you. It looks like the nginx upload module[1] is mostly meant for standalone apps written for nginx, and not for when nginx is used as a proxy for a Rails app...

[1] http://www.grid.net.ru/nginx/upload.en.html

> Another question (more an nginx question though I guess): is there a
> way to kill an upload request as early as possible if the request is
> not made against known / accepted URI locations, instead of waiting
> for it to be completely uploaded to our system and/or waiting for it
> to reach the unicorn workers?

I'm not sure if nginx has this functionality, but unicorn lazily buffers uploads. So your upload will be fully read by nginx, but unicorn will only read the uploaded request body if your application wants to read it.

Unfortunately, I think most application frameworks (e.g. Rails) will attempt to do all the multipart parsing up front. To get around this, you'll probably want some middleware along the following lines (and placed in front of whichever part of your stack calls Rack::Multipart.parse_multipart):

  class BadUploadStopper
    def initialize(app)
      @app = app
    end

    def call(env)
      case env["REQUEST_METHOD"]
      when "POST", "PUT"
        case env["PATH_INFO"]
        when "/upload_allowed"
          @app.call(env) # forward to the app
        else # bad path, don't waste time with @app.call
          [ 403, {}, [ "Go away\n" ] ]
        end
      else
        @app.call(env) # forward to the app
      end
    end
  end

  ------------------- config.ru ---------------------
  use BadUploadStopper
  run YourApp.new
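[Editor's note] The BadUploadStopper middleware above can be exercised without booting nginx or unicorn by calling it directly with hand-built Rack env hashes. A minimal sketch (the /upload_allowed path is the hypothetical one from the example; the inner app is a stub, not a real application):

```ruby
# The middleware from the example above, plus a stub downstream app.
class BadUploadStopper
  def initialize(app)
    @app = app
  end

  def call(env)
    case env["REQUEST_METHOD"]
    when "POST", "PUT"
      case env["PATH_INFO"]
      when "/upload_allowed"
        @app.call(env) # forward to the app
      else # bad path, don't waste time with @app.call
        [403, {}, ["Go away\n"]]
      end
    else
      @app.call(env) # forward to the app
    end
  end
end

# Stub app: always answers 200.
app = lambda { |env| [200, { "Content-Type" => "text/plain" }, ["OK\n"]] }
stopper = BadUploadStopper.new(app)

# POST to the allowed path reaches the app; POST anywhere else is
# rejected before any multipart parsing could run; GETs pass through.
p stopper.call("REQUEST_METHOD" => "POST", "PATH_INFO" => "/upload_allowed")[0] # => 200
p stopper.call("REQUEST_METHOD" => "POST", "PATH_INFO" => "/doesnotexist")[0]   # => 403
p stopper.call("REQUEST_METHOD" => "GET",  "PATH_INFO" => "/doesnotexist")[0]   # => 200
```

Note that the 403 branch never touches env["rack.input"], which is the point: the buffered body stays unread.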
Laas Toom
2012-Oct-09 06:31 UTC
Is a client uploading a file a slow client from unicorn's point of view?
On 09.10.2012, at 4:58, Eric Wong <normalperson at yhbt.net> wrote:
> I'm not familiar with the nginx upload module, but stock nginx will
> already do full request buffering for you. It looks like the nginx
> upload module[1] is mostly meant for standalone apps written for
> nginx, and not when nginx is used as a proxy for Rails app...

AFAIK the upload module will give you two things:

1) It handles the whole body parsing up to the point of storing the file to disk in the correct place. Then it strips the file from the POST request and replaces it with a reference to the location on disk.

2) It makes the upload progress available, so e.g. AJAX-powered upload forms can show a progress bar, which is really neat. No need for Flash-based uploaders.

I have a Rails app that accepts media uploads. All the processing happens in the background; the front-end handles only the actual upload and stores it to disk. But with uploads as large as 1.4 GB, I've seen Rails response times > 200 secs. This starts to give timeouts in weird places.

Eric, correct me if I'm wrong, but doesn't the Nginx-Unicorn-Rails stack write the whole file up to 3 times to disk:

1) Nginx buffers the body (in encoded state)
2) Unicorn parses the body and writes it to the TMP folder (as requested by Rails)
3) if Rails accepts the file, it moves it to actual storage. But as /tmp is often a different device from storage, this is actually a full copy.

In such a situation the upload module would help out, because instead of simply buffering the body on disk, it actually parses the body. And it is implemented in C, which should make it faster. Afterwards it only hands over the file location, and Rails can complete its work a lot faster, freeing up workers. Unicorn won't even see the file, and Rails has the responsibility to delete the file if it's invalid.

Best,
Laas
Eric Wong
2012-Oct-09 20:03 UTC
Is a client uploading a file a slow client from unicorn's point of view?
Laas Toom <laas at toom.ee> wrote:
> On 09.10.2012, at 4:58, Eric Wong <normalperson at yhbt.net> wrote:
> > I'm not familiar with the nginx upload module, but stock nginx will
> > already do full request buffering for you. It looks like the nginx
> > upload module[1] is mostly meant for standalone apps written for
> > nginx, and not when nginx is used as a proxy for Rails app...
>
> AFAIK the upload module will give you two things:
>
> 1) handle the whole body parsing up to the point of storing the file
> to disk in correct place. Then it strips the file from POST request
> and replaces with reference to the location on disk.

That sounds awesome performance-wise.

> 2) make the upload progress available, so e.g. AJAX-powered upload
> forms can show progressbar, which is really neat. No need for
> Flash-based uploaders.

It does? I'm not seeing it in the documentation, but I know there's a separate upload progress module, though: http://wiki.nginx.org/HttpUploadProgressModule

A side note on upload progress: I wrote upr[1] back in the day since I wanted to share upload progress state via memcached for multi-machine configurations.

[1] http://upr.bogomips.org/

> I have a Rails app that accepts media uploads. All the processing
> happens in background, front-end handles only the actual upload and
> stores it to disk. But with uploads as large as 1.4 GB, I've seen
> Rails response times > 200 secs. This starts to give timeouts in
> weird places.

Yikes. I assume you're constrained by disk I/O there?

For some of the large file situations under Linux, I find it beneficial to lower the dirty_*ratio/*bytes drastically to avoid large, sudden bursts of disk activity and instead favor smaller writes. I get lower throughput, but more consistent performance.

> Eric, correct me if I'm wrong, but doesn't Nginx-Unicorn-Rails stack
> write the whole file up to 3 times to disk:
>
> 1) Nginx buffers the body (in encoded state)

Correct.

> 2) Unicorn parses the body and writes to TMP folder (as requested by Rails)

Rack does multipart parsing. Unicorn itself doesn't do body parsing other than handling Transfer-Encoding: chunked (which hardly anybody sends).

> 3) if Rails accepts the file, it moves it to actual storage. But as
> /tmp is often different device from storage, this is actually a full
> copy.

Depends on the Rack/Rails app, but usually this is the case. For my use, all uploads are PUT requests with "curl -T", so there's no multipart parsing involved and it's much faster :)

> In such a situation the upload module would help out, because instead
> of simply buffering the body on disk, it actually parses the body. And
> it is implemented in C, which should make it faster.

Yep.

> Afterwards it will only hand over the file location and Rails can
> complete its work a lot faster, freeing up workers.
>
> Unicorn won't even see the file and Rails has the responsibility to
> delete the file if it's invalid.

I think the only problem with this approach is it won't work well on setups where nginx is on separate machines from unicorn. Shared storage would be required, but that ends up adding to network I/O, too...
Laas Toom
2012-Oct-09 23:06 UTC
Is a client uploading a file a slow client from unicorn's point of view?
On 09.10.2012, at 23:03, Eric Wong <normalperson at yhbt.net> wrote:
>> 2) make the upload progress available, so e.g. AJAX-powered upload
>> forms can show progressbar, which is really neat. No need for
>> Flash-based uploaders.
>
> It does? I'm not seeing it in the documentation, but I know there's
> a separate upload progress module, though:
> http://wiki.nginx.org/HttpUploadProgressModule

I must have mixed them up then. :-)

>> I have a Rails app that accepts media uploads. All the processing
>> happens in background, front-end handles only the actual upload and
>> stores it to disk. But with uploads as large as 1.4 GB, I've seen
>> Rails response times > 200 secs. This starts to give timeouts in
>> weird places.
>
> Yikes. I assume you're constrained by disk I/O there?

Might be. Additionally, the disk is a SAN, so there's network activity there too.

> For some of the large file situations under Linux, I find it beneficial
> to lower the dirty_*ratio/*bytes drastically to avoid large, sudden
> bursts of disk activity and instead favor smaller writes. I get lower
> throughput, but more consistent performance.

I shall look into it when I get to fixing this issue.

>> Afterwards it will only hand over the file location and Rails can
>> complete its work a lot faster, freeing up workers.
>>
>> Unicorn won't even see the file and Rails has the responsibility to
>> delete the file if it's invalid.
>
> I think the only problem with this approach is it won't work well on
> setups where nginx is on separate machines from unicorn. Shared
> storage would be required, but that ends up adding to network I/O,
> too...

But won't (almost) the same network I/O happen anyway, because of nginx transferring the data to Unicorn over the network (as they are on different machines)?

Thanks for clarifying,
Laas
Eric Wong
2012-Oct-09 23:54 UTC
Is a client uploading a file a slow client from unicorn's point of view?
Laas Toom <laas at toom.ee> wrote:
> On 09.10.2012, at 23:03, Eric Wong <normalperson at yhbt.net> wrote:
> > Laas Toom <laas at toom.ee> wrote:
> >> Afterwards it will only hand over the file location and Rails can
> >> complete its work a lot faster, freeing up workers.
> >>
> >> Unicorn won't even see the file and Rails has the responsibility to
> >> delete the file if it's invalid.
> >
> > I think the only problem with this approach is it won't work well on
> > setups where nginx is on separate machines from unicorn. Shared
> > storage would be required, but that ends up adding to network I/O,
> > too...
>
> But won't (almost) the same network I/O be evident anyway, because of
> nginx transferring the data to Unicorn over network (as they are on
> different machines)?

It depends on your shared storage implementation. It'll likely be a win if the shared storage is on the same server as nginx (but that might mean you can only have one nginx server). But I think it'll be a loss if there need to be multiple nginx servers (and multiple unicorn servers)...

* With nginx_upload_module + shared storage:

  nginx server ------ shared storage -------- unicorn server
  ------------------------------------------------------------------
  1. sequential write
     to shared storage

  2. file could remain cached          do processing on file parts
     on nginx server, even if          remotely, network latency
     we'll never need to read from     reads (and possible cache
     it again                          coherency checks on rereads)

  3.                                   unlink on error
                                       unlink/rename/copy on success

* Without nginx_upload_module:

  nginx server -------------------------- unicorn server
  ------------------------------------------------------------------
  1. sequential write
     of tempfile

  2. sequential read of
     tempfile ---------->              sequential write by Rack

  3. unlink (able to                   do processing on file
     free up cache)                    locally (no remote cache
                                       coherency checks)

  The benefit of this approach is there's only 2 components
  interacting at any one time, and the network costs are paid in
  full up front.

Basically, it's the message passing concurrency model vs shared memory+locking. There's no clear winner, it just depends on the situation. 99% of the time I get away with keeping everything on one machine :)
Laas Toom
2012-Oct-10 06:59 UTC
Is a client uploading a file a slow client from unicorn's point of view?
On 10.10.2012, at 2:54, Eric Wong <normalperson at yhbt.net> wrote:
> Basically, it's the message passing concurrency model vs shared
> memory+locking. There's no clear winner, it just depends on the
> situation. 99% of the time I get away with keeping everything on
> one machine :)

Exactly.

For now we have nginx and unicorn on the same machine, both interacting with the storage over the network, so there is no difference whether the initial write is done by Rails or Nginx.

Later, Resque will pick up the background processing task, which again could be on totally separate machine(s). And in the most complex scenario there could be multiple Nginx, multiple Unicorn and multiple Resque nodes, each doing its part. In such a situation, the Unicorn node can skip the network I/O if it can base its upload validation solely on the filename and attrs, disregarding the file data itself.

Best,
Laas