thr3ads.net - freebsd stable - Varnish proxy goes catatonic under heavy load [Nov 2014]

Matthew Seaman

2014-Nov-05 11:49 UTC

Varnish proxy goes catatonic under heavy load

Dear all,

We had an unfortunate set of circumstances which resulted in several
million people all trying to download about 1.5MB worth of images from
our servers over the course of a few hours. Or, at least, it would have
been a few hours, except that our three varnish proxies just crumbled
under the load within 10 minutes.

Now, that's bad enough, but we could have just about coped if the
proxies stopped serving requests for a few minutes. What actually
happened was that all three servers went catatonic on the network *and
stayed that way*: even when we shunted the traffic away from one, we
still couldn't access it via ssh or any other network protocol. And it
stayed like that for long enough that we had no recourse other than to
get the servers rebooted.

Can anyone explain what was happening here? Not having the servers
recover accessibility for an extended period even after the excess
traffic was stopped is unacceptable. We're also struggling to recreate
the effect in the lab: any clues about how to do so, and any suggestions
about how to prevent the 'going catatonic' response would be greatly
appreciated.

Servers are amd64 running FreeBSD 9.1 or 9.2 and Varnish 3.0.5.

Cheers,

Matthew

Steven Hartland

2014-Nov-05 12:00 UTC

At a guess, you exhausted all mbufs. FreeBSD 10 has much better defaults
for these, so I'd recommend updating.

If you can get in via IPMI or something similar you should be able to 
confirm.

A trick I've used in the past to recover from such an issue is to hard 
bounce the NIC ports on the switch, which seemed to free enough mbufs 
for me to be able to ssh in.
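
One way to do the confirmation mentioned above, from a console session, is
to look at the output of `netstat -m`. What follows is only a minimal
sketch, not something from the original report: the function names
`parse_netstat_m` and `looks_exhausted` and the 0.9 threshold are made up
for illustration, and the line format the regular expressions expect is an
assumption that may not match every FreeBSD release.

```python
import re


def parse_netstat_m(text):
    """Pull mbuf cluster usage and denied-request counters out of `netstat -m` text.

    Assumed line formats (may vary between releases):
      "1024/1534/2558/25600 mbuf clusters in use (current/cache/total/max)"
      "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
    """
    stats = {}
    m = re.search(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", text, re.M)
    if m:
        stats["clusters_current"] = int(m.group(1))
        stats["clusters_max"] = int(m.group(4))
    m = re.search(r"^(\d+)/(\d+)/(\d+) requests for mbufs denied", text, re.M)
    if m:
        stats["denied"] = tuple(int(g) for g in m.groups())
    return stats


def looks_exhausted(stats, threshold=0.9):
    """Heuristic: any denied request, or cluster usage at or above the threshold."""
    used = stats.get("clusters_current", 0)
    limit = stats.get("clusters_max", 0)
    denied = any(stats.get("denied", ()))
    return denied or (limit > 0 and used / limit >= threshold)
```

To use it, capture the text of `netstat -m` (for example with
`subprocess.run(["netstat", "-m"], capture_output=True, text=True).stdout`)
and pass it to `parse_netstat_m`. Non-zero "denied" counters are the
stronger signal. If clusters really are the limit, the relevant knob is the
`kern.ipc.nmbclusters` tunable.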
