This patch is based on one which has been in XenServer for a very long. To keep the trend of documentation going, it also corrects the new command line document. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org lists.xen.org/xen-devel
>>> On 07.03.12 at 17:55, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > This patch is based on one which has been in XenServer for a very long.I''m not sure this is a good idea - people with systems where the watchdog doesn''t work will likely start complaining about why it doesn''t work, as that''s going to be always visible in the log. Furthermore, a timeout of 5 min is just unacceptably large. If we really still have problems with the 5 sec timeout, we should fix those rather than bumping the timeout - no timer events for 5 sec is already way too long. Jan
On 07/03/2012 16:55, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:> This patch is based on one which has been in XenServer for a very long. > > To keep the trend of documentation going, it also corrects the new > command line document.This does not only enable the watchdog by default, but also changes the timeout from 5 seconds to 5 *minutes*! Even if that is a good idea in some cases, we''d at least have to make the timeout configurable. Developers will not appreciate having to wait for 5 minutes for their lockups to produce useful trace output. -- Keir
On Wed, Mar 7, 2012 at 5:11 PM, Keir Fraser <keir@xen.org> wrote:> On 07/03/2012 16:55, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote: > >> This patch is based on one which has been in XenServer for a very long. >> >> To keep the trend of documentation going, it also corrects the new >> command line document. > > This does not only enable the watchdog by default, but also changes the > timeout from 5 seconds to 5 *minutes*! > > Even if that is a good idea in some cases, we''d at least have to make the > timeout configurable. Developers will not appreciate having to wait for 5 > minutes for their lockups to produce useful trace output.Yes, the 5 minute timeout is what we ship for production boxes, so that we can catch actual deadlocks while being *really really* sure we don''t take down a customer''s system unless it''s *really* dead. I think leaving the timeout as it is and making it configurable is probably the best option. -George> > -- Keir > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > lists.xen.org/xen-devel
On 07/03/2012 17:35, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:> On Wed, Mar 7, 2012 at 5:11 PM, Keir Fraser <keir@xen.org> wrote: >> On 07/03/2012 16:55, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote: >> >>> This patch is based on one which has been in XenServer for a very long. >>> >>> To keep the trend of documentation going, it also corrects the new >>> command line document. >> >> This does not only enable the watchdog by default, but also changes the >> timeout from 5 seconds to 5 *minutes*! >> >> Even if that is a good idea in some cases, we''d at least have to make the >> timeout configurable. Developers will not appreciate having to wait for 5 >> minutes for their lockups to produce useful trace output. > > Yes, the 5 minute timeout is what we ship for production boxes, so > that we can catch actual deadlocks while being *really really* sure we > don''t take down a customer''s system unless it''s *really* dead. I > think leaving the timeout as it is and making it configurable is > probably the best option.A patch to do that would be acceptable. With that we may as well keep NMI watchdog disabled by default, and default timeout of 5 seconds. That then means that XenServer carries a command-line option for its watchdog settings, rather than a patch. -- Keir> -George > >> >> -- Keir >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> lists.xen.org/xen-devel
Here is a patch which allows a command line parameter to set the watchdog timeout. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org lists.xen.org/xen-devel
>>> On 07.03.12 at 19:14, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > Here is a patch which allows a command line parameter to set the > watchdog timeout.Wouldn''t it make sense to simply modify the watchdog= semantics slightly: When given a numeric value, it specifies the timeout (and zero disables as before), while when given any value parse_bool() accepts the timeout remains at the default and the watchdog just gets turned on. The only ambiguity with the current possible values would be watchdog=1, and I think having the meaning of this changed is acceptable. Jan
On Thu, 2012-03-08 at 10:06 +0000, Jan Beulich wrote:> >>> On 07.03.12 at 19:14, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > Here is a patch which allows a command line parameter to set the > > watchdog timeout. > > Wouldn''t it make sense to simply modify the watchdog= semantics > slightly: When given a numeric value, it specifies the timeout (and zero > disables as before), while when given any value parse_bool() accepts > the timeout remains at the default and the watchdog just gets turned > on.I had thought of this too, but I don''t think it actually simplifies it. It makes the parsing code more complicated, and it makes the interface actually more complicated and less predictable, for pretty much no good reason. It''s not like adding extra command-line options is really onerous, either in terms of users or in terms of our code. I think it''s much more important to have consistency and predictability; the new interpretation "watchdog=1" is exactly the kind of thing we need to avoid. -George> The only ambiguity with the current possible values would be > watchdog=1, and I think having the meaning of this changed is > acceptable. > > Jan >
Seemingly Similar Threads
- [PATCH V3] vmx/nmi: Do not use self_nmi() in VMEXIT handler
- [PATCH] kexec/noreboot: Don't kexec_crash() if noreboot has been requested.
- [PATCH v2] xen: fix initialization of wallclock time for PVHVM on migration
- [PATCH] IRQ: Group IRQ_MOVE_CLEANUP_VECTOR with other hypervisor IPIs
- [Patch v3 0/4] Xen stack trace printing improvements