From: root <root@bguthro-desktop.(none)>

This test attempts to have an initial pass at introducing a test to catch regressions in S3.
It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
when S3 is complete.
---
 ts-host-suspend | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100755 ts-host-suspend

diff --git a/ts-host-suspend b/ts-host-suspend
new file mode 100755
index 0000000..9fe38d5
--- /dev/null
+++ b/ts-host-suspend
@@ -0,0 +1,51 @@
+#!/usr/bin/perl -w
+
+use strict qw(vars);
+
+use Osstest;
+use Osstest::TestSupport;
+
+tsreadconfig();
+
+my $timeout = 30;
+if (@ARGV && $ARGV[0] =~ m/^--timeout=([0-9]*)$/) {
+    $timeout = $1;
+    shift @ARGV;
+}
+
+our ($whhost) = @ARGV;
+$whhost ||= 'host';
+our $ho = selecthost($whhost);
+
+my $RTC = "/sys/class/rtc/rtc0";
+
+# get RTC NOW
+my $epoch = target_cmd_output_root($ho, "cat " . $RTC . "/since_epoch");
+
+# Clear the wake alarm
+target_cmd_root($ho, "echo 0 > " . $RTC . "/wakealarm");
+
+# Set the wake alarm to NOW + time
+my $t2 = $epoch + $timeout;
+target_cmd_root($ho, "echo " . ($epoch + $timeout) . " > ". $RTC . "/wakealarm");
+
+# Put the machine to sleep
+target_cmd_root($ho, "pm-suspend");
+
+# Give the machine some time to go to sleep.
+sleep (5 + $timeout);
+
+# check log for resume message
+poll_loop(4*$timeout, 2, 's3-confirm-resumed',
+          target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
+                            "grep -n 'Finishing wakeup from S3 state'"));
+
+# TODO:
+# - Check pcpu state
+# - Affinity has been restored
+# - C-states are not lost
+# - CPU pools are all correct
+# - Check timer queues are correct
+# - vcpu_singleshot_timer on every pcpu
+# - Check for kernel Oops
+# - Check for Xen WARN
-- 
1.7.9.5
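The wake-alarm arithmetic in the script can be sketched in isolation. This is a minimal, self-contained illustration and not a replacement for the script: the helper names `parse_timeout` and `wakealarm_commands` are hypothetical, and the fixed epoch value stands in for what the script reads from `since_epoch`. The sketch assumes the option should carry at least one digit; the pattern `[0-9]*` in the patch also matches an empty value such as `--timeout=`.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: returns the timeout in seconds followed by the
# remaining arguments.  Requires at least one digit after "--timeout=".
sub parse_timeout {
    my ($default, @args) = @_;
    my $timeout = $default;
    if (@args && $args[0] =~ m/^--timeout=([0-9]+)$/) {
        $timeout = $1;
        shift @args;
    }
    return ($timeout, @args);
}

# Hypothetical helper: builds the two shell commands that clear the RTC
# wake alarm and then set it to $epoch + $timeout.
sub wakealarm_commands {
    my ($rtc, $epoch, $timeout) = @_;
    my $wake_at = $epoch + $timeout;
    return ("echo 0 > $rtc/wakealarm",
            "echo $wake_at > $rtc/wakealarm");
}

my ($timeout, @rest) = parse_timeout(30, '--timeout=45', 'host');

# 1_000_000_000 is a placeholder for the value read from since_epoch.
my @cmds = wakealarm_commands('/sys/class/rtc/rtc0', 1_000_000_000, $timeout);
print "$_\n" for @cmds;
```

Keeping the arithmetic in a pure function like this makes it possible to check the computed wake time without a target host.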
Ben Guthro writes ("[PATCH] Introduce an s3 test"):
> From: root <root@bguthro-desktop.(none)>
>
> This test attempts to have an initial pass at introducing a test to catch regressions in S3.
> It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
> when S3 is complete.

Thanks. Most of this looks plausible. I have some comments:

> +# Put the machine to sleep
> +target_cmd_root($ho, "pm-suspend");
> +
> +# Give the machine some time to go to sleep.
> +sleep (5 + $timeout);
> +
> +# check log for resume message
> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> +          target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> +                            "grep -n 'Finishing wakeup from S3 state'"));

Why does this need a poll loop ? Surely after the machine comes out
of suspend it should be up right away ?

> +# TODO:
> +# - Check pcpu state
> +# - Affinity has been restored
> +# - C-states are not lost
> +# - CPU pools are all correct

We don't do any cpu affinity testing at all right now. Leaving
this as a TODO here is fine.

> +# - Check timer queues are correct
> +# - vcpu_singleshot_timer on every pcpu

I'm not sure I follow this. Wouldn't messed up timer queues cause
other trouble in the guest ?

> +# - Check for kernel Oops
> +# - Check for Xen WARN

These are a good idea but should perhaps be a separate test step.

Ian.
On Wed, 2013-05-01 at 11:56 +0100, Ian Jackson wrote:
> Ben Guthro writes ("[PATCH] Introduce an s3 test"):
> > From: root <root@bguthro-desktop.(none)>
> >
> > This test attempts to have an initial pass at introducing a test to catch regressions in S3.
> > It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
> > when S3 is complete.
>
> Thanks. Most of this looks plausible. I have some comments:
>
> > +# Put the machine to sleep
> > +target_cmd_root($ho, "pm-suspend");
> > +
> > +# Give the machine some time to go to sleep.
> > +sleep (5 + $timeout);
> > +
> > +# check log for resume message
> > +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> > +          target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> > +                            "grep -n 'Finishing wakeup from S3 state'"));
>
> Why does this need a poll loop ? Surely after the machine comes out
> of suspend it should be up right away ?

Not immediately I expect, but what happens if the ssh fails -- are there
retries at that level which make the poll loop redundant?

Do we need to handle the case where the s3 resume fails such that
subsequent tests can work? i.e. finish up with an explicit
reboot/powercycle?

Ian.
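The retry question can be made concrete with a small sketch. One relevant Perl detail is that call arguments are evaluated before the call is made, so a command expression passed directly as an argument runs once; for a check to be repeated, it has to be wrapped in a code reference. The helper below, `poll_until`, is hypothetical and is not the osstest `poll_loop`; the failing check is a stand-in for a remote command whose ssh connection is refused until the host is back.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper, not part of Osstest::TestSupport.
# Calls $check->() every $interval seconds until it returns true or
# $maxwait seconds have elapsed.  A check that dies (for example because
# ssh could not connect yet) is treated as "not ready yet".
# Returns the number of attempts made; dies if the deadline passes.
sub poll_until {
    my ($maxwait, $interval, $what, $check) = @_;
    my $deadline = time() + $maxwait;
    my $attempts = 0;
    while (1) {
        $attempts++;
        my $ok = eval { $check->() };
        return $attempts if $ok;
        die "$what: gave up after $attempts attempts\n"
            if time() >= $deadline;
        sleep($interval);
    }
}

# Stand-in for the remote command: fails twice, then succeeds.
my $calls = 0;
my $fake_check = sub {
    $calls++;
    die "ssh: connection refused\n" if $calls < 3;
    return 1;
};

my $attempts = poll_until(10, 0, 's3-confirm-resumed', $fake_check);
print "resumed after $attempts attempts\n";
```

If the deadline passes, the `die` gives the caller a single place to trigger recovery, such as the explicit reboot or power cycle raised above, before later tests run.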
On 05/01/2013 06:56 AM, Ian Jackson wrote:
> Ben Guthro writes ("[PATCH] Introduce an s3 test"):
>> From: root <root@bguthro-desktop.(none)>
>>
>> This test attempts to have an initial pass at introducing a test to catch regressions in S3.
>> It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
>> when S3 is complete.
>
> Thanks. Most of this looks plausible. I have some comments:
>
>> +# Put the machine to sleep
>> +target_cmd_root($ho, "pm-suspend");
>> +
>> +# Give the machine some time to go to sleep.
>> +sleep (5 + $timeout);
>> +
>> +# check log for resume message
>> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
>> +          target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
>> +                            "grep -n 'Finishing wakeup from S3 state'"));
>
> Why does this need a poll loop ? Surely after the machine comes out
> of suspend it should be up right away ?

This is a bit of a "first pass" in a test environment I've never used
before. I modeled this after other tests I found in the same dir. If
this is inappropriate, then I suspect you are correct.

I put it in the loop for the case of networking taking some time to come
back online, so if the ssh command failed it would be retried.

Additionally, I have found that the RTC wakeup mechanism is not very
accurate in its timing.

>
>> +# TODO:
>> +# - Check pcpu state
>> +# - Affinity has been restored
>> +# - C-states are not lost
>> +# - CPU pools are all correct
>
> We don't do any cpu affinity testing at all right now. Leaving
> this as a TODO here is fine.
>
>> +# - Check timer queues are correct
>> +# - vcpu_singleshot_timer on every pcpu
>
> I'm not sure I follow this. Wouldn't messed up timer queues cause
> other trouble in the guest ?

Yes, but it has been a common point of failure / problems after S3.
I put this here as a placeholder to verify that everything is still as it
should be.

>
>> +# - Check for kernel Oops
>> +# - Check for Xen WARN
>
> These are a good idea but should perhaps be a separate test step.

Wouldn't you want a warning/oops that was provoked by S3 to be
associated with that test?

>
> Ian.
>
Ben Guthro writes ("Re: [PATCH] Introduce an s3 test"):
> On 05/01/2013 06:56 AM, Ian Jackson wrote:
> >> +# check log for resume message
> >> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> >> +          target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> >> +                            "grep -n 'Finishing wakeup from S3 state'"));
> >
> > Why does this need a poll loop ? Surely after the machine comes out
> > of suspend it should be up right away ?
>
> This is a bit of a "first pass" in a test environment I've never used
> before. I modeled this after other tests I found in the same dir. If
> this is inappropriate, then I suspect you are correct.

Maybe you should be using guest_check_up ?

> I put it in the loop for the case of networking taking some time to come
> back online, so if the ssh command failed it would be retried.

How long is it supposed to take to come back online ? "4*$timeout"
seems (a) a bit arbitrary (b) rather long with your existing value of
$timeout.

> Additionally, I have found that the RTC wakeup mechanism is not very
> accurate in its timing.

How unfortunate.

> > I'm not sure I follow this. Wouldn't messed up timer queues cause
> > other trouble in the guest ?
>
> Yes, but it has been a common point of failure / problems after S3. I
> put this here as a placeholder to verify that everything is still as it
> should be.

Err, OK.

> >> +# - Check for kernel Oops
> >> +# - Check for Xen WARN
> >
> > These are a good idea but should perhaps be a separate test step.
>
> Wouldn't you want a warning/oops that was provoked by S3 to be
> associated with that test?

Hrm. Well in principle this is surely true of any test.

Can we make warnings/oopses fatal ?

Ian.
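One way to picture "making warnings/oopses fatal" is a generic log scan that any test step could run. The sketch below is only an illustration under stated assumptions: the helper name `find_log_problems` is hypothetical, the patterns are guesses at what such messages look like rather than verified Linux or Xen message formats, and the sample text is made up for the demonstration.

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative patterns only; the real message formats are assumptions
# and would need checking against actual kernel and hypervisor output.
my @problem_patterns = (
    qr/\bOops\b/,        # assumed marker for a kernel oops
    qr/\bWARNING:/,      # assumed marker for a kernel warning
    qr/\bXen WARN\b/,    # assumed marker for a Xen warning
);

# Hypothetical helper: returns the lines of $log_text that match any
# of the problem patterns, in their original order.
sub find_log_problems {
    my ($log_text) = @_;
    my @hits;
    foreach my $line (split /\n/, $log_text) {
        push @hits, $line if grep { $line =~ $_ } @problem_patterns;
    }
    return @hits;
}

# Made-up sample text for the demonstration, not real log output.
my $sample = join("\n",
    "sample: Finishing wakeup from S3 state",
    "sample: Oops: made-up entry",
    "sample: all devices resumed",
    "sample: Xen WARN made-up entry",
);

my @hits = find_log_problems($sample);
print "found ", scalar(@hits), " suspicious line(s)\n";
print "  $_\n" for @hits;
```

A test step could turn a non-empty result into a failure with something like `die "log problems found\n" if @hits;`, and running the scan at the end of each step would tie a warning to the step that provoked it.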
On May 2, 2013, at 11:06 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Ben Guthro writes ("Re: [PATCH] Introduce an s3 test"):
>> On 05/01/2013 06:56 AM, Ian Jackson wrote:
>>>> +# check log for resume message
>>>> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
>>>> +          target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
>>>> +                            "grep -n 'Finishing wakeup from S3 state'"));
>>>
>>> Why does this need a poll loop ? Surely after the machine comes out
>>> of suspend it should be up right away ?
>>
>> This is a bit of a "first pass" in a test environment I've never used
>> before. I modeled this after other tests I found in the same dir. If
>> this is inappropriate, then I suspect you are correct.
>
> Maybe you should be using guest_check_up ?

I'll own up to the fact that I wasn't really able to test the
infrastructure portions of this script. I was unsuccessful in getting
them to run, even using the "standalone" branch.

It would really help if someone who has access to the test
infrastructure could take my script as a starting point, and adapt it
to whatever is necessary for that test environment.

>
>> I put it in the loop for the case of networking taking some time to come
>> back online, so if the ssh command failed it would be retried.
>
> How long is it supposed to take to come back online ? "4*$timeout"
> seems (a) a bit arbitrary (b) rather long with your existing value of
> $timeout.

For all devices to come back online, it can sometimes take up to 20s.
This value was arbitrary, but chosen with the RTC variance + devices
coming on line.

This should probably be a tunable value.

>
>> Additionally, I have found that the RTC wakeup mechanism is not very
>> accurate in its timing.
>
> How unfortunate.

Indeed. We frequently see that sleeping machines for 1m sometimes
results in machines waking up 30s later - others 3m later.

>
>>> I'm not sure I follow this. Wouldn't messed up timer queues cause
>>> other trouble in the guest ?
>>
>> Yes, but it has been a common point of failure / problems after S3. I
>> put this here as a placeholder to verify that everything is still as it
>> should be.
>
> Err, OK.
>

I see automated testing as a resource to be able to confirm that
problems that occurred in the past do not re-emerge from new
development, rather than strictly functional testing.

If you disagree with this, feel free to remove it. I don't feel
strongly about this particular point.

>>>> +# - Check for kernel Oops
>>>> +# - Check for Xen WARN
>>>
>>> These are a good idea but should perhaps be a separate test step.
>>
>> Wouldn't you want a warning/oops that was provoked by S3 to be
>> associated with that test?
>
> Hrm. Well in principle this is surely true of any test.
>
> Can we make warnings/oopses fatal ?
>

That seems like it would be prudent, if possible.

As I mentioned above, I had difficulty configuring this test
environment, so it may be trivial, and I am just not familiar enough
with this environment.

Ben
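The "tunable value" idea can be sketched as a polling budget built from named allowances instead of `4*$timeout`. This is a rough illustration, not a proposal for the actual script: the helper `resume_poll_budget` and its parameter names are hypothetical. The 20s default follows the device figure mentioned above, while the 180s default is one reading of the "3m later" observation and is an assumption.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: how many seconds to keep polling after the wake
# alarm was due.  Parameter names and defaults are assumptions.
#   rtc_slack   - allowance for the RTC alarm firing late
#   settle_time - allowance for devices and networking to come back
sub resume_poll_budget {
    my (%arg) = @_;
    my $rtc_slack   = defined $arg{rtc_slack}   ? $arg{rtc_slack}   : 180;
    my $settle_time = defined $arg{settle_time} ? $arg{settle_time} : 20;
    return $rtc_slack + $settle_time;
}

my $default_budget = resume_poll_budget();
my $tuned_budget   = resume_poll_budget(rtc_slack => 60, settle_time => 30);
print "default: ${default_budget}s, tuned: ${tuned_budget}s\n";
```

For comparison, the patch's `4*$timeout` gives 120s with the default `$timeout` of 30. Separating the two allowances makes each one adjustable per host, which may matter if RTC accuracy varies between machines.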