thr3ads.net - Xen devel - [PATCH] Introduce an s3 test [Apr 2013]

Ben Guthro

2013-Apr-26 17:26 UTC

[PATCH] Introduce an s3 test

From: root <root@bguthro-desktop.(none)>

This test attempts to have an initial pass at introducing a test to catch
regressions in S3.
It currently just suspends for N seconds, and checks xl dmesg for a particular
message printed when S3 is complete.
---
 ts-host-suspend |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100755 ts-host-suspend

diff --git a/ts-host-suspend b/ts-host-suspend
new file mode 100755
index 0000000..9fe38d5
--- /dev/null
+++ b/ts-host-suspend
@@ -0,0 +1,51 @@
+#!/usr/bin/perl -w
+
+use strict qw(vars);
+
+use Osstest;
+use Osstest::TestSupport;
+
+tsreadconfig();
+
+my $timeout = 30;
+if (@ARGV && $ARGV[0] =~ m/^--timeout=([0-9]*)$/) {
+    $timeout = $1;
+    shift @ARGV;
+}
+
+our ($whhost) = @ARGV;
+$whhost ||= 'host';
+our $ho = selecthost($whhost);
+
+my $RTC = "/sys/class/rtc/rtc0";
+
+# get RTC NOW
+my $epoch = target_cmd_output_root($ho, "cat " . $RTC . "/since_epoch");
+
+# Clear the wake alarm
+target_cmd_root($ho, "echo 0 > " . $RTC . "/wakealarm");
+
+# Set the wake alarm to NOW + time
+my $t2 = $epoch + $timeout;
+target_cmd_root($ho, "echo " . ($epoch + $timeout) . " > " . $RTC . "/wakealarm");
+
+# Put the machine to sleep
+target_cmd_root($ho, "pm-suspend");
+
+# Give the machine some time to go to sleep.
+sleep (5 + $timeout);
+
+# check log for resume message
+poll_loop(4*$timeout, 2, 's3-confirm-resumed',
+	target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
+		"grep -n 'Finishing wakeup from S3 state'"));
+
+# TODO:
+# - Check pcpu state
+#   - Affinity has been restored
+#   - C-states are not lost
+#   - CPU pools are all correct
+# - Check timer queues are correct
+#   - vcpu_singleshot_timer on every pcpu
+# - Check for kernel Oops
+# - Check for Xen WARN 
-- 
1.7.9.5
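
The final check in the patch pipes xl dmesg through "grep 'ACPI S' | tail -1 | grep 'Finishing wakeup from S3 state'". In other words, it takes the last log line containing "ACPI S" and passes only if that same line also contains the wakeup message, so a matching line has to carry both strings. A minimal Perl sketch of that logic, applied to log text that has already been captured, might look like this. The function name and the sample lines are made up for illustration; only the two match strings come from the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors
#   grep 'ACPI S' | tail -1 | grep 'Finishing wakeup from S3 state'
# on a string holding captured console output.
sub last_acpi_line_is_s3_resume {
    my ($log) = @_;
    # Keep only the lines the first grep would keep.
    my @acpi = grep { /ACPI S/ } split /\n/, $log;
    return 0 unless @acpi;
    # "tail -1" then the second grep: test the last such line only.
    return $acpi[-1] =~ /Finishing wakeup from S3 state/ ? 1 : 0;
}

# Synthetic sample input, NOT real Xen output.
my $sample = join "\n",
    "line 1: ACPI S3 going down",
    "line 2: ACPI S3 Finishing wakeup from S3 state",
    "line 3: something unrelated";
print last_acpi_line_is_s3_resume($sample) ? "resumed\n" : "not resumed\n";
```

Working on captured text keeps the decision logic separate from the ssh transport, so a failed connection and a missing message can be told apart.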

Ian Jackson

2013-May-01 10:56 UTC

Re: [PATCH] Introduce an s3 test

Ben Guthro writes ("[PATCH] Introduce an s3 test"):
> From: root <root@bguthro-desktop.(none)>
> 
> This test attempts to have an initial pass at introducing a test to catch regressions in S3.
> It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
> when S3 is complete.

Thanks.  Most of this looks plausible.  I have some comments:

> +# Put the machine to sleep
> +target_cmd_root($ho, "pm-suspend");
> +
> +# Give the machine some time to go to sleep.
> +sleep (5 + $timeout);
> +
> +# check log for resume message
> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> +	target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> +		"grep -n 'Finishing wakeup from S3 state'"));

Why does this need a poll loop ?  Surely after the machine comes out
of suspend it should be up right away ?

> +# TODO:
> +# - Check pcpu state
> +#   - Affinity has been restored
> +#   - C-states are not lost
> +#   - CPU pools are all correct

We don't do any cpu affinity testing at all right now.  Leaving
this as a TODO here is fine.

> +# - Check timer queues are correct
> +#   - vcpu_singleshot_timer on every pcpu

I'm not sure I follow this.  Wouldn't messed up timer queues cause
other trouble in the guest ?

> +# - Check for kernel Oops
> +# - Check for Xen WARN

These are a good idea but should perhaps be a separate test step.

Ian.

Ian Campbell

2013-May-01 11:23 UTC

Re: [PATCH] Introduce an s3 test

On Wed, 2013-05-01 at 11:56 +0100, Ian Jackson wrote:
> Ben Guthro writes ("[PATCH] Introduce an s3 test"):
> > From: root <root@bguthro-desktop.(none)>
> > 
> > This test attempts to have an initial pass at introducing a test to catch regressions in S3.
> > It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
> > when S3 is complete.
> 
> Thanks.  Most of this looks plausible.  I have some comments:
> 
> > +# Put the machine to sleep
> > +target_cmd_root($ho, "pm-suspend");
> > +
> > +# Give the machine some time to go to sleep.
> > +sleep (5 + $timeout);
> > +
> > +# check log for resume message
> > +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> > +	target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> > +		"grep -n 'Finishing wakeup from S3 state'"));
> 
> Why does this need a poll loop ?  Surely after the machine comes out
> of suspend it should be up right away ?

Not immediately, I expect, but what happens if the ssh fails -- are there
retries at that level which make the poll loop redundant?

Do we need to handle the case where the s3 resume fails, so that
subsequent tests can still work? i.e. finish up with an explicit
reboot/powercycle?

Ian.
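
On the question of retries: in Perl, an expression written in an argument list is evaluated once, before the function is called, so a check has to be passed as a code reference if it is meant to be re-run on every poll. The sketch below shows a generic retry loop along those lines. Note that poll_until is a hypothetical stand-in written for this example, not osstest's poll_loop, and its calling convention is an assumption. A check that dies (as a failed ssh command might) is simply treated as "not ready yet".

```perl
use strict;
use warnings;

# Hypothetical polling helper (NOT the osstest API).
# Calls $check every $interval seconds until it returns true or
# $maxwait seconds have passed. An exception from $check counts as
# a failed attempt rather than aborting the loop.
sub poll_until {
    my ($maxwait, $interval, $what, $check) = @_;
    my $deadline = time + $maxwait;
    my $attempts = 0;
    while (1) {
        $attempts++;
        my $ok = eval { $check->() };
        return $attempts if $ok;
        die "$what: timed out after $attempts attempts\n"
            if time >= $deadline;
        sleep $interval;
    }
}

# Usage: a check that fails twice (simulating ssh errors), then succeeds.
my $calls = 0;
my $n = poll_until(10, 0, 'demo-check', sub {
    die "connection refused\n" if ++$calls < 3;
    return 1;
});
print "succeeded on attempt $n\n";
```

If the transport layer already retries failed connections, a loop like this would be redundant for that purpose, which is the point raised above.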

Ben Guthro

2013-May-01 11:58 UTC

Re: [PATCH] Introduce an s3 test

On 05/01/2013 06:56 AM, Ian Jackson wrote:
> Ben Guthro writes ("[PATCH] Introduce an s3 test"):
>> From: root <root@bguthro-desktop.(none)>
>>
>> This test attempts to have an initial pass at introducing a test to catch regressions in S3.
>> It currently just suspends for N seconds, and checks xl dmesg for a particular message printed
>> when S3 is complete.
>
> Thanks.  Most of this looks plausible.  I have some comments:
>
>> +# Put the machine to sleep
>> +target_cmd_root($ho, "pm-suspend");
>> +
>> +# Give the machine some time to go to sleep.
>> +sleep (5 + $timeout);
>> +
>> +# check log for resume message
>> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
>> +	target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
>> +		"grep -n 'Finishing wakeup from S3 state'"));
>
> Why does this need a poll loop ?  Surely after the machine comes out
> of suspend it should be up right away ?

This is a bit of a "first pass" in a test environment I've never used
before. I modeled this after other tests I found in the same dir. If
this is inappropriate, then I suspect you are correct.

I put it in the loop for the case of networking taking some time to come
back online, so if the ssh command failed it would be retried.
Additionally, I have found that the RTC wakeup mechanism is not very
accurate in its timing.

>
>> +# TODO:
>> +# - Check pcpu state
>> +#   - Affinity has been restored
>> +#   - C-states are not lost
>> +#   - CPU pools are all correct
>
> We don't do any cpu affinity testing at all right now.  Leaving
> this as a TODO here is fine.
>
>> +# - Check timer queues are correct
>> +#   - vcpu_singleshot_timer on every pcpu
>
> I'm not sure I follow this.  Wouldn't messed up timer queues cause
> other trouble in the guest ?

Yes, but it has been a common point of failure / problems after S3. I
put this here as a placeholder to verify that everything is still as it
should be.

>
>> +# - Check for kernel Oops
>> +# - Check for Xen WARN
>
> These are a good idea but should perhaps be a separate test step.

Wouldn't you want a warning/oops that was provoked by S3 to be
associated with that test?

>
> Ian.
>

Ian Jackson

2013-May-02 15:06 UTC

Re: [PATCH] Introduce an s3 test

Ben Guthro writes ("Re: [PATCH] Introduce an s3 test"):
> On 05/01/2013 06:56 AM, Ian Jackson wrote:
> >> +# check log for resume message
> >> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
> >> +	target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
> >> +		"grep -n 'Finishing wakeup from S3 state'"));
> >
> > Why does this need a poll loop ?  Surely after the machine comes out
> > of suspend it should be up right away ?
> 
> This is a bit of a "first pass" in a test environment I've never used
> before. I modeled this after other tests I found in the same dir. If
> this is inappropriate, then I suspect you are correct.

Maybe you should be using guest_check_up ?

> I put it in the loop for the case of networking taking some time to come
> back online, so if the ssh command failed it would be retried.

How long is it supposed to take to come back online ?  "4*$timeout"
seems (a) a bit arbitrary (b) rather long with your existing value of
$timeout.

> Additionally, I have found that the RTC wakeup mechanism is not very
> accurate in its timing.

How unfortunate.

> > I'm not sure I follow this.  Wouldn't messed up timer queues cause
> > other trouble in the guest ?
> 
> Yes, but it has been a common point of failure / problems after S3. I
> put this here as a placeholder to verify that everything is still as it
> should be.

Err, OK.

> >> +# - Check for kernel Oops
> >> +# - Check for Xen WARN
> >
> > These are a good idea but should perhaps be a separate test step.
> 
> Wouldn't you want a warning/oops that was provoked by S3 to be
> associated with that test?

Hrm.  Well in principle this is surely true of any test.

Can we make warnings/oopses fatal ?

Ian.
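
One way to make warnings and oopses fatal would be to scan the captured kernel and hypervisor logs after a step and fail if anything suspicious turns up. A rough sketch of such a scan follows. The function name and the pattern list are illustrative assumptions, not an existing osstest facility, and a real list would need tuning against actual log output.

```perl
use strict;
use warnings;

# Illustrative patterns only; real kernel/Xen messages vary by version.
my @SUSPICIOUS = (
    qr/\bOops\b/,
    qr/\bBUG:/,
    qr/\bWARNING:/,
    qr/Xen WARN/,
);

# Hypothetical helper: returns the log lines matching any pattern.
sub find_log_problems {
    my ($log) = @_;
    my @hits;
    for my $line (split /\n/, $log) {
        push @hits, $line if grep { $line =~ $_ } @SUSPICIOUS;
    }
    return @hits;
}

# Synthetic input for demonstration, not real log output.
my @problems = find_log_problems("boot ok\nWARNING: at foo.c:12\nall done\n");
print "found ", scalar(@problems), " problem line(s)\n";
# A test step could then do:  die "log problems found\n" if @problems;
```

Running the same scan after every step, rather than only after suspend, would attribute each hit to the step that provoked it.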

Ben Guthro

2013-May-02 20:28 UTC

Re: [PATCH] Introduce an s3 test

On May 2, 2013, at 11:06 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Ben Guthro writes ("Re: [PATCH] Introduce an s3 test"):
>> On 05/01/2013 06:56 AM, Ian Jackson wrote:
>>>> +# check log for resume message
>>>> +poll_loop(4*$timeout, 2, 's3-confirm-resumed',
>>>> +	target_cmd_output($ho,"xl dmesg | grep 'ACPI S' | tail -1 | " .
>>>> +		"grep -n 'Finishing wakeup from S3 state'"));
>>>
>>> Why does this need a poll loop ?  Surely after the machine comes out
>>> of suspend it should be up right away ?
>>
>> This is a bit of a "first pass" in a test environment I've never used
>> before. I modeled this after other tests I found in the same dir. If
>> this is inappropriate, then I suspect you are correct.
>
> Maybe you should be using guest_check_up ?

I'll own up to the fact that I wasn't really able to test the
infrastructure portions of this script.
I was unsuccessful in getting them to run, even using the "standalone"
branch.

It would really help if someone who has access to the test infrastructure could
take my script as a starting point, and adapt it to whatever is necessary for
that test environment.

>
>> I put it in the loop for the case of networking taking some time to come
>> back online, so if the ssh command failed it would be retried.
>
> How long is it supposed to take to come back online ?  "4*$timeout"
> seems (a) a bit arbitrary (b) rather long with your existing value of
> $timeout.

For all devices to come back online, it can sometimes take up to 20s.

This value was arbitrary, but was chosen to allow for the RTC variance plus the
time devices take to come online.
This should probably be a tunable value.
> 
>> Additionally, I have found that the RTC wakeup mechanism is not very
>> accurate in its timing.
>
> How unfortunate.

Indeed. We frequently see that putting machines to sleep for 1m sometimes
results in machines waking up 30s later, and others 3m later.

>
>>> I'm not sure I follow this.  Wouldn't messed up timer queues cause
>>> other trouble in the guest ?
>>
>> Yes, but it has been a common point of failure / problems after S3. I
>> put this here as a placeholder to verify that everything is still as it
>> should be.
>
> Err, OK.
>

I see automated testing as a resource to be able to confirm that problems that
occurred in the past do not re-emerge from new development, rather than strictly
functional testing.
If you disagree with this, feel free to remove it. I don't feel
strongly about this particular point.

>>>> +# - Check for kernel Oops
>>>> +# - Check for Xen WARN
>>>
>>> These are a good idea but should perhaps be a separate test step.
>>
>> Wouldn't you want a warning/oops that was provoked by S3 to be
>> associated with that test?
>
> Hrm.  Well in principle this is surely true of any test.
>
> Can we make warnings/oopses fatal ?
>

That seems like it would be prudent, if possible.
As I mentioned above, I had difficulty configuring this test environment, so it
may be trivial, and I am just not familiar enough with this environment.


Ben
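
Making the wait tunable could amount to splitting the single "4*$timeout" figure into its parts: the requested sleep, an allowance for RTC inaccuracy, and time for devices and networking to settle. A small sketch of that arithmetic follows. The function name, parameter names and default values are all hypothetical placeholders; only the 30s sleep default and the "up to 20s" device figure echo numbers from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: total seconds to wait for the host to come back.
# All parameter names and defaults are assumptions for illustration.
sub resume_wait_budget {
    my (%opt) = @_;
    my $sleep  = $opt{sleep}         // 30;   # requested suspend duration
    my $slack  = $opt{rtc_slack}     // 120;  # allowance for a late RTC alarm
    my $settle = $opt{device_settle} // 20;   # devices/network coming back
    return $sleep + $slack + $settle;
}

# With defaults, and with a longer sleep and a larger RTC allowance.
print resume_wait_budget(), "\n";
print resume_wait_budget(sleep => 60, rtc_slack => 180), "\n";
```

Keeping the three terms separate makes it clear which one to adjust when a particular host turns out to wake late or to bring its network up slowly.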
