James Song
2008-Nov-27  05:57 UTC
RE: RE: Re: RE: Re: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
Kevin,
     OK, I see we were talking about different time_resume functions ;-). The
time_resume I mentioned is in xen/arch/x86/time.c, but the one you mentioned
lives in dom0's kernel. I also think we could modify something in dom0's
kernel, which could also resolve this problem.
     Let me check which is better.
Thanks
--James
>>> "Tian, Kevin" <kevin.tian@intel.com> 08/11/27/ PM
13:37 >>>
No, time_resume is for sure invoked. You should look at machine_reboot.c,
which is the whole path for s/r and lm.
 
"date" will change because by default the wall clock in the guest is synced
to the real one. Maybe independent_wallclock is something you want to start
with; it is not taken care of at s/r for now.
 
Thanks,
Kevin
From: James Song [mailto:jsong@novell.com]
Sent: Thursday, November 27, 2008 1:10 PM
To: keir.fraser@eu.citrix.com; Tian, Kevin; xen-devel@lists.xensource.com
Subject: Re: RE: Re: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
  
F.Y.I
>>> "Tian, Kevin"   <kevin.tian@intel.com>08.11.27. 
11:50 >>>
Sorry for a typo. I did mean domU instead of dom0. :-) The point here is
that time_resume will sync to the new system time and wall clock at
restore, and thus the pv guest should be able to continue... Xen system time
is not wallclock time; it just counts up from power up. As Keir
points out, only its progress is used to drive internal jiffies.
  --- Actually, save/restore or migrate will not call time_resume;
this function may only be called for power saving.
Then what do you mean by "system time stop" here? TOD at user
level, or do you observe within the kernel that xen system time never changes?
  --- If you run the command "date" in user mode, you will find that the date
in the output never changes until a time interval equal to the time delay
has passed. You can also run applications that have little to do with time,
such as vi or cd, but if you run ping x.x.x.x you will see only one line of
response and then nothing more.
 Thanks
 --James
  
From: James Song [mailto:jsong@novell.com]
Sent: Thursday, November 27, 2008 11:20 AM
To: keir.fraser@eu.citrix.com; Tian, Kevin; xen-devel@lists.xensource.com
Subject: Re: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
    
Hi,
    Yes, there was an earlier patch to fix the wc_sec/wc_nsec problem in
xc_domain_restore.c, but it still missed something.
When dom0 is constructed or a PV domain is restored, the guest OS reads the
local wc_sec from Xen as its base time. wc_sec is initialized from
CMOS data. There are some cases in which wc_sec will change. One is
setting dom0's system time back, which changes dom0's time and makes wc_sec
smaller for both the guest OS and Xen. Actually, we can do a simple
test: start a PV domain, then change dom0's time, and you will
find that the system time of the guest OS has stopped. That is because you
changed wc_sec for both Xen and the guest OS.
    This patch only considers the save/restore case. I am still not
sure about the policy for the case where dom0's system time goes
back: what should the VMs do? So I have not added this case to this patch.
    By the way, Kevin, it is the guest OS that hangs, not dom0 ;-) and the
duration of the hang is exactly equal to the time interval you set back in
dom0, or on the new machine you migrate to.
 Thanks
  -- James
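
To make the wc_sec dependency described above concrete, here is a minimal sketch. It is not Xen or Linux code: it only assumes that the guest derives its time of day as the wallclock base (wc_sec/wc_nsec) plus the system time elapsed since that base was taken. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, simplified stand-in for the wallclock fields of shared_info. */
struct wallclock_base {
    uint32_t wc_sec;   /* wall-clock seconds at system time zero */
    uint32_t wc_nsec;  /* nanosecond part of the same base */
};

/* Guest view of time of day, in ns: wallclock base plus elapsed system time. */
static uint64_t guest_wallclock_ns(const struct wallclock_base *wc,
                                   uint64_t system_time_ns)
{
    return (uint64_t)wc->wc_sec * NSEC_PER_SEC + wc->wc_nsec + system_time_ns;
}
```

Under this assumption, lowering wc_sec by N seconds while system time stays the same moves the guest's computed time of day back by exactly N seconds, which is consistent with the hang lasting as long as the interval that was set back.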
>>> Keir Fraser <keir.fraser@eu.citrix.com> 08/11/26 22:58 >>>
So what happens if someone changes wallclock using 'date'? That's
basically kind of what will appear to happen when s/r occurs.
 -- Keir
On 26/11/08 14:32, "Tian, Kevin" <kevin.tian@intel.com> wrote:
    hrtimer supports two timer bases: CLOCK_MONOTONIC and
CLOCK_REALTIME. wall_to_monotonic is only added in the former case; for
the latter, TOD is used directly, per my reading. I did a quick search,
and it looks like futex and ntp are using CLOCK_REALTIME.
There is also one vsyscall gate which can pass CLOCK_REALTIME from the
caller.
Thanks,
Kevin
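
The difference between the two clock bases can also be observed from user space. The sketch below uses the POSIX clock_gettime() call, not the in-kernel hrtimer API, and read_clock_ns and deadline_after_ns are illustrative helper names. A deadline computed on CLOCK_REALTIME shifts relative to "now" whenever the wall clock is stepped, while one computed on CLOCK_MONOTONIC does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the given POSIX clock in nanoseconds; returns -1 on failure. */
static int64_t read_clock_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Absolute deadline delay_ns from now on clock id; returns -1 on failure. */
static int64_t deadline_after_ns(clockid_t id, int64_t delay_ns)
{
    int64_t now = read_clock_ns(id);

    return now < 0 ? -1 : now + delay_ns;
}
```

If the wall clock is set back by N seconds after deadline_after_ns(CLOCK_REALTIME, ...) was called, the caller waits roughly N seconds longer than intended; the same deadline on CLOCK_MONOTONIC is unaffected.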
      
 
mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:26 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
 
hrtimers add wall_to_monotonic to xtime to get a timesource
that doesn't (or shouldn't!) warp.
 -- Keir
On 26/11/08 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:
 
How about hrtimers? One mode is CLOCK_REALTIME, which uses
getnstimeofday for expiration. Once system time is changed, either
on the local machine or on a new one, that expiration can't be adjusted.
But I'm not sure whether it still makes sense to try hrtimers in a guest.
Thanks
Kevin
 
          
 
 
mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:11 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
 
The problem hasn't been fully explained, but I can say that PV guests
expect system time to jump across s/r and deal with that. For example,
Linux doesn't use Xen system time internally, but uses its progress to
periodically update jiffies, which do not warp across s/r.
We have had problems corrupting wc_sec/wc_nsec in xc_domain_restore.c,
but that was fixed some time ago.
 -- Keir
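
A minimal sketch of the delta-based tick accounting described here, under stated assumptions: a 100 Hz tick, and a guest that remembers how much system time it has already converted into jiffies. The names (tick_state, account_ticks, resync_after_restore) are illustrative and are not taken from the Linux or Xen sources. It shows why only the progress of system time matters, and what happens if system time steps backwards without a resync.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_TICK 10000000LL  /* assumed 100 Hz tick */

struct tick_state {
    int64_t  processed_system_time; /* ns of system time already accounted */
    uint64_t jiffies;               /* tick counter driven by progress only */
};

/* Convert the progress of system time since the last call into ticks. */
static void account_ticks(struct tick_state *s, int64_t system_time_now)
{
    int64_t delta = system_time_now - s->processed_system_time;

    /* A negative delta (system time stepped back) yields no ticks at all. */
    while (delta >= NS_PER_TICK) {
        delta -= NS_PER_TICK;
        s->processed_system_time += NS_PER_TICK;
        s->jiffies++;
    }
}

/* What a resume hook would need to do: restart accounting from "now". */
static void resync_after_restore(struct tick_state *s, int64_t system_time_now)
{
    s->processed_system_time = system_time_now;
}
```

Without the resync, a guest restored onto a host whose system time is N seconds behind would see no ticks until system time catches up, which would look like a hang of N seconds; with the resync, jiffies continue immediately and do not warp.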
On 26/11/08 14:00, "Tian, Kevin" <kevin.tian@intel.com> wrote:
 
 
This is not a s/r or lm specific issue. For example, system time can be
changed even while a pv guest is running. Your patch only hacks the restore
point once, and wc_sec can still be changed later when system time is
changed on-the-fly again.
IIRC, a pv guest can catch up with a wall clock change in the timer
interrupt, and time_resume will sync the internally processed system time
with the new system time after restore. But I'm not sure whether that's
enough. Actually, the more interesting part is the uptime difference. For
example, a timer with an expiration calculated on the previous system time
may wait nearly indefinitely if the uptimes of the two boxes differ a lot.
But I think such an issue should have been considered already, e.g. with
some user tool assistance. I think Keir can comment better here.
BTW, do you happen to know what exactly dom0 hangs on? In some busy loop to
catch up on time, or a long delay until some critical timer expiration?
Thanks,
Kevin
 
 
              
 
 
 
mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Song
Sent: Tuesday, November 25, 2008 4:02 PM
To: xen-devel@lists.xensource.com
Subject: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
Hi,
   I find that a PV domain hangs when we take these steps:
         1, save a PV domain
         2, change the system time of the PV domain back
         3, restore the PV domain
        or
         1, migrate a PV domain from Machine A to Machine B
         2, the system time of Machine B is slower than that of Machine A.
   The problem is that wc_sec changes when the system time is changed in
dom0, or when restoring on a machine with a slower system time, but when
restoring, Xen doesn't restore the wc_sec of shared_info from xenstore and
uses the native one instead. So the guest OS hangs.
This patch addresses this issue.
 Thanks
 -- Song Wei
diff -r a5ed0dbc829f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c  Tue Nov 18 14:34:14 2008 +0800
+++ b/tools/libxc/xc_domain_restore.c  Fri Nov 21 17:34:15 2008 +0800
@@ -328,6 +328,16 @@
 
     /* For info only */
     nr_pfns = 0;
+    //jsong@novell.com, james song
+    memset(&domctl, 0, sizeof(domctl));
+    domctl.domain = dom;
+    domctl.cmd    = XEN_DOMCTL_restoredomain;
+    frc = do_domctl(xc_handle, &domctl);
+    if ( frc != 0 )
+    {
+        ERROR("Unable to set flag of restore.");
+        goto out;
+    }
 
     if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
     {
@@ -1120,6 +1130,8 @@
 
     /* restore saved vcpu_info and arch specific info */
     MEMCPY_FIELD(new_shared_info, old_shared_info, vcpu_info);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_nsec);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_sec);
     MEMCPY_FIELD(new_shared_info, old_shared_info, arch);
 
     /* clear any pending events and the selector */
diff -r a5ed0dbc829f xen/arch/x86/time.c
--- a/xen/arch/x86/time.c  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/arch/x86/time.c  Fri Nov 21 17:34:15 2008 +0800
@@ -689,7 +689,6 @@
     wmb();
     (*version)++;
 }
-
 void update_vcpu_system_time(struct vcpu *v)
 {
     struct cpu_time       *t;
@@ -703,7 +702,6 @@
 
     if ( u->tsc_timestamp == t->local_tsc_stamp )
         return;
-
     version_update_begin(&u->version);
 
     u->tsc_timestamp     = t->local_tsc_stamp;
@@ -713,14 +711,19 @@
 
     version_update_end(&u->version);
 }
-
 void update_domain_wallclock_time(struct domain *d)
 {
     spin_lock(&wc_lock);
+    if(d->after_restore )
+    {
+        d->after_restore = 0;
+        goto out; //jsong@novell.com
+    }
     version_update_begin(&shared_info(d, wc_version));
     shared_info(d, wc_sec)  = wc_sec + d->time_offset_seconds;
     shared_info(d, wc_nsec) = wc_nsec;
     version_update_end(&shared_info(d, wc_version));
+out:
     spin_unlock(&wc_lock);
 }
 
@@ -751,7 +754,6 @@
     u64 x;
     u32 y, _wc_sec, _wc_nsec;
     struct domain *d;
-
     x = (secs * 1000000000ULL) + (u64)nsecs - system_time_base;
     y = do_div(x, 1000000000);
 
@@ -1050,7 +1052,6 @@
 struct tm wallclock_time(void)
 {
     uint64_t seconds;
-
     if ( !wc_sec )
         return (struct tm) { 0 };
 
diff -r a5ed0dbc829f xen/common/domctl.c
--- a/xen/common/domctl.c  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/common/domctl.c  Fri Nov 21 17:34:15 2008 +0800
@@ -24,7 +24,6 @@
 #include <asm/current.h>
 #include <public/domctl.h>
 #include <xsm/xsm.h>
-
 extern long arch_do_domctl(
     struct xen_domctl *op, XEN_GUEST_HANDLE(xen_domctl_t) u_domctl);
 
@@ -315,6 +314,16 @@
         ret = 0;
     }
     break;
+    case XEN_DOMCTL_restoredomain:
+    {
+        struct domain *d;
+        if ( (d = rcu_lock_domain_by_id(op->domain)) == NULL )
+            break;
+
+        d->after_restore = 1;
+        rcu_unlock_domain(d);
+        break;
+    }
 
     case XEN_DOMCTL_createdomain:
     {
diff -r a5ed0dbc829f xen/include/public/domctl.h
--- a/xen/include/public/domctl.h  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/public/domctl.h  Fri Nov 21 17:34:15 2008 +0800
@@ -61,6 +61,7 @@
 #define XEN_DOMCTL_destroydomain      2
 #define XEN_DOMCTL_pausedomain        3
 #define XEN_DOMCTL_unpausedomain      4
+#define XEN_DOMCTL_restoredomain     51
 #define XEN_DOMCTL_resumedomain      27
 
 #define XEN_DOMCTL_getdomaininfo      5
diff -r a5ed0dbc829f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h  Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/xen/sched.h  Fri Nov 21 17:34:15 2008 +0800
@@ -231,6 +231,7 @@
      * cause a deadlock.  Acquirers don't spin waiting; they preempt.
      */
     spinlock_t hypercall_deadlock_mutex;
+    int after_restore; //jsong@novell.com
 };
 
 struct domain_setup_info
---------------------------------------------------------------------------------------------
 Thanks
--Song Wei
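
The version_update_begin()/version_update_end() calls that bracket the wc_sec/wc_nsec writes in the patch above follow a version-counter protocol: the counter is odd while an update is in progress and even otherwise, so a reader can detect a torn read and retry. The following is a minimal user-space sketch of that idea using C11 atomics. It is an illustration only, not the Xen implementation, and wc_area, wc_init, wc_write and wc_read are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the versioned wallclock fields. */
struct wc_area {
    atomic_uint version;  /* odd while an update is in progress */
    atomic_uint wc_sec;
    atomic_uint wc_nsec;
};

static void wc_init(struct wc_area *a)
{
    atomic_init(&a->version, 0);
    atomic_init(&a->wc_sec, 0);
    atomic_init(&a->wc_nsec, 0);
}

/* Writer: bump the version before and after touching the payload. */
static void wc_write(struct wc_area *a, uint32_t sec, uint32_t nsec)
{
    atomic_fetch_add(&a->version, 1);  /* now odd: update in progress */
    atomic_store(&a->wc_sec, sec);
    atomic_store(&a->wc_nsec, nsec);
    atomic_fetch_add(&a->version, 1);  /* even again: update complete */
}

/* Reader: retry while an update is in progress or one happened mid-read. */
static void wc_read(struct wc_area *a, uint32_t *sec, uint32_t *nsec)
{
    unsigned int v;

    do {
        v = atomic_load(&a->version);
        *sec = atomic_load(&a->wc_sec);
        *nsec = atomic_load(&a->wc_nsec);
    } while ((v & 1u) || v != atomic_load(&a->version));
}
```

Note that a path which skips the write entirely, as the after_restore branch in the patch does, must also leave the version counter untouched, which the goto before version_update_begin() preserves.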
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel