thr3ads.net - Xen devel - [Xen-devel] Why I cannot reconnect blk backend [Nov 2008]

Wayne Gong

2008-Nov-19 09:41 UTC

[Xen-devel] Why I cannot reconnect blk backend

Hi,

I am implementing a save/restore feature for a Windows PV driver. After
the system resumes, my frontend blk driver cannot connect to the backend.
Here is my workflow:

1. Set the frontend state to XenbusStateClosing, then to
   XenbusStateClosed, and finally to XenbusStateInitialising.
2. Release the memory associated with the blk device.
3. Shut down xenbus and release its memory.
4. Issue the hypercall that suspends the system.
......
(resume)
5. Reinitialise xenbus.
6. Get the grant table, initialise the shared ring, and allocate an event
   channel for the blk device.
7. Set the frontend state to XenbusStateConnected. <-- the issue occurs here.

When I set the frontend state to Connected, my backend state watcher tells
me that the backend state changed to Closing and then Closed.
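
For readers following the state names used in this thread, here is a minimal sketch of the XenbusState values as they appear in Xen's public io/xenbus.h header. The helper functions state_name and is_shutting_down are hypothetical names introduced for illustration only; they are not part of any driver discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* State values as published in Xen's public io/xenbus.h header. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Hypothetical helper: readable name for a state read from xenstore. */
const char *state_name(enum xenbus_state s)
{
    switch (s) {
    case XenbusStateInitialising: return "Initialising";
    case XenbusStateInitWait:     return "InitWait";
    case XenbusStateInitialised:  return "Initialised";
    case XenbusStateConnected:    return "Connected";
    case XenbusStateClosing:      return "Closing";
    case XenbusStateClosed:       return "Closed";
    default:                      return "Unknown";
    }
}

/* Hypothetical helper: true for the two states the backend watcher
 * reported in the failure described above. */
bool is_shutting_down(enum xenbus_state s)
{
    return s == XenbusStateClosing || s == XenbusStateClosed;
}
```

A backend state watcher could log state_name() for every change it sees, which makes a Closing/Closed transition easy to spot next to the frontend's own writes.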

I am not very familiar with the blk backend driver, so I would like to know
which scenarios cause the blk backend to change state to Closing/Closed. Is
some of the information I wrote to xenstore wrong?

I am using Xen 3.1.3 and Win2k3. I can provide more info if you need it.

Thanks
Wayne


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

James Harper

2008-Nov-19 09:57 UTC

RE: [Xen-devel] Why I cannot reconnect blk backend

>
> Hi,
>
> I am implementing a save/restore feature for a Windows PV driver. After
> the system resumes, my frontend blk driver cannot connect to the backend.
> Here is my workflow:
>
> 1. Set the frontend state to XenbusStateClosing, then to
>    XenbusStateClosed, and finally to XenbusStateInitialising.
> 2. Release the memory associated with the blk device.
> 3. Shut down xenbus and release its memory.
> 4. Issue the hypercall that suspends the system.
> ......
> (resume)
> 5. Reinitialise xenbus.
> 6. Get the grant table, initialise the shared ring, and allocate an event
>    channel for the blk device.
> 7. Set the frontend state to XenbusStateConnected. <-- the issue occurs here.
>
> When I set the frontend state to Connected, my backend state watcher tells
> me that the backend state changed to Closing and then Closed.
>
> I am not very familiar with the blk backend driver, so I would like to know
> which scenarios cause the blk backend to change state to Closing/Closed. Is
> some of the information I wrote to xenstore wrong?
>
> I am using Xen 3.1.3 and Win2k3. I can provide more info if you need it.
>
Assuming your source is still mostly based on an earlier version of
mine, have a look at the current hg tree - save/restore is all working
there again as of a week or so ago.

It is possible to turn tracing on for xenstore too - under Debian I
added "export XENSTORED_TRACE=1" so it looks like this:

case "$1" in
  start)
        export XENSTORED_TRACE=1
        [ -d "$XENSTORED_RUN_DIR" ] || mkdir -p "$XENSTORED_RUN_DIR"
        log_daemon_msg "Starting $DESC" "xend"
        start && log_end_msg 0 || log_end_msg 1
        ;;

That way you can watch exactly what goes on between the frontend and
backend, including all the state transitions.

How are you communicating with the backend driver during restore? In my
first version of xenvbd I relied on the fact that some of the init calls
to the scsiport driver were called at PASSIVE_LEVEL and so it was safe
to make calls to the xenbus routine. Once things are up and running
though, all scsiport code runs at DIRQL (hardware IRQ level) and you
can't call any xenbus code from there as it involves
KeWaitForSingleObject etc.

To work around that I make the pci driver put xenvbd into a mode where
it doesn't process anything (set a flag and fire an irq to xenvbd then
wait for an acknowledgement), and the pci driver itself does all the
xenbus setup for xenvbd, and then enables xenvbd again via the same
mechanism. It works well and the scsiport driver can act more like a
physical hardware device driver - it doesn't need to know anything
about xenbus etc.
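
The flag-and-acknowledge handshake described above can be modelled roughly as follows. This is only a single-threaded sketch under stated assumptions: every name in it (vbd_shared, bus_request_pause, vbd_interrupt and so on) is hypothetical, it is not the code from the hg tree, and a real driver would need interlocked access to the shared state plus a real interrupt in place of the direct call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state shared between the bus (pci) driver and xenvbd. */
enum vbd_mode {
    VBD_RUNNING,
    VBD_PAUSE_REQUESTED,
    VBD_PAUSED,
    VBD_RESUME_REQUESTED
};

struct vbd_shared {
    enum vbd_mode mode;
    int completed_requests;
};

/* Bus driver side: set the flag; the caller then fires an interrupt. */
void bus_request_pause(struct vbd_shared *s)  { s->mode = VBD_PAUSE_REQUESTED; }
void bus_request_resume(struct vbd_shared *s) { s->mode = VBD_RESUME_REQUESTED; }

/* xenvbd side: stands in for the interrupt routine running at DIRQL.
 * It only flips flags, so it never has to block. */
void vbd_interrupt(struct vbd_shared *s)
{
    if (s->mode == VBD_PAUSE_REQUESTED)
        s->mode = VBD_PAUSED;      /* acknowledge: stop processing */
    else if (s->mode == VBD_RESUME_REQUESTED)
        s->mode = VBD_RUNNING;     /* acknowledge: processing re-enabled */
}

/* xenvbd side: handle one I/O request; refuse unless fully running. */
bool vbd_start_io(struct vbd_shared *s)
{
    if (s->mode != VBD_RUNNING)
        return false;
    s->completed_requests++;
    return true;
}

/* Walk through one pause/resume cycle; returns the requests completed. */
int demo_cycle(void)
{
    struct vbd_shared s = { VBD_RUNNING, 0 };

    vbd_start_io(&s);        /* accepted while running */
    bus_request_pause(&s);
    vbd_interrupt(&s);       /* acknowledged: now paused */
    vbd_start_io(&s);        /* rejected while paused */
    /* The bus driver would do all xenbus setup here, at PASSIVE_LEVEL. */
    bus_request_resume(&s);
    vbd_interrupt(&s);       /* acknowledged: running again */
    vbd_start_io(&s);        /* accepted again */
    return s.completed_requests;
}
```

The point of the design is that the routine running at high IRQL only ever reads and writes a flag; anything that may block stays on the bus driver's side.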

Windows is a bit of a pain to work with sometimes - it has better
documentation but its limits are absolutely set in stone!

James


Wayne Gong

2008-Nov-19 14:26 UTC

Re: [Xen-devel] Why I cannot reconnect blk backend

> Assuming your source is still mostly based on an earlier version of
> mine, have a look at the current hg tree - save/restore is all working
> there again as of a week or so ago.

I see what you have done for save/restore in the current hg tree. My design
has some differences.

> How are you communicating with the backend driver during restore? In my
> first version of xenvbd I relied on the fact that some of the init calls
> to the scsiport driver were called at PASSIVE_LEVEL and so it was safe
> to make calls to the xenbus routine. Once things are up and running
> though, all scsiport code runs at DIRQL (hardware IRQ level) and you
> can't call any xenbus code from there as it involves
> KeWaitForSingleObject etc.
>
> To work around that I make the pci driver put xenvbd into a mode where
> it doesn't process anything (set a flag and fire an irq to xenvbd then
> wait for an acknowledgement), and the pci driver itself does all the
> xenbus setup for xenvbd, and then enables xenvbd again via the same
> mechanism. It works well and the scsiport driver can act more like a
> physical hardware device driver - it doesn't need to know anything
> about xenbus etc.
>
> Windows is a bit of a pain to work with sometimes - it has better
> documentation but its limits are absolutely set in stone!

In my implementation, I share two kernel events between the pci and vbd
drivers. One is the suspend event and the other is the resume event.

To suspend:
The pci driver and the vbd driver both register a watcher for
'control/shutdown'. When they see 'suspend', the pci driver initialises
the suspend event and uses it to wait for all child devices to change
state to suspended. Meanwhile, when the vbd/vif driver sees 'suspend', it
sets a flag to fail all SCSI commands and interrupts. (I know this approach
needs improvement, since the StartIo/Interrupt routines run at a higher
IRQL than the watcher routine.) After that, the vbd driver sets the
frontend state to closing->closed->initialising. Then it sets the suspend
event to notify the pci driver that this child device is ready to suspend.
The vbd 'control/shutdown' watcher then waits for the resume event, after
which it clears the flag so that the driver can process interrupts and SCSI
commands again.  <---- this area works fine; the blk backend changes state
properly.
When the pci driver sees that all child devices have suspended, it
terminates some threads, shuts down xenbus, disables/cleans up the event
channels, then issues the hypercall to suspend the system.   <--- this area
also works fine. The system suspends successfully.

To resume:
When the system resumes, the pci driver reinitialises xenbus, enables the
event channels, and so on. Then it sets the resume event to notify all
child devices to change state to resumed. When the vbd driver gets the
resume event, it reinitialises the vbd device: it allocates memory, gets an
event channel, gets a grant entry, and initialises the shared ring. Then it
tries to set the frontend state to connected to notify the backend that the
frontend is ready to work. If that succeeds, it clears the flag so that the
vbd driver can handle interrupts and SCSI commands. Now, I get an issue
when setting the frontend state to connected: the blk backend changes state
to closing and then closed instead of connected.

Some more info: after resume, the vbd driver can read/write/watch xenstore
keys properly. It can also get an event channel, get a grant entry,
initialise the shared ring, and write the proper info to xenstore. The key
problem is that I don't know why the blk backend changes state to
closing/closed. I checked xenstore and the ring-ref and event channel in
the DomU look OK. Where can I get some log/info to figure out why?

And to anyone who is familiar with the blk backend driver: please give me
some guidance about which cases cause the blk backend to change state from
InitWait to closing/closed. Is there any info I write to xenstore that I
need to check?
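
As a starting point for that check, the blkback log lines posted later in this thread show the backend reading ring-ref, event-channel and (optionally) protocol from the frontend's xenstore directory. The sketch below verifies only that the first two are present and numeric before the frontend switches to connected. The xs_entry type and the function names are hypothetical, and the key/value array stands in for real xenstore reads; it cannot tell whether the grant reference itself is still valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the frontend's xenstore directory. */
struct xs_entry {
    const char *key;
    const char *value;
};

/* Return the value stored under key, or NULL if the key is absent. */
static const char *xs_lookup(const struct xs_entry *e, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(e[i].key, key) == 0)
            return e[i].value;
    return NULL;
}

/* True when v is a non-empty string of decimal digits. */
static bool is_decimal(const char *v)
{
    if (v == NULL || *v == '\0')
        return false;
    for (; *v != '\0'; v++)
        if (*v < '0' || *v > '9')
            return false;
    return true;
}

/* Check the keys the backend log shows it reading; "protocol" is
 * treated as optional because the log reports it as unspecified. */
bool frontend_keys_ready(const struct xs_entry *e, size_t n)
{
    return is_decimal(xs_lookup(e, n, "ring-ref")) &&
           is_decimal(xs_lookup(e, n, "event-channel"));
}
```

Running the same check at first start-up and again after resume gives a quick way to compare the two cases.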

Thanks,
Wayne


Wayne Gong

2008-Nov-19 14:34 UTC

Re: [Xen-devel] Why I cannot reconnect blk backend

And I found the following in your code in xenpci_pdo.c:

    if (XenPci_ChangeFrontendState(xppdd, XenbusStateConnected,
                                   XenbusStateConnected, 30000) != STATUS_SUCCESS)
    {
      // this is definitely an unrecoverable situation...
      KdPrint((__DRIVER_NAME "     Failed to change frontend state to connected\n"));
      FUNCTION_ERROR_EXIT();
      return STATUS_UNSUCCESSFUL;
    }

I think I am hitting this situation. How did you avoid it?

Thanks,
Wayne


James Harper

2008-Nov-19 22:27 UTC

RE: [Xen-devel] Why I cannot reconnect blk backend

> And I found the following in your code in xenpci_pdo.c:
>
>     if (XenPci_ChangeFrontendState(xppdd, XenbusStateConnected,
>                                    XenbusStateConnected, 30000) != STATUS_SUCCESS)
>     {
>       // this is definitely an unrecoverable situation...
>       KdPrint((__DRIVER_NAME "     Failed to change frontend state to connected\n"));
>       FUNCTION_ERROR_EXIT();
>       return STATUS_UNSUCCESSFUL;
>     }
>
> I think I am hitting this situation. How did you avoid it?
>
Wayne,

All I can suggest is that you turn on the logging in xenstore and watch
what happens - looking for differences between what happens when xenvbd
first starts up and when it resumes.

James

Wayne Gong

2008-Nov-20 05:56 UTC

Re: [Xen-devel] Why I cannot reconnect blk backend

> All I can suggest is that you turn on the logging in xenstore and watch
> what happens - looking for differences between what happens when xenvbd
> first starts up and when it resumes.

blkback: ring-ref 335, event-channel 5, protocol 1 (unspecified, assuming native)
vbd vbd-3-768: 1 mapping ring-ref 335 port 5
blkback: ring-ref 335, event-channel 5, protocol 1 (unspecified, assuming native)
vbd vbd-3-768: 1 mapping ring-ref 335 port 5
blkback: ring-ref 335, event-channel 5, protocol 1 (unspecified, assuming native)
vbd vbd-3-768: 1 mapping ring-ref 335 port 5

What does this mean?
My guess is that when the vbd device resumes, I am using an already-mapped
ring-ref to initialise the device. On resume, I allocate some new memory,
call SHARED_RING_INIT to initialise it, and then save the ring-ref to
xenstore. After that, xenstore pops up the warning above. How can I release
that ring-ref when suspending the vbd device, or how can I allocate a new
ring-ref for the vbd device?
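
To make "releasing a ring-ref" concrete, here is a minimal model of a version 1 grant table entry, using the flag values from Xen's public grant_table.h. It is an illustration of the mechanism only, not a diagnosis of the failure above. The function names grant_access and end_access are hypothetical, and a real implementation needs memory barriers and an atomic compare-and-swap on the flags word rather than the plain stores used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in Xen's public grant_table.h (version 1 entries). */
#define GTF_permit_access 1u
#define GTF_reading       (1u << 3)   /* set by Xen while mapped for read  */
#define GTF_writing       (1u << 4)   /* set by Xen while mapped for write */

/* Simplified copy of the version 1 grant entry layout. */
struct grant_entry {
    uint16_t flags;
    uint16_t domid;   /* domain allowed to map the frame */
    uint32_t frame;   /* frame number being shared */
};

/* Hypothetical helper: publish a frame to the backend domain. */
void grant_access(struct grant_entry *e, uint16_t backend_domid, uint32_t frame)
{
    e->frame = frame;
    e->domid = backend_domid;
    /* A real driver needs a write barrier before setting the flags. */
    e->flags = GTF_permit_access;
}

/* Hypothetical helper: revoke the grant. Fails while the backend still
 * has the frame mapped, in which case the entry must not be reused. */
bool end_access(struct grant_entry *e)
{
    if (e->flags & (GTF_reading | GTF_writing))
        return false;
    e->flags = 0;
    return true;
}
```

In this model, a frontend that wants a fresh ring-ref after resume would call end_access() on the old entry once the backend has unmapped it, then grant_access() on an entry for the newly allocated ring page.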

Thanks
Wayne

James Harper

2008-Nov-20 10:25 UTC

RE: [Xen-devel] Why I cannot reconnect blk backend

>
> > All I can suggest is that you turn on the logging in xenstore and watch
> > what happens - looking for differences between what happens when xenvbd
> > first starts up and when it resumes.
>
> blkback: ring-ref 335, event-channel 5, protocol 1 (unspecified, assuming native)
> vbd vbd-3-768: 1 mapping ring-ref 335 port 5
> blkback: ring-ref 335, event-channel 5, protocol 1 (unspecified, assuming native)
> vbd vbd-3-768: 1 mapping ring-ref 335 port 5
> blkback: ring-ref 335, event-channel 5, protocol 1 (unspecified, assuming native)
> vbd vbd-3-768: 1 mapping ring-ref 335 port 5
>
> What does this mean?
> My guess is that when the vbd device resumes, I am using an already-mapped
> ring-ref to initialise the device. On resume, I allocate some new memory,
> call SHARED_RING_INIT to initialise it, and then save the ring-ref to
> xenstore. After that, xenstore pops up the warning above. How can I release
> that ring-ref when suspending the vbd device, or how can I allocate a new
> ring-ref for the vbd device?
>
Not sure what that means, but that's not the logging I was referring to.
Did you follow the instructions in my previous email?

James

Wayne Gong

2008-Nov-20 11:27 UTC

Re: [Xen-devel] Why I cannot reconnect blk backend

> Not sure what that means, but that's not the logging I was referring to.
> Did you follow the instructions in my previous email?

No, I did not fully understand what you meant in that mail. I just got some
help from our team to have them turn on the xenstore trace log. So I will
try to follow your guide to get more trace info.
Separately, what do you think should happen to the ring-ref of a vbd device
across save and restore?

Thanks
Wayne

