Gerd Jakobovitsch
2011-Aug-10 12:24 UTC
[Xen-devel] Tapdisk devices too strongly attached?
Dear all,
I am having problems with tapdisk devices:
- When shutting down the virtual machine, the tapdisk process continues
running, and the device is still present at /sys/class/blktap2. It can
be removed, though, by issuing
echo 1 > /sys/class/blktap2/blktap<id>/remove.
- I tried to duplicate the snapshot process implemented in
tools/blktap2/drivers/xmsnap, but using a vhd snapshot instead of qcow.
The process seemed to work, but changes continue to be written to the
renamed disk, not to the snapshot. It seems that the tapdisk process
keeps its association with the opened file, even after the file is moved.
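The second symptom matches ordinary POSIX file semantics: an open file descriptor refers to the underlying file, not to its path, so a rename does not redirect later writes. The sketch below only illustrates that general behavior; it is not tapdisk code. The file names are hypothetical, and treating this as the explanation for what tapdisk does is an assumption.

```python
import os
import tempfile


def write_after_rename(workdir):
    """Open a file, rename it, create a new file at the old path,
    then write through the handle that was opened before the rename."""
    original = os.path.join(workdir, "disk.img")      # hypothetical image name
    renamed = os.path.join(workdir, "disk-base.img")  # hypothetical name after the move

    with open(original, "w") as handle:   # stands in for the long-running process
        os.rename(original, renamed)      # move the image while it is still open
        with open(original, "w"):         # stands in for the newly created snapshot
            pass
        handle.write("new data")          # the write follows the open descriptor

    with open(renamed) as f:
        renamed_content = f.read()
    with open(original) as f:
        new_content = f.read()
    return renamed_content, new_content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_after_rename(d))
```

In this sketch the data ends up in the renamed file and the new file at the old path stays empty. A process would have to close and reopen the path to start writing to the new file.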
I'm using xen on a CentOS 5 distro, with xen and kernel compiled from
xen's own baselines. I noticed the same behavior in xen 4.0.2.rc3 /
kernel 2.6.32.36+fix and in xen 4.1.2.rc1-pre / kernel 2.6.32.43.
Info from a xl.log file:
cat /var/log/xen/xl-teste020.log.2
Waiting for domain teste020 (domid 11) to die [pid 7352]
Domain 11 is dead
Unknown shutdown reason code 255. Destroying domain.
Action for shutdown reason code 255 is destroy
Domain 11 needs to be cleaned up: destroying the domain
libxl: error: libxl.c:734:libxl_domain_destroy xc_domain_pause failed for 11
libxl: error: libxl_dm.c:747:libxl__destroy_device_model Couldn't find device model's pid: No such file or directory
libxl: error: libxl.c:738:libxl_domain_destroy libxl__destroy_device_model failed for 11
libxl: error: libxl_dom.c:603:userdata_path unable to find domain info for domain 11: No such file or directory
libxl: error: libxl.c:755:libxl_domain_destroy xc_domain_destroy failed for 11
Done. Exiting now
As a hint, some months ago I posted to xen-devel a bug report related to
tapdisk failures, which was solved with a fix related to spinlocks,
recently delivered to the 2.6.32 pvops kernel baseline. At that point,
Daniel Stodden, who identified the needed fix, wrote:
"It's the only pending bugfix, quite an obvious one actually. It's been
rare enough *unless provoked like Gerd did*, but we found it first in
XCP so it actually tends to happen."
Actually, I'm not sure how I could be provoking any different behavior
from tapdisk, but it seems that some configuration I'm using is leading
tapdisk to some unexpected behavior.
The whole message exchange:
On Thu, 2011-04-14 at 12:38 -0400, Daniel Stodden wrote:
> On Thu, 2011-04-14 at 09:15 -0400, Konrad Rzeszutek Wilk wrote:
>> On Wed, Apr 13, 2011 at 06:02:13PM -0300, Gerd Jakobovitsch wrote:
>>> I'm trying to run several VMs (linux hvm, with tapdisk:aio disks at
>>> a storage over nfs) on a CentOS system, using the up-to-date version
>>> of xen 4.0 / kernel pvops 2.6.32.x stable. With a configuration
>>> without (most of) debug activated, I can start several instances -
>>> I'm running 7 of them - but shortly afterwards the system stops
>>> responding. I can't find any information on this.
>>
>> First time I see it.
>>>
>>> Activating several debug configuration items, among them
>>> DEBUG_PAGEALLOC, I get an exception as soon as I try to start up a
>>> VM. The system reboots.
>>
>> Oooh, and is the log below from that situation?
>>
>> Daniel, any thoughts?
>
> ---
> Unmap pages from the kernel linear mapping after free_pages().
> This results in a large slowdown, but helps to find certain types
> of memory corruption.
>
> Stunning. Our I/O page allocator is a sort of twisted mempool. Unless
> the allocation is explicitly modified in sysfs/, everything should stay
> pinned. We might be just tripping over debug code alone, but I didn't
> figure it out yet.
Ah, that's just missing Dominic's spinlock fix.
http://xenbits.xen.org/gitweb/?p=people/dstodden/linux.git;a=commit;h=a765257af7e28c41bd776c3e03615539597eb592
Daniel