Hi all

I set up two Xen servers with Debian etch. Both servers have 800 GB of storage. I decided to split this storage into two partitions (400 GB each) and mirror the two partitions through drbd. I created an LVM PV/VG on top of each drbd block device. I also added Heartbeat to the setup with some self-made scripts.

As a result I now have two Xen servers, each one being primary for one of the two drbd devices and some DomUs. In case of a failure the fail-over works fine, resulting in one server taking over all the DomUs.

During testing I found out that it's easily possible to crash a server if I run bonnie++ in one of the DomUs. The DomU has 300 MB RAM assigned and I invoke:

    bonnie++ -d . -s 1024 -u nobody

Several seconds after the command my whole Xen server (Dom0 + DomUs) just hangs. I'm not able to enter any more commands on the local console, nor am I able to log in through SSH. The machine still answers Heartbeat packets.

The same happens if I run other disk-I/O-intensive benchmarking tools inside the DomU.

It looks like the problem is specific to running the DomU on a VG that runs on drbd. I tried several other cases with the same benchmarking tool invocations:

- DomU with an image file stored on the server's / filesystem (which is not inside LVM and drbd): no crash
- running the benchmarking tools on a filesystem created on an LV and mounted in the Dom0: no crash
- running the benchmarking tools on a DomU running on LVM without drbd: no crash

Somehow, only the combination of drbd + LVM + a DomU running on such a VG seems to trigger this problem. Has any of you experienced the same problem?

Currently, I'm a bit unsure how to debug this further. Does anyone have advice on how to proceed (e.g., adding some debug switches to Xen to see why the whole machine crashes)?

Regards,
Thomas.
_______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
On Sat, Aug 18, 2007 at 08:58:32AM +0200, Thomas Bader wrote:
> Somehow, only the combination drbd + LVM + DomU running in
> on a VG there seems to trigger this problem. Has anyone of
> you experienced the same problem?

I'm not sure if that is the same problem, but there was a discussion on drbd just a couple of days ago. Try replacing the "phy:" prefix in your domU configs with "tap:aio:".

-- 
Marcin Owsiany <marcin@owsiany.pl> http://marcin.owsiany.pl/
GnuPG: 1024D/60F41216 FE67 DA2D 0ACA FC5E 3F75 D6F6 3A0D 8AA0 60F4 1216

"Every program in development at MIT expands until it can read mail."
-- Unknown
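Since xm domU config files use Python syntax, the suggested change can be sketched directly in Python. This is only an illustration: the volume group name, LV names and device names below are made up, and the helper `use_blktap` is hypothetical, not part of Xen. It assumes the blktap driver is actually available in Dom0.

```python
# Hypothetical domU disk entries; the VG/LV paths and device names are
# invented for illustration and do not come from the original setup.
disk = [
    "phy:/dev/vg_drbd0/domu1-disk,sda1,w",
    "phy:/dev/vg_drbd0/domu1-swap,sda2,w",
]


def use_blktap(entries):
    """Return a copy of the disk list with each 'phy:' prefix replaced by 'tap:aio:'."""
    prefix = "phy:"
    return [
        "tap:aio:" + entry[len(prefix):] if entry.startswith(prefix) else entry
        for entry in entries
    ]


# After this line the domU would be started with tap:aio: backed disks.
disk = use_blktap(disk)
```

In a real config you would normally just edit the prefix by hand; the helper only makes the before/after difference explicit.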
* Marcin Owsiany <marcin@owsiany.pl> [070818 11:14]:
> On Sat, Aug 18, 2007 at 08:58:32AM +0200, Thomas Bader wrote:
> > Somehow, only the combination drbd + LVM + DomU running in
> > on a VG there seems to trigger this problem. Has anyone of
> > you experienced the same problem?
>
> I'm not sure if that is the same problem, but there was a discussion
> on drbd just a couple days ago. Try replacing "phy:" prefix in your domU
> configs with "tap:aio:".

Good hint! I searched a bit and found many similar postings.

Unfortunately, I have not yet been able to try the workaround mentioned above, since the Debian Etch packages of Xen ship without the blktap driver:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=402969

It looks like I would need to build my own packages to try the workaround. Or has anyone else needed blktap and found a shorter way to integrate it?

Regards,
Thomas.
Thomas Bader wrote:
> Looks like I would need to build my own packages to try the
> workaround - or did anyone need blktap too and has found out
> a shorter way to integrate blktap?
>

I've tried about a dozen ways.. the problem seems to be that blktap needs a patch that was rejected by the kernel maintainers in favour of another, incompatible one (and I guess that's Debian's problem, although from the response you got from the maintainer it sounds more like they just don't give a crap).

I've managed to get a debianised Xen kernel as far as 2.6.21 (basically by cannibalising patches from the Red Hat kernels) but blktap just doesn't work:

'Couldn't get fd for AIO poll support'

Maybe I'll work out how to apply http://lists.xensource.com/archives/html/xen-devel/2007-04/txtj55WHkYDWz.txt - none of my kernels have this file though, so it could be tricky, and I don't really have the time to fart around with kernel builds that much. Might just wait until Xen is in the kernel properly.

Tony
On Tue, Aug 21, 2007 at 10:03:39PM +0100, Tony Hoyle wrote:
> Thomas Bader wrote:
>
> >Looks like I would need to build my own packages to try the
> >workaround - or did anyone need blktap too and has found out
> >a shorter way to integrate blktap?
> >
> I've tried about a dozen ways.. the problem seems to be that blktap
> needs a patch that was rejected by the kernel maintainers in favour of
> an other incompatible one (and I guess that's debians problem - although
> from the response you got from the maintainer it sounds more like they
> just don't give a crap). I've managed to get a debianised xen kernel as
> far as 2.6.21 (basically by cannibalising patches from the redhat
> kernels) but blktap just doesn't work.. 'Couldn't get fd for AIO poll
> support'

The problem is that Xen includes code in userspace that is dependent on features which don't exist in the LKML kernels. For Fedora we ripped this code out of Xen userspace and replaced it with code which works against what is actually available in the kernel. That patch is now merged in the 3.1 release of Xen userspace, IIRC. The changeset you need in userspace is:

changeset:   15385:eeeb77195ac2
user:        kfraser@localhost.localdomain
date:        Tue Jun 19 16:32:28 2007 +0100
files:       tools/blktap/drivers/Makefile tools/blktap/drivers/block-aio.c tools/blktap/drivers/block-qcow.c tools/blktap/drivers/tapaio.c tools/blktap/drivers/tapaio.h
description:
blktap: Add fallback code to blktap for missing poll-on-aio support.

blktap requires a xen specific kernel AIO ABI which has been vetoed by upstream in favour of another approach. Rather than include this ABI, Fedora has been carrying a patch which makes tap:aio use a thread to poll for aio events and notify the main thread via a pipe.

The upstream approach of allowing io_getevents() poll normal file descriptors via epoll is still progressing:

http://lkml.org/lkml/2007/1/3/16

but when that does make it upstream, blktap will require significant re-working to use that approach.

In the meantime, here's a patch which uses the poll-in-a-thread approach only if AIO poll support isn't available. It also hides the details behind a simple abstraction and makes both tap:aio and tap:qcow use it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
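The poll-in-a-thread fallback described in the changeset can be illustrated with a small Python sketch. This is not the blktap code (which is C, in tools/blktap/drivers/tapaio.c); it only shows the general pattern under stated assumptions: a queue stands in for the blocking io_getevents() call, and the names `poll_in_thread` and `run_main_loop` are invented for this example. It assumes a Unix-like system where select() works on pipes.

```python
import os
import queue
import select
import threading


def poll_in_thread(pending, ready, notify_fd):
    """Wait for completions in a helper thread and signal the main loop via a pipe."""
    while True:
        event = pending.get()        # stand-in for a blocking io_getevents() call
        if event is None:            # sentinel: stop the helper thread
            return
        ready.put(event)             # hand the completed request to the main thread
        os.write(notify_fd, b"\0")   # one byte per event makes the pipe readable


def run_main_loop(expected):
    """Handle `expected` simulated completions using only select() on a pipe."""
    pending, ready = queue.Queue(), queue.Queue()
    read_fd, write_fd = os.pipe()
    helper = threading.Thread(target=poll_in_thread,
                              args=(pending, ready, write_fd))
    helper.start()

    # Simulate the kernel completing some requests.
    for i in range(expected):
        pending.put("request-%d" % i)

    handled = []
    while len(handled) < expected:
        # The main loop only ever polls an ordinary file descriptor.
        readable, _, _ = select.select([read_fd], [], [], 5.0)
        if not readable:
            break                    # timed out; give up in this sketch
        os.read(read_fd, 1)          # consume one notification byte
        handled.append(ready.get())  # the event was queued before the byte was written

    pending.put(None)                # shut the helper down and clean up
    helper.join()
    os.close(read_fd)
    os.close(write_fd)
    return handled
```

The point of the design is that the main loop never needs kernel support for polling on AIO completions: it sees the helper thread's notifications as just another readable file descriptor.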
Artur Linhart - Linux communication
2007-Aug-27 20:22 UTC
RE: [Xen-users] Xen + LVM + drbd crash
Hi Thomas,

Maybe it is a crazy idea, but how much memory do you use for Dom0? If LVM is used, as I read here some months ago, there should be enough memory in Dom0. Possibly in combination with the memory needs of drbd, this could lead to a memory shortage, which could in turn lead to strange behavior.

Just an idea, in case there are no others ;-)

Good luck,
Artur.
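One way to follow up on the Dom0 memory question is to watch /proc/meminfo in Dom0 while the benchmark runs. The sketch below is only an illustration: the sample values are made up, the function names are invented, and no particular threshold for "enough memory" is implied by the thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def roughly_available_kb(info):
    """Free memory plus buffers and page cache, in kB (a rough estimate only)."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


# Made-up sample; real values come from reading /proc/meminfo in Dom0.
SAMPLE = """MemTotal:       262144 kB
MemFree:          8192 kB
Buffers:          4096 kB
Cached:          16384 kB
"""

info = parse_meminfo(SAMPLE)
print("Dom0 total: %d kB, roughly available: %d kB"
      % (info["MemTotal"], roughly_available_kb(info)))
```

In Dom0 you would pass the contents of /proc/meminfo instead of SAMPLE, for example `parse_meminfo(open("/proc/meminfo").read())`, and compare readings taken before and during the bonnie++ run.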