I don't know if this is a DRBD or a Xen issue, so I am posting this on both lists.

I have 2 HP ProLiant servers with SLES10 SP2 installed. There are 8 disks in the system: 2 of the disks are in a hardware RAID1 and the rest are in a hardware RAID5. The servers have several network cards; one is used exclusively for DRBD replication. I use drbd 0.7.22-42, which ships with SLES10, and kernel 2.6.16.60-0.23-xen.

The Xen host is installed on a non-LVM2 partition of 20 GB on the RAID1. The rest of the RAID1 is a volume group called system. I have several logical volumes in the volume group system, called mail-root, mail-swap, webmail-root, webmail-swap, etc. The *-root LVs are to be the DRBD devices on which the domU systems are going to be installed. The *-swap LVs are just for swap, which I don't want to replicate.

I have tested the setup without the domUs running but with the drbd devices mounted, and pulled the network cable that carries the replication. Everything works as it should: both servers continue working, and replication starts as soon as I connect the cable again. I have also tested installing the domUs on file systems on the drbd devices, i.e. with the drbd devices mounted, and that also works as it should.

But when I install the domUs on the physical devices /dev/drbdx and pull the network cable between the replicating NICs, the Xen host with the domUs running hangs. Only a cold boot gets it started again. The replication works as it should as long as the cable is connected and both servers are up; everything gets replicated through the dedicated NIC. But if I reboot the passive server, the active server hangs, i.e. the same situation as when I pull the network cable.

I have all the domUs on one of the servers; the other server is just passive. I haven't installed Heartbeat, to avoid adding complexity to the situation.

Can somebody help me figure out why I can't get it to work when the domUs are installed on physical devices?
-- Gabriele Kalus IT-Manager Lund University, Physics Department _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
Hi Gabriele,

Just trying to get clear on when you are having this issue. So you basically experience this when your DomU is directly using the DRBD device? I assume you use "drbd:RESOURCE" on the disk config line? Have you tried whether you get the same behaviour with "phy:/dev/drbdx"?

Do you see anything in the log files when you pull the cable, or does it hang instantly?

//Daniel

> I have tested to install the domUs on file systems on the drbd devices, i.e. with the drbd devices mounted and it also works as it should.
>
> But when I install the domUs on physical devices /dev/drbdx for the domUs and pull the network cable between the replicating nics the Xen host with the domUs running hangs.
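For readers unfamiliar with the two disk line forms asked about above, here is a minimal sketch. xm domU config files use Python syntax, so the fragment is written as plain Python. The resource name "mail", the device /dev/drbd0, and the guest device names xvda/xvdb are assumptions for illustration; the thread only mentions "/dev/drbdx" and the LV names. The "drbd:" prefix relies on the block-drbd helper script that ships with DRBD 8.x, whereas "phy:" hands an existing block device to the guest as-is.

```python
# Hypothetical fragment of a domU config (xm config files are Python syntax).
# Assumed for illustration only: resource name "mail", /dev/drbd0, xvda/xvdb.

# Variant A: pass the DRBD block device straight through to the guest.
# The swap LV lives in the volume group "system" and is not replicated.
disk_phy = [
    "phy:/dev/drbd0,xvda,w",
    "phy:/dev/system/mail-swap,xvdb,w",
]

# Variant B: let the block-drbd helper (DRBD 8.x) look up the resource by name.
disk_drbd = [
    "drbd:mail,xvda,w",
    "phy:/dev/system/mail-swap,xvdb,w",
]

# A real config would contain only one list, named `disk`.
disk = disk_phy


def parse_disk(spec):
    """Split 'type:target,guestdev,mode' into its four parts."""
    backend, rest = spec.split(":", 1)
    target, guest_dev, mode = rest.split(",")
    return backend, target, guest_dev, mode
```

The small `parse_disk` helper is only there to make the structure of a disk entry explicit; it is not part of Xen.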
Hi Daniel,

I get this behaviour when the DomUs are installed on a drbd device used as a block device and the DomUs are active on them. I can't use "drbd:RESOURCE", since I'm using the drbd v0.7.22 shipped with SLES10 and not drbd 8.x. I am using "phy:/dev/drbdx" when the Xen host dies on me.

I don't get anything in the log files, since the Xen host hangs instantly. I don't get anything on the passive Xen host either, except the normal lines saying that it lost contact with the other Xen host:

Jun 13 11:20:44 vh2 kernel: drbd0: PingAck did not arrive in time.
Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_asender [17646]: cstate Connected --> NetworkFailure
Jun 13 11:20:44 vh2 kernel: drbd0: asender terminated
Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_receiver [17590]: cstate NetworkFailure --> BrokenPipe
Jun 13 11:20:44 vh2 kernel: drbd0: short read expecting header on sock: r=-512
Jun 13 11:20:44 vh2 kernel: drbd0: worker terminated
Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_receiver [17590]: cstate BrokenPipe --> Unconnected
Jun 13 11:20:44 vh2 kernel: drbd0: Connection lost.
Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_receiver [17590]: cstate Unconnected --> WFConnection

etc

Gabriele

Daniel Asplund wrote:
> Hi Gabriele,
>
> Just trying to get clear on when you are having this issue. So you
> basically experience this when your DomU is directly using the DRBD
> device? I assume you use "drbd:RESOURCE" on the disk config line? Have
> you tried if you get the same behaviour with "phy:/dev/drbdx"?
>
> Do you see anything in the log files when you pull the cable or does
> it hang instantly?
>
> //Daniel

--
Gabriele Kalus, Ph.D.
IT-Manager
Lund University, Physics Department
Box 118
SE-22100 Lund, SWEDEN
Phone: +46-462229675
Mobile: 0702-901227
Fax: +46-462224709
What other troubleshooting have you performed? Did you try DRBD over another NIC? Has this setup never worked on this hardware, or did the problem appear after an upgrade?

I have never used SLES, nor drbd v0.7, only Debian and Ubuntu. But I guess a basic config like this should work on any system... ;) Have you checked the bug reports to see whether something is listed? What version of Xen are you using?

//Daniel

On Sun, Jun 15, 2008 at 8:39 AM, Gabriele kalus <gabriele.kalus@fysik.lu.se> wrote:
> Hi Daniel,
>
> I get this behaviour when the DomUs are installed on a drbd device as a
> block device and the DomUs are active on them. I can't use the
> "drbd:RESOURCE" since I'm using the drbd v0.7.22 shipped with SLES10 and not
> drbd 8.x. I am using "phy:/dev/drbdx" when the Xen host dies on me.
>
> I don't get anything in the log files since the Xen host hangs instantly, I
> don't get anything in the passive Xen host either, except some normal lines
> that it lost contact with the other Xen host.
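When comparing what the two hosts logged around a cable pull, it can help to reduce the kernel messages to just the DRBD connection-state transitions. The following is a rough sketch, assuming the syslog line format seen in the excerpt from vh2 earlier in this thread; the function name and the regular expression are illustrative, not part of DRBD.

```python
import re

# Matches lines such as:
#   "... kernel: drbd0: drbd0_receiver [17590]: cstate Unconnected --> WFConnection"
# The pattern is an assumption based on the log excerpt posted in this thread.
CSTATE_RE = re.compile(
    r"kernel: (?P<dev>drbd\d+): \S+ \[\d+\]: "
    r"cstate (?P<old>\w+) --> (?P<new>\w+)"
)


def cstate_transitions(lines):
    """Return (device, old_state, new_state) for every cstate change found."""
    transitions = []
    for line in lines:
        match = CSTATE_RE.search(line)
        if match:
            transitions.append(
                (match.group("dev"), match.group("old"), match.group("new"))
            )
    return transitions


# Lines copied from the excerpt posted from vh2.
SAMPLE = [
    "Jun 13 11:20:44 vh2 kernel: drbd0: PingAck did not arrive in time.",
    "Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_asender [17646]: cstate Connected --> NetworkFailure",
    "Jun 13 11:20:44 vh2 kernel: drbd0: asender terminated",
    "Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_receiver [17590]: cstate NetworkFailure --> BrokenPipe",
    "Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_receiver [17590]: cstate BrokenPipe --> Unconnected",
    "Jun 13 11:20:44 vh2 kernel: drbd0: drbd0_receiver [17590]: cstate Unconnected --> WFConnection",
]
```

Calling `cstate_transitions(SAMPLE)` gives the chain Connected, NetworkFailure, BrokenPipe, Unconnected, WFConnection for drbd0, which is the sequence the passive host reported before waiting for a new connection.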