Shriram Rajagopalan
2012-Feb-16 18:06 UTC
Re: Remus crashes only with Windows Server 2003 - tap2 issue
On Sat, Feb 11, 2012 at 5:17 PM, Antonio Colin <dftonywhite@hotmail.com> wrote:
>
> PS: If you need further information or want me to test something, please let me know.
>
> Tony.
>
> ________________________________
> From: rshriram@cs.ubc.ca
> Date: Fri, 10 Feb 2012 11:52:04 -0800
> To: dftonywhite@hotmail.com
> CC: xen-users@lists.xensource.com
> Subject: Re: [Xen-users] Remus crashes only with Windows Server 2003
>
> On Thu, Feb 9, 2012 at 10:29 AM, Antonio Colin <dftonywhite@hotmail.com> wrote:
>
> Hi again Shriram,
>
> Thank you for your reply and explanation. You are right, I need a different port, maybe 9001 in that case, but see...
> That was the full test, but in fact I tested everything with one disk "(Unit C:)" and the same thing happens... if you think
> that doing it that way would save more useful information in the logs, I can save them again :).
>
> The NFS mount is in /mnt/domus, only to begin testing remus. I put one VM image there... start remus with --no-net and everything is fine.
> The directory /home/remus is just to work with remus and disk replication and is not an NFS mount.
>
> It is so strange that it works only for Linux!! (both are HVM)
>
> And yes, if that directory was shared that might corrupt my disk, and I also need DRBD to replicate the image... is that possible for img files?
> And just one last question... after failover, how can I get the execution of the VM back from the backup to the primary host once it is ready?
>
>
> Let me investigate the blktap2 issue first.
> DRBD does not replicate img files. You would have to put them in a partition or LVM volume and
> replicate that volume to the backup host. Whether you want to write the image directly to the volume or
> create a file system in that volume and drop the image file there is up to you.
>
> shriram
>
> Thank you so much!!!
>
> Tony.
>
> ________________________________
> From: rshriram@cs.ubc.ca
> Date: Thu, 9 Feb 2012 00:35:15 -0800
> Subject: Re: [Xen-users] Remus crashes only with Windows Server 2003
> To: dftonywhite@hotmail.com
> CC: xen-users@lists.xensource.com
>
> On Wed, Feb 8, 2012 at 1:56 AM, Antonio Colin <dftonywhite@hotmail.com> wrote:
>
> Hello Shriram,
>
> Just coming back to Remus HA: three weeks ago I sent this thread and the situation hasn't changed. You are right,
> remus works properly with the --no-net option.
>
> There is actually this tapdisk-related error in the syslog file in the primary host:
> Jan 17 17:28:58 xen-backup tapdisk2[5795]: remus: could not bind server socket 11 to 192.168.2.4:9000: 98 Address already in use
>
>
> Thanks for the logs.
> The first thing that pops out is:
> ['tap2', ['uname', 'tap2:remus:192.168.2.4:9000|aio:/home/remus/win2k3-exchange.img'], ['dev', 'ioemu:hda'], ['mode', 'w']],
> ['tap2', ['uname', 'tap2:remus:192.168.2.4:9000|aio:/home/remus/win2k3-exchange-d.img'], ['dev', 'ioemu:hdb'], ['mode', 'w']],
>
> You have two tapdisk devices, but on the same port? Each disk needs a different port, as a TCP connection is
> established between primary and backup for each replicated disk.
>
>
> Also when I boot up the VM (Windows Server 2003) from NFS
>
>
> From NFS? Just to make sure that we are on the same page, is the above directory /home/remus an NFS mount?
> i.e. is that win2k3-exchange.img "shared" between the primary and backup host?
> If so, then remus disk replication will not work, as it's based on a shared-nothing model.
> In fact, it could corrupt your disk badly. If disk consistency is not an issue, then you are better off
> running remus without disk replication (though there is no guarantee that the domain will fail over properly).
>
>
> and without remus or disk replication, in both the primary and the backup
> there is in fact a vif attached to it which is bound to the bridge in the two cases.
> I have the sch_plug module installed correctly in both hosts and everything works perfectly for Linux systems.
>
>
> Oh great. So network buffering is out of the picture. If it works for Linux, it should work for Windows too.
>
>
> But it just cannot come true
> for Windows.
>
> I attach xend.log and syslog from primary and backup if you'd like to see further information in order to help me.
>
> Thank you a lot!!
>
> Tony.
>
> > From: rshriram@cs.ubc.ca
> > Date: Fri, 13 Jan 2012 09:54:35 -0800
> > To: xen-users@lists.xensource.com
> > CC: dftonywhite@hotmail.com
> > Subject: Re: [Xen-users] Remus crashes only with Windows Server 2003
> >
> > On Fri, Jan 13, 2012 at 9:05 AM, <xen-users-request@lists.xensource.com> wrote:
> > > I have set up Remus on Debian Squeeze and kernel 3.1.5. Remus and disk replication work perfectly for Ubuntu systems,
> > > but when I start Remus for Windows Server 2003 (running Microsoft Exchange Enterprise 2003) it crashes, giving the
> > > following error:
> > >
> >
> > Is that Ubuntu VM a PV or HVM?
> > I presume that remus with --no-net works properly?
> >
> > > root@neutrino:~/working-remus# xm create exchange-hvm.cfg
> > > root@neutrino:~/working-remus# remus exchange-hvm 192.168.2.4
> > > qemu logdirty mode: enable
> > > xc: error: Error when writing to state file (4a) (errno 104) (104 = Connection reset by peer): Internal error
> > > qemu logdirty mode: disable
> > > PROF: resumed at 1326315866.106150
> > > resuming QEMU
> > > tc filter del dev vif3.0 parent ffff: proto ip pref 10 u32
> > > RTNETLINK answers: Invalid argument
> > > We have an error talking to the kernel
> > > Exception xen.remus.util.PipeException: PipeException('tc failed: 2, No such file or directory',) in <bound method BufferedNIC.__del__ of <xen.remus.device.BufferedNIC object at 0x24b7510>> ignored
> >
> > This error tells me nothing. "Connection reset by peer" could result
> > from a lot of issues.
> > A. Check the syslog in primary and backup for errors related to tapdisk.
> > B. Check the xend.log file in backup.
> > C. If your system works with --no-net, then try to boot up the VM
> > without remus, and make sure that
> > there is a vif interface for the VM. And make sure that interface is
> > on the bridge (if you have bridging enabled).
> > Remus tries to install a network buffer (sch_plug) on the vif interface.
> >
> > > root@neutrino:~/working-remus#
> > >
> > > It seems that on the backup remus or Xen cannot assign a vif1.0 to the DomU, since #ifconfig -a doesn't show a new vif there
> > > when starting remus.
> > >
> > > Any help would be highly appreciated!
> > >
> > > Tony.
> >
> > _______________________________________________
> > Xen-users mailing list
> > Xen-users@lists.xensource.com
> > http://lists.xensource.com/xen-users

Tony & Dimitrios,

Both of you seem to have faced issues with blktap2-based
disk replication while running remus. If you are interested in
getting blktap2-based replication
running, can you guys try the patch below and let me know if it
resolves the issue?

The patch basically increases the timeouts on both the disk and
memory checkpoint receivers
(block-remus.c & xc_domain_restore.c respectively).
I have tested Remus on a Windows 7 HVM with blktap2-based replication
(tap2:remus:<host>:<port>|aio:... format).
Things seemed to run fine.

shriram
---
diff -r 34dec1562a45 tools/blktap2/drivers/block-remus.c
--- a/tools/blktap2/drivers/block-remus.c  Sat Jun 18 20:52:33 2011 -0700
+++ b/tools/blktap2/drivers/block-remus.c  Sat Jun 18 20:52:43 2011 -0700
@@ -59,7 +59,7 @@
 #include <sys/stat.h>
 
 /* timeout for reads and writes in ms */
-#define HEARTBEAT_MS 1000
+#define HEARTBEAT_MS 5000
 #define RAMDISK_HASHSIZE 128
 
 /* connect retry timeout (seconds) */
diff -r 34dec1562a45 tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c  Sat Jun 18 20:52:33 2011 -0700
+++ b/tools/libxc/xc_domain_restore.c  Sat Jun 18 20:52:43 2011 -0700
@@ -47,7 +47,7 @@
     struct domain_info_context dinfo;
 };
 
-#define HEARTBEAT_MS 1000
+#define HEARTBEAT_MS 5000
 
 #define SUPERPAGE_PFN_SHIFT 9
 #define SUPERPAGE_NR_PFNS (1UL << SUPERPAGE_PFN_SHIFT)
diff -r 34dec1562a45 tools/python/xen/lowlevel/checkpoint/libcheckpoint.c
--- a/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c  Sat Jun 18 20:52:33 2011 -0700
+++ b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c  Sat Jun 18 20:52:43 2011 -0700
@@ -504,7 +504,7 @@
     FD_ZERO(&rfds);
     FD_SET(fd, &rfds);
 
-    tv.tv_sec = 0;
+    tv.tv_sec = 5;
     tv.tv_usec = 500000;
 
     rc = select(fd + 1, &rfds, NULL, NULL, &tv);
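
A note for readers following the patch: the first two hunks raise HEARTBEAT_MS from 1000 ms to 5000 ms, and the third lengthens a select() wait from 0.5 s (tv_sec = 0, tv_usec = 500000) to 5.5 s. The wait-with-timeout pattern can be sketched roughly as below. This is a Python illustration only, not the Xen C code; the helper names `wait_readable` and `timeval_seconds` and the local socket pair are made up for the example.

```python
import select
import socket


def timeval_seconds(tv_sec, tv_usec):
    """Total wait described by a C struct timeval, in seconds."""
    return tv_sec + tv_usec / 1_000_000


def wait_readable(sock, timeout_s):
    """Return True if sock has data to read within timeout_s seconds, else False.

    Hypothetical helper: it mirrors the shape of a select() call with a
    timeout, as in the libcheckpoint.c hunk, but is not taken from Xen.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    before = timeval_seconds(0, 500000)  # values before the patch
    after = timeval_seconds(5, 500000)   # values after the patch
    print(f"select() wait: {before} s -> {after} s")

    # A connected local socket pair stands in for the primary/backup link.
    receiver, sender = socket.socketpair()
    print("ready before any data:", wait_readable(receiver, 0.1))
    sender.sendall(b"checkpoint")
    print("ready after data sent:", wait_readable(receiver, 0.1))
    receiver.close()
    sender.close()
```

A longer timeout only gives a slow peer more time to respond before the receiver gives up; whether that is what was going wrong with the Windows Server 2003 guest is not confirmed anywhere in this thread.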
Antonio Colin
2012-Feb-27 20:38 UTC
Re: Remus crashes only with Windows Server 2003 - tap2 issue
Hello Shriram,

Thanks so much for your patch. I have been trying to apply it, but it fails; the errors are below.

Any advice on how to apply it properly?

Thanks a lot!

Tony.
----

root@neutrino:~/xen-4.1.1# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  2649     1     r-----     88.2
root@neutrino:~/xen-4.1.1# patch -p0 < timeouts.patch
(Stripping trailing CRs from patch.)
patching file b/tools/blktap2/drivers/block-remus.c
Hunk #1 FAILED at 59.
1 out of 1 hunk FAILED -- saving rejects to file b/tools/blktap2/drivers/block-remus.c.rej
(Stripping trailing CRs from patch.)
patching file b/tools/libxc/xc_domain_restore.c
Hunk #1 FAILED at 47.
1 out of 1 hunk FAILED -- saving rejects to file b/tools/libxc/xc_domain_restore.c.rej
(Stripping trailing CRs from patch.)
patching file b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c
Hunk #1 FAILED at 504.
1 out of 1 hunk FAILED -- saving rejects to file b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c.rej
root@neutrino:~/xen-4.1.1#

_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users
Shriram Rajagopalan
2012-Feb-27 23:57 UTC
Re: Remus crashes only with Windows Server 2003 - tap2 issue
It's patch -p1.

On Mon, Feb 27, 2012 at 12:38 PM, Antonio Colin <dftonywhite@hotmail.com> wrote:
> Hello Shriram,
>
> Thanks so much for your patch, I have been trying to apply it but there is
> a problem when doing it,
> here I send you the errors thrown.
>
> Any advice on how to do it properly??
>
> Thanks a lot!
>
> Tony.
> ----
>
> root@neutrino:~/xen-4.1.1# xm list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  2649     1     r-----     88.2
> root@neutrino:~/xen-4.1.1# patch -p0 < timeouts.patch
> (Stripping trailing CRs from patch.)
> patching file b/tools/blktap2/drivers/block-remus.c
> Hunk #1 FAILED at 59.
> 1 out of 1 hunk FAILED -- saving rejects to file
> b/tools/blktap2/drivers/block-remus.c.rej
> (Stripping trailing CRs from patch.)
> patching file b/tools/libxc/xc_domain_restore.c
> Hunk #1 FAILED at 47.
> 1 out of 1 hunk FAILED -- saving rejects to file
> b/tools/libxc/xc_domain_restore.c.rej
> (Stripping trailing CRs from patch.)
> patching file b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c
> Hunk #1 FAILED at 504.
> 1 out of 1 hunk FAILED -- saving rejects to file
> b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c.rej
> root@neutrino:~/xen-4.1.1#
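
Some background on the -p1 suggestion: the diff names its files a/tools/... and b/tools/..., and `patch -pN` strips the first N leading path components from those names before it looks for the file. With -p0 nothing is stripped, which matches the "patching file b/tools/..." lines in the output above. The stripping rule can be sketched in Python as below; this illustrates the documented behaviour and is not patch's own code, and `strip_components` is a made-up helper that ignores corner cases such as repeated slashes.

```python
def strip_components(path, n):
    """Drop the first n leading path components, roughly as `patch -pN` does.

    Hypothetical helper for illustration; it assumes a plain relative path
    with single '/' separators.
    """
    parts = path.split("/")
    if n >= len(parts):
        raise ValueError(f"cannot strip {n} components from {path!r}")
    return "/".join(parts[n:])


if __name__ == "__main__":
    name = "b/tools/blktap2/drivers/block-remus.c"  # file name as written in the diff
    for n in (0, 1):
        print(f"-p{n}: {strip_components(name, n)}")
```

Run from the top of the source tree (~/xen-4.1.1 in the session above), the -p1 form gives a path beginning with tools/, which is where the diff headers suggest these files live. The thread does not show whether the hunks then applied cleanly.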