Philippe Lang
2010-May-07 08:32 UTC
[Xen-users] Concurrent local write error in DRBD with GPL PV drivers
Hi,

We are about to put Windows 2003 64-bit Xen VMs into production, running on top of a DRBD cluster. We have installed the latest optimized GPL PV drivers (gplpv_2003x64_0.11.0.213.msi).

Sometimes, when the VM writes to disk, we see the following error in the DRBD log:

May 6 17:45:48 s3 kernel: [198887.626841] drbd0: blkback.35.hda[3549] Concurrent local write detected! [DISCARD L] new: 26745079s +4096; pending: 26745079s +4096

This error is discussed in the following thread on the DRBD mailing list:

http://lists.linbit.com/pipermail/drbd-user/2009-April/011873.html

Basically, it occurs when a write is issued to a location while another "in-flight" write to the same location has not yet completed. To keep the cluster nodes from diverging, DRBD drops the second write.

We were able to reproduce the problem with a Windows program called "PerformanceTest" from Passmark Software. When running the "Disk Random Seek +RW" test, the log fills up with the error quoted above.

We have tested a VM *without* the GPLPV drivers, and *no error* appears. We have also tested previous driver versions (0.11.0.188, 0.10.0.142), and the same error appears.

So we have the feeling there is some kind of error in the driver, although we have never experienced a single VM crash. Can we safely ignore the "Concurrent local write" error in the log, or is it really a bug that should be fixed before using the driver in production?

Best regards,

-------------------------------------------------------------
Attik System              web  : http://www.attiksystem.ch
Philippe Lang             phone: +41 26 422 13 75
rte de la Fonderie 2      gsm  : +41 79 351 49 94
1700 Fribourg             pgp  : http://keyserver.pgp.com

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
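For readers who want to check their own logs, here is a minimal Python sketch of how the message above can be read. It assumes the format seen in the quoted log line, where each request is given as a start sector suffixed with `s`, followed by `+` and a length in bytes, and it assumes 512-byte sectors. The function names `parse_concurrent_write` and `ranges_overlap` are illustrative only and are not part of DRBD.

```python
import re

SECTOR_SIZE = 512  # assumption: positions are reported in 512-byte sectors

# Matches the "new: <sector>s +<bytes>; pending: <sector>s +<bytes>" part
# of the log line quoted in the message above.
PATTERN = re.compile(
    r"Concurrent local write detected!.*?"
    r"new:\s*(\d+)s\s*\+(\d+);\s*pending:\s*(\d+)s\s*\+(\d+)"
)

def parse_concurrent_write(line):
    """Return ((new_sector, new_bytes), (pending_sector, pending_bytes)) or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    new_sector, new_bytes, pend_sector, pend_bytes = map(int, match.groups())
    return (new_sector, new_bytes), (pend_sector, pend_bytes)

def ranges_overlap(a, b):
    """True if two (start_sector, length_bytes) requests share at least one byte."""
    a_start = a[0] * SECTOR_SIZE
    b_start = b[0] * SECTOR_SIZE
    return a_start < b_start + b[1] and b_start < a_start + a[1]

LOG_LINE = (
    "May 6 17:45:48 s3 kernel: [198887.626841] drbd0: blkback.35.hda[3549] "
    "Concurrent local write detected! [DISCARD L] new: 26745079s +4096; "
    "pending: 26745079s +4096"
)

new, pending = parse_concurrent_write(LOG_LINE)
print(new, pending, ranges_overlap(new, pending))
```

In the logged case both requests start at the same sector and have the same length, so the new write covers exactly the range of the pending one.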
Philippe Lang
2010-May-12 06:02 UTC
[Xen-users] RE: Concurrent local write error in DRBD with GPL PV drivers
Hi,

There was no answer to my post of last week. Does anyone know how to contact the maintainers of the GPL PV drivers directly?

Before we start using our new servers in production, we'd like to know whether the DRBD error I mentioned is harmless, or whether we had better wait until it has been investigated.

One more detail: we use Debian Lenny as the dom0 OS.

Best regards,
Philippe Lang
James Harper
2010-May-12 06:37 UTC
RE: [Xen-users] RE: Concurrent local write error in DRBD with GPL PV drivers
> Hi,
>
> There was no answer to my post of last week. Does anyone know how to contact
> directly the maintainers of the GPL PV driver maybe?

That would be me. Sorry I didn't see your original email.

> Before starting using our new servers in production, we'd like to know if the
> drbd error mentioned is not problem at all, or if we have better wait until
> this case is being investigated.
>
> One more precision: we use Debian Lenny for the dom0 OS.

I have a similar configuration (dual-primary DRBD) and have seen similar DRBD messages, but have never investigated them. I have never had any filesystem corruption, even under high usage, but that doesn't mean it isn't possible. After reading what you posted and linked to, it looks like something worth investigating.

Can you tell me more about your non-PV setup? Are you still using phy:?

One difference between the PV and non-PV drivers is that GPLPV uses the scsiport interface, which does allow multiple outstanding requests. When not using PV drivers, Windows uses the qemu-emulated IDE drivers, which probably don't allow more than one outstanding request, so this situation could never arise.

What I need to know is whether Windows is giving me requests in a broken way, or whether GPLPV is handling them in a broken way... I'll post on the ntdev mailing list and see if someone there knows.

After accumulating a bunch of these errors in your test environment, can you run a chkdsk and see what comes up?
Thanks

James
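The point about outstanding requests can be illustrated with a toy model. The Python sketch below is not GPLPV, blkback or DRBD code; it simply assumes a device that accepts up to `queue_depth` writes before the oldest one completes, and counts how often a newly submitted write targets a sector that is still in flight. The names `count_overlapping_submissions` and `queue_depth` are hypothetical.

```python
from collections import deque

def count_overlapping_submissions(sectors, queue_depth):
    """Count writes submitted while an earlier write to the same sector is in flight.

    Toy model: at most `queue_depth` writes are outstanding; when the queue is
    full, the oldest write completes before the next one is accepted.
    """
    in_flight = deque()
    overlaps = 0
    for sector in sectors:
        if len(in_flight) == queue_depth:
            in_flight.popleft()  # oldest request completes, freeing a slot
        if sector in in_flight:
            overlaps += 1  # same sector is still outstanding
        in_flight.append(sector)
    return overlaps

# A write pattern that hits sector 26745079 twice in quick succession.
writes = [26745079, 100, 26745079, 200, 300]

print(count_overlapping_submissions(writes, queue_depth=1))  # one request at a time
print(count_overlapping_submissions(writes, queue_depth=4))  # several outstanding
```

Under this model a queue depth of 1 can never report an overlap, because the previous write always completes before the next one is accepted. That matches the suggestion that a single-request emulated IDE path would not trigger the message, while a path with several outstanding requests could.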
Philippe Lang
2010-May-12 07:07 UTC
RE: [Xen-users] RE: Concurrent local write error in DRBD with GPL PV drivers
> That would be me. Sorry I didn't see your original email.

No problem! Thanks for replying so quickly.

> I have a similar configuration (dual-primary drbd) and have seen
> similar drbd messages but have never investigated. I have never had any
> filesystem corruption even under high usage but that doesn't mean it
> isn't possible. After reading the stuff you posted and linked to it
> looks like something that is worth investigating.
>
> Can you tell me more about your non-PV setup? Are you still using phy:?

Here is the complete configuration file. It looks like we are using phy:, yes.

---------------------------------------------------------------------
s3:/etc/xen# more test.cfg
name = 'test'
kernel = '/usr/lib/xen-3.2-1/boot/hvmloader'
builder = 'hvm'
memory = '2048'
device_model='/usr/lib/xen-3.2-1/bin/qemu-dm'
vcpus = 1
disk = [ 'phy:/dev/drbd0,ioemu:hda,w', 'file:/misc/isostar/windows/Windows Server 2003 R2 Std X64 - 2.iso,ioemu:hdc:cdrom,r' ]
#disk = [ 'phy:/dev/drbd0,hda,w' ]
boot = 'c'

# According to http://wiki.xensource.com/xenwiki/XenWindowsGplPv
# "In your machine configuration, make sure you don't use the ioemu network driver."
#
# vif = [ 'type=ioemu,bridge=br50' ]
vif = [ 'bridge=br50' ]

# VNC
vnc = 1
vncviewer = 1
vncdisplay = 1100
sdl = 0

# Solve mouse position problem in the console.
usbdevice = 'tablet'
---------------------------------------------------------------------

> One difference between PV and non-PV drivers is that GPLPV uses the
> scsiport interface which does allow multiple outstanding requests. When
> not using PV drivers, windows uses the qemu emulated IDE drivers which
> probably don't allow more than one outstanding request, so this
> situation could never arise.
>
> What I need to know is if Windows is giving me requests in a broken
> way, or if GPLPV is handling them in a broken way... I'll post on the
> ntdev mailing list and see if someone there knows.
>
> After accumulating a bunch of these errors in your test environment,
> can you do a chkdsk and see what comes up?

I'll do that and post the results here, ok.

Thanks for your help, and best regards!

Philippe Lang