I have two VMware guests that share an OCFS2 partition through VMware VMFS.
It resides on a SAN.

Each host has the following error in the kernel log:

[2037805.922718] end_request: I/O error, dev sdb, sector 1735
[2037805.922974] (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
[2037805.923370] (27506,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5

We also see the following on both machines:

(1888,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
[202381.822030] (1888,0):o2hb_do_disk_heartbeat:762 ERROR: Device "sdb1": another node is heartbeating in our slot!

I notice the sector is the same on both machines: 1735.

Is this an issue with VMware?

Thanks,
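To confirm that the same sector fails on both nodes, the kernel logs from each guest can be compared mechanically. Below is a minimal Python sketch, not part of the original report. The function names are hypothetical, and it assumes the kernel reports sectors in 512-byte units and that the I/O error lines have the same form as the one quoted above.

```python
import re

# Matches kernel lines such as:
#   [2037805.922718] end_request: I/O error, dev sdb, sector 1735
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

SECTOR_SIZE = 512  # assumption: kernel block-layer sectors are 512 bytes


def failing_sectors(log_text):
    """Return the set of (device, sector) pairs reported as I/O errors."""
    return {(dev, int(sector)) for dev, sector in IO_ERROR_RE.findall(log_text)}


def shared_failures(log_a, log_b):
    """Return the (device, sector) pairs that fail in both logs."""
    return failing_sectors(log_a) & failing_sectors(log_b)


def byte_offset(sector):
    """Convert a sector number to a byte offset on the device."""
    return sector * SECTOR_SIZE


# Sample taken from the kernel log quoted in the report.
SAMPLE = (
    "[2037805.922718] end_request: I/O error, dev sdb, sector 1735\n"
    "[2037805.922974] (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5\n"
)

if __name__ == "__main__":
    for dev, sector in sorted(failing_sectors(SAMPLE)):
        print(dev, sector, byte_offset(sector))
```

Passing the saved `dmesg` output of each guest to `shared_failures` would list the sectors that fail on both nodes. The byte offset shows where on the shared device the failing heartbeat I/O lands.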
On Thu, Dec 09, 2010 at 04:02:14PM -0600, brad hancock wrote:
> I have two VMware guests that share an OCFS2 partition through VMware VMFS.
> It resides on a SAN.

Let me see if I understand the configuration. You have a SAN. On that SAN is a LUN. You have one VMware host in this configuration. The VMware host has formatted that LUN for VMFS. There is a disk image on the VMFS that both guests see as sdb. Is this correct?

> Each host has the following error in the kernel log:
>
> [2037805.922718] end_request: I/O error, dev sdb, sector 1735
> [2037805.922974] (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
> [2037805.923370] (27506,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
>
> We also see the following on both machines:
>
> (1888,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5
> [202381.822030] (1888,0):o2hb_do_disk_heartbeat:762 ERROR: Device "sdb1":
> another node is heartbeating in our slot!
>
> I notice the sector is the same on both machines: 1735.
>
> Is this an issue with VMware?

The read errors ("another node is heartbeating in our slot") sound like VMware is not allowing the nodes to see each other's activity. The write error is more surprising, but it could be part of the same issue.

Joel

--

"Every day I get up and look through the Forbes list of the richest people in America. If I'm not there, I go to work."
	- Robert Orben

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
On Fri, 10 Dec 2010 06:26:06 -0800, ocfs2-users-request at oracle.com wrote:
> My setup has the SCSI controller set to Physical so the guests can be on
> different hosts, but I do not have the disk set up as Independent. I am
> going to change that setting in VMware and see if it makes a difference.
>
> > [2037805.922718] end_request: I/O error, dev sdb, sector 1735
> > [2037805.922974] (0,0):o2hb_bio_end_io:225 ERROR: IO Error -5
> > [2037805.923370] (27506,0):o2hb_do_disk_heartbeat:753 ERROR: status = -5

Brad,

I have had the same issue for over a year on ESX 3.5 as well as on vSphere 4.0. I have not yet tried 4.1. The error occurs whether I put the shared disk on SATA or FC LUNs on our SAN. It also doesn't matter whether the virtual machines are on the same physical host or not (with independent disks).

The only problem that has come from it is the occasional reboot of one of the VMs, which for me is tolerable. I keep hoping to upgrade to a new SAN, thinking that might fix it. The IOPS capability of the vSphere 4.0 release is higher than that of the SAN (which is 5 years old), so I didn't think it was VMware's fault. If you have fairly new hardware, maybe there is a real bug somewhere. I don't get I/O errors in any of my other implementations on this SAN.

I sent a post like yours to the list when I first built it, but never opened a bug report with either OCFS or VMware. If you create a bug report, I could add information from my implementation as well. (I actually have two of these setups and they both have the same errors.)

Of course, if you find a solution, please post that as well.

Thanks,
Kevin