Hi,

We carried out a few tests under FreeBSD using our LSI HBA after enabling
multipathing. While verifying failover, we intermittently observe data
corruption when removing the active path. The way we enabled multipathing
and the steps we followed for the test are as below.

Procedure to enable multipath:

1. Load the multipath module with the command "gmultipath load".
2. Add the following entry to /boot/loader.conf: geom_multipath_load="YES"
3. Assign a label to the drives with the command
   "gmultipath label -v mpta /dev/da1 /dev/da2".
4. Create the file system with "newfs /dev/multipath/mpta".
5. Mount the file system on /mnt.
6. Run I/O on the mounted file system.

(A consolidated sketch of these commands is included at the end of this
message.)

Observation details:

1. Connect an enclosure with 2 to 3 SAS drives to a system with multipath
   enabled.
2. Create the file system and run I/O using the above procedure.
3. Unplug the active path from the enclosure.
4. Sometimes failover to the passive path happens, but at times data
   corruption is seen on one of the drives when the active path is pulled
   from the enclosure. On the other drives the I/O continues to run.
5. After removing the active path, the multipath status is shown as
   degraded. On reinserting the path, the status comes back to optimal.
   The I/O keeps running, except on the drive where data corruption
   occurred; I/O never resumes on that drive.

The testing was done on two different enclosures; details are as follows:

  Server          Enclosure                 Observation
  IBM x3650 M4    Camden                    1 out of 5 iterations: data
                                            corruption on one of the drives.
  IBM x3650 M4    DELL PowerVault MD1220    1 out of 5 iterations: data
                                            corruption on one of the drives.

Kindly let us know whether any similar issues have been reported to the
community, or whether this looks like a new issue and a potential defect.

Best Regards
Subramani
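For clarity, here is a minimal consolidated sketch of the sequence we ran.
Device names da1/da2, the label mpta, and the /mnt mount point are from our
setup; the dd line is only a stand-in for the I/O generator we used, which
is not shown here:

    # as root; da1 and da2 are the two paths to the same SAS drive
    gmultipath load
    echo 'geom_multipath_load="YES"' >> /boot/loader.conf
    gmultipath label -v mpta /dev/da1 /dev/da2
    newfs /dev/multipath/mpta
    mount /dev/multipath/mpta /mnt

    # generate I/O on the mounted file system (illustration only)
    dd if=/dev/zero of=/mnt/testfile bs=1m count=1024

    # then unplug the active path and watch the state change
    gmultipath status mpta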
On Mon, 17 Sep 2012 04:14:14 -0500, Pasupathy, Subramani
<Subramani.Pasupathy@lsi.com> wrote:

> Kindly let us know whether any similar issues have been reported to the
> community, or whether this looks like a new issue and a potential defect.

I did extensive testing of the new(er) multipath code earlier this year
(January-March) before putting our ZFS SANs into production. I was never
able to produce corruption by breaking the multipath during my tests.
There would be errors, but it always seemed to recover. I also would have
expected ZFS to notice write errors or something during scrubbing.
Hopefully this info is useful to someone out there...
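Roughly the kind of check I relied on after pulling a path (the pool name
"tank" is just a placeholder for whatever your pool is called):

    gmultipath status          # confirm the path shows DEGRADED/OPTIMAL
    zpool status -x            # any pools reporting errors?
    zpool scrub tank           # re-verify checksums across the pool
    zpool status -v tank       # look for CKSUM errors after the scrub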