Hi, guys,

I have a little question about ZFS.

If the firmware of a SCSI disk device can provide a unique ID (WWN or serial number), the sd driver will create a devid for that device, and that devid uniquely identifies the device.

Imagine that we have two HBAs in a system. We put a SCSI disk under control of one HBA and have ZFS use this disk. Then we move the disk from the current HBA to the other HBA and reboot. In this case, the device path and probably the instance number of that disk will change. Can ZFS find that disk via the devid and keep working?

If the answer is no, can I understand the devid in ZFS this way: ZFS uses the devid to do just one thing. When it wants to open a disk via its device path (/dev/rdsk/cxtxdx), it makes sure the opened disk device is the right one by comparing the devid.

Happy Christmas!

thanks in advance,
minskey
Hello Chao-Hong,

Wednesday, December 28, 2005, 7:23:50 AM, you wrote:

CHMG> Hi, guys,
CHMG> I have a little question about ZFS.
CHMG> If the firmware of a SCSI disk device can provide a unique ID
CHMG> (WWN or serial number), the sd driver will create a devid for that
CHMG> device, and that devid uniquely identifies the device.
CHMG> Imagine that we have two HBAs in a system. We put a SCSI disk
CHMG> under control of one HBA and have ZFS use this disk. Then we
CHMG> move the disk from the current HBA to the other HBA and reboot.
CHMG> In this case, the device path and probably the instance number of
CHMG> that disk will change. Can ZFS find that disk via the devid and
CHMG> keep working?

It will work.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
On Wed, Dec 28, 2005 at 12:38:45PM +0100, Robert Milkowski wrote:
>
> CHMG> Imagine that we have two HBAs in a system. We put a SCSI disk
> CHMG> under control of one HBA and have ZFS use this disk. Then we
> CHMG> move the disk from the current HBA to the other HBA and reboot.
> CHMG> In this case, the device path and probably the instance number of
> CHMG> that disk will change. Can ZFS find that disk via the devid and
> CHMG> keep working?
>
> It will work.
>

In particular, see vdev_disk_open():

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/fs/zfs/vdev_disk.c#49

We first attempt to open the path and compare the devid, since it's possible to have multiple paths on a system corresponding to the same devid. If this doesn't work, we simply open by devid, which will solve your case above.

Note that there is an open bug to update the paths to reflect the new devices in this situation:

6364582 need to fixup paths if they've changed

Right now the disk will open correctly, but the path displayed by 'zpool status' and 'zpool iostat -v' will always be incorrect. This is on my list of bugs to tackle after the holidays.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
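To make the logic Eric describes concrete, here is a simplified, illustrative C sketch of "open by path, verify the devid, fall back to open by devid". It is not the actual vdev_disk.c code (follow the link above for that); disk_handle_t and the helpers open_disk_by_path(), open_disk_by_devid(), get_devid() and close_disk() are hypothetical stand-ins for the LDI and devid calls the real implementation uses.

    #include <string.h>
    #include <errno.h>

    typedef void *disk_handle_t;

    /* Hypothetical helpers, standing in for the real LDI/devid calls. */
    extern int  open_disk_by_path(const char *path, disk_handle_t *hp);
    extern int  open_disk_by_devid(const char *devid, disk_handle_t *hp);
    extern int  get_devid(disk_handle_t h, char **devidp);
    extern void close_disk(disk_handle_t h);

    typedef struct vdev_sketch {
            char *vd_path;   /* path stored in the pool config, e.g. /dev/dsk/c1t0d0s0 */
            char *vd_devid;  /* devid string stored in the pool config, may be NULL */
    } vdev_sketch_t;

    static int
    sketch_vdev_disk_open(vdev_sketch_t *vd, disk_handle_t *hp)
    {
            char *devid;

            /*
             * First, try the path recorded in the pool configuration.  Several
             * paths can correspond to the same devid, so the path is preferred.
             */
            if (open_disk_by_path(vd->vd_path, hp) == 0) {
                    /*
                     * Verify we opened the right disk by comparing devids.  A
                     * match (or a device with no devid at all) means success.
                     */
                    if (vd->vd_devid == NULL ||
                        (get_devid(*hp, &devid) == 0 &&
                        strcmp(devid, vd->vd_devid) == 0))
                            return (0);
                    close_disk(*hp);   /* some other disk answers to that path now */
            }

            /*
             * The path is stale, e.g. the disk moved to a different HBA and the
             * controller/instance numbers changed.  Fall back to opening by
             * devid, which follows the physical disk wherever it ends up.
             */
            if (vd->vd_devid != NULL && open_disk_by_devid(vd->vd_devid, hp) == 0)
                    return (0);

            return (ENOENT);
    }

In the HBA-move scenario from the original question, the first step either fails (the old path no longer exists) or opens a different disk whose devid does not match, so the fallback locates the disk by its devid behind the new HBA.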
Quick side question: will zfs load balance between different I/O paths? Casper
On Wed, Dec 28, 2005 at 05:10:18PM +0100, Casper.Dik at Sun.COM wrote:
>
> Quick side question: will zfs load balance between different I/O paths?
>

ZFS will balance I/O across top-level devices. Currently, this is done with a slightly skewed preference for "less full" devices. This has the beneficial effect that newly attached devices will (gradually) fill up to the same level (percentage-wise) as the other devices. Future enhancements will likely bring latency and bandwidth calculations into the picture.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
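To picture the effect of that skewed preference, here is a small, hypothetical C sketch that weights each top-level vdev by its free space when choosing where to place the next allocation. It is not the actual ZFS allocation code, only an illustration of the behaviour Eric describes; the toplevel_t structure and pick_toplevel() are invented for the example.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct toplevel {
            uint64_t space_total;   /* total bytes in this top-level vdev */
            uint64_t space_used;    /* bytes already allocated on it */
    } toplevel_t;

    /*
     * Pick a top-level vdev for the next allocation.  Each device is
     * weighted by its free space, so emptier (e.g. newly attached) devices
     * are chosen a little more often and gradually reach the same
     * percentage-full as the rest.  rand() stands in for whatever source
     * of variation the real allocator would use.
     */
    static int
    pick_toplevel(const toplevel_t *tv, int ntv)
    {
            uint64_t total_free = 0, r;
            int i;

            for (i = 0; i < ntv; i++)
                    total_free += tv[i].space_total - tv[i].space_used;
            if (total_free == 0)
                    return (-1);                    /* every device is full */

            r = (uint64_t)rand() % total_free;      /* weighted random pick */
            for (i = 0; i < ntv; i++) {
                    uint64_t avail = tv[i].space_total - tv[i].space_used;
                    if (r < avail)
                            return (i);
                    r -= avail;
            }
            return (ntv - 1);                       /* not reached */
    }

A newly attached, mostly empty device carries a larger weight than its mostly full siblings, so it receives proportionally more of the new writes until all devices converge on roughly the same percentage used.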
> On Wed, Dec 28, 2005 at 05:10:18PM +0100, Casper.Dik at Sun.COM wrote:
>>
>> Quick side question: will zfs load balance between different I/O paths?
>>
>
> ZFS will balance I/O across top-level devices. Currently, this is done
> with a slightly skewed preference for "less full" devices. This has the
> beneficial effect that newly attached devices will (gradually) fill up
> to the same level (percentage-wise) as the other devices. Future
> enhancements will likely bring latency and bandwidth calculations into
> the picture.

So in a multi-pathed I/O configuration it will not send I/O through different paths for the same device except in case one path fails?

Casper
Casper.Dik at sun.com wrote:
>> On Wed, Dec 28, 2005 at 05:10:18PM +0100, Casper.Dik at Sun.COM wrote:
>>
>>> Quick side question: will zfs load balance between different I/O paths?
>>
>> ZFS will balance I/O across top-level devices. Currently, this is done
>> with a slightly skewed preference for "less full" devices. This has the
>> beneficial effect that newly attached devices will (gradually) fill up
>> to the same level (percentage-wise) as the other devices. Future
>> enhancements will likely bring latency and bandwidth calculations into
>> the picture.
>
> So in a multi-pathed I/O configuration it will not send I/O
> through different paths for the same device except in case one
> path fails?

I may be way off here, but I think Eric is talking about placement of the data and Casper is talking about the path routing of the data. One would be ZFS and the other would be Traffic Manager, aka mpxio.
On Wed, Dec 28, 2005 at 01:24:25PM -0500, Torrey McMahon wrote:
>
> > So in a multi-pathed I/O configuration it will not send I/O
> > through different paths for the same device except in case one
> > path fails?
>
> I may be way off here, but I think Eric is talking about placement of
> the data and Casper is talking about the path routing of the data. One
> would be ZFS and the other would be Traffic Manager, aka mpxio.

Yep, I misinterpreted Casper's question. ZFS does nothing special in the face of mpxio, and relies on the underlying configuration to do any load balancing/failover between paths at this level.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Eric Schrock wrote:
> On Wed, Dec 28, 2005 at 01:24:25PM -0500, Torrey McMahon wrote:
>
>>> So in a multi-pathed I/O configuration it will not send I/O
>>> through different paths for the same device except in case one
>>> path fails?
>>
>> I may be way off here, but I think Eric is talking about placement of
>> the data and Casper is talking about the path routing of the data. One
>> would be ZFS and the other would be Traffic Manager, aka mpxio.
>
> Yep, I misinterpreted Casper's question. ZFS does nothing special in
> the face of mpxio, and relies on the underlying configuration to do any
> load balancing/failover between paths at this level.

The question then is, should mpxio and zfs do something special? Might be interesting if the two worked together.
Torrey McMahon wrote:
> Eric Schrock wrote:
>
>> On Wed, Dec 28, 2005 at 01:24:25PM -0500, Torrey McMahon wrote:
>>
>>>> So in a multi-pathed I/O configuration it will not send I/O
>>>> through different paths for the same device except in case one
>>>> path fails?
>>>
>>> I may be way off here, but I think Eric is talking about placement of
>>> the data and Casper is talking about the path routing of the data.
>>> One would be ZFS and the other would be Traffic Manager, aka mpxio.
>>
>> Yep, I misinterpreted Casper's question. ZFS does nothing special in
>> the face of mpxio, and relies on the underlying configuration to do any
>> load balancing/failover between paths at this level.
>
> The question then is, should mpxio and zfs do something special? Might
> be interesting if the two worked together.

Looks like the two work together... Or did you have in mind something else?

# zpool status
  pool: ds4300
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        ds4300                                     ONLINE       0     0     0
          raidz                                    ONLINE       0     0     0
            c4t600A0B800019D82F00001484438A6168d0  ONLINE       0     0     0
            c4t600A0B800019D82F00001486438A6198d0  ONLINE       0     0     0
            c4t600A0B800019D82F0000148243899250d0  ONLINE       0     0     0
            c4t600A0B800019DD1B00002A4E43899277d0  ONLINE       0     0     0
            c4t600A0B800019DD1B00002A51438A6185d0  ONLINE       0     0     0
            c4t600A0B800019DD1B00002A54438A61ADd0  ONLINE       0     0     0

# luxadm disp /dev/rdsk/c4t600A0B800019D82F00001484438A6168d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c4t600A0B800019D82F00001484438A6168d0s2
  Vendor:               IBM
  Product ID:           1722-600
  Revision:             0520
  Serial Num:           1T53176260
  Unformatted capacity: 139501.500 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
    Minimum prefetch:   0x100
    Maximum prefetch:   0x100
  Device Type:          Disk device
  Path(s):

  /dev/rdsk/c4t600A0B800019D82F00001484438A6168d0s2
  /devices/scsi_vhci/disk@g600a0b800019d82f00001484438a6168:c,raw
   Controller                   /dev/cfg/c3
    Device Address              200600a0b819d830,2
    Host controller port WWN    2100000d6050e7de
    Class                       primary
    State                       ONLINE
   Controller                   /dev/cfg/c3
    Device Address              200700a0b819d830,2
    Host controller port WWN    2100000d6050e7de
    Class                       secondary
    State                       STANDBY
#
Cyril Plisko wrote:
> Torrey McMahon wrote:
>> Eric Schrock wrote:
>>
>>> On Wed, Dec 28, 2005 at 01:24:25PM -0500, Torrey McMahon wrote:
>>>
>>>>> So in a multi-pathed I/O configuration it will not send I/O
>>>>> through different paths for the same device except in case one
>>>>> path fails?
>>>>
>>>> I may be way off here, but I think Eric is talking about placement
>>>> of the data and Casper is talking about the path routing of the data.
>>>> One would be ZFS and the other would be Traffic Manager, aka mpxio.
>>>
>>> Yep, I misinterpreted Casper's question. ZFS does nothing special in
>>> the face of mpxio, and relies on the underlying configuration to do any
>>> load balancing/failover between paths at this level.
>>
>> The question then is, should mpxio and zfs do something special?
>> Might be interesting if the two worked together.
>
> Looks like the two work together... Or did you have in mind something
> else?

They work with each other now. ZFS passes I/O down to a target and mpxio takes it from there. However, ZFS could pass, for lack of a better word, "hints" down the stack. I'm still on vacation so my brain isn't coming up with any good examples, but I'm sure someone can think of one before the year ends. :)