We have a number of Sun J4200 SAS JBOD arrays which we have multipathed using Sun's MPxIO facility. While this is great for reliability, it results in the /dev/dsk device IDs changing from cXtYd0 to something virtually unreadable like "c4t5000C5000B21AC63d0s3".

Since the entries in /dev/{rdsk,dsk} are simply symbolic links anyway, would there be any problem with adding "alias" links to /devices there and building our zpools on them? We've tried this and it seems to work fine, producing a zpool status similar to the following:

...
        NAME        STATE     READ WRITE CKSUM
        vol01       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            top00   ONLINE       0     0     0
            bot00   ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            top01   ONLINE       0     0     0
            bot01   ONLINE       0     0     0
...

Here our aliases are "topnn" and "botnn" to denote the disks in the top and bottom JBODs.

The obvious question is "what happens if the alias link disappears?". We've tested this, and ZFS seems to handle it quite nicely by finding the "normal" /dev/dsk link and simply working with that (although it's more difficult to get ZFS to use the alias again once it is recreated).

If anyone can think of anything really nasty that we've missed, we'd appreciate knowing about it. Alternatively, if there is a better supported means of having ZFS display human-readable device ids we're all ears :-)

Perhaps an MPxIO RFE for "vanity" device names would be in order?

--
This message posted from opensolaris.org
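As a rough illustration of the alias approach described above, the following Python sketch adds an alias link next to an existing device link by copying that link's target. The function name add_alias and its dev_dir argument are hypothetical, made up for this example; it is not a supported Solaris or ZFS interface, and it assumes only that the directory holds ordinary symbolic links as the post describes.

```python
import os


def add_alias(dev_dir, real_name, alias):
    """Create dev_dir/alias as a symlink with the same target as dev_dir/real_name.

    Hypothetical helper: dev_dir stands in for /dev/dsk or /dev/rdsk.
    Returns the link target that was copied.
    """
    real_link = os.path.join(dev_dir, real_name)
    alias_link = os.path.join(dev_dir, alias)

    if not os.path.islink(real_link):
        raise ValueError(f"{real_link} is not a symbolic link")
    if os.path.lexists(alias_link):
        # Refuse to overwrite anything, including a dangling old alias.
        raise FileExistsError(f"{alias_link} already exists")

    # Copy the link target verbatim so the alias points at the same
    # /devices node as the original entry.
    target = os.readlink(real_link)
    os.symlink(target, alias_link)
    return target
```

A call such as add_alias("/dev/dsk", "c4t5000C5000B21AC63d0s3", "top00") would be the intended use; trying it against a scratch directory first is the safer way to check the behaviour.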
Hi Sean,

I sympathize with your intentions, but providing pseudo-names for these disks might cause more confusion than actual help.

The "c4t5..." name isn't so bad. I've seen worse. :-)

Here are the issues with using the aliases:

- If a device fails on a J4200, an LED will indicate which disk has failed but will not identify the alias name.
- To prevent confusion with drive failures, you will need to map your aliases to the disk names, possibly pulling out the disks and relabeling them with the alias name. You might be able to use /usr/lib/fm/fmd/fmtopo -V | grep disks to do these mappings online.
- We don't know what fmdump will indicate if a disk has problems, the dev alias or the real disk name.
- We don't know what else might fail.

The work of mapping the alias names to real disk names might have to happen under duress, like when a disk fails. The physical disk ID on the disk will be included in the expanded name, not the alias name.

The hardest part is getting the right disks in the pool. After that, it gets easier. ZFS does a good job of identifying the devices in a pool and will identify which disk is having problems, as will the fmdump command. The LEDs on the disks themselves also help with disk replacements.

The wheels are turning to make device administration easier. We're just not there yet.

Cindy

On 11/16/09 19:17, sean walmsley wrote:
> We have a number of Sun J4200 SAS JBOD arrays which we have multipathed using Sun's MPxIO facility.
> [...]
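The alias-to-disk mapping that Cindy recommends keeping to hand could be produced ahead of time, rather than worked out when a disk fails. Below is a minimal Python sketch, assuming the aliases are symbolic links sitting in the same directory as the normal device links; names_for_alias and alias_table are hypothetical names invented here, and the sketch says nothing about what fmdump or fmtopo will report.

```python
import os


def names_for_alias(dev_dir, alias):
    """Return the other link names in dev_dir that resolve to the same
    target as alias. Hypothetical helper; dev_dir stands in for /dev/dsk."""
    alias_target = os.path.realpath(os.path.join(dev_dir, alias))
    matches = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        # Skip the alias itself and anything that is not a symlink.
        if name == alias or not os.path.islink(path):
            continue
        if os.path.realpath(path) == alias_target:
            matches.append(name)
    return matches


def alias_table(dev_dir, aliases):
    """Build {alias: [real names]} so the mapping can be printed and kept
    with the hardware before any disk fails."""
    return {alias: names_for_alias(dev_dir, alias) for alias in aliases}
```

An alias that maps to an empty list is a sign that the alias is missing or no longer shares a target with any regular device link, which is worth investigating before it is needed.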
sean walmsley wrote:
> [...]
> Perhaps an MPxIO RFE for "vanity" device names would be in order?

I recently raised RFE CR 6901193 "Need a command to list current usage of disks, partitions, and slices". Unfortunately, it's still in triage, so it is probably not yet visible externally. Part of this RFE relates to a requirement for vanity naming of disks, although your requirement is a little different. If you are on support, you should get yourself added to the RFE, together with your precise requirements as above.

--
Andrew Gabriel
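Until a command like the one requested in that RFE exists, a small script can at least list which device names a pool is built on, working from zpool status text such as the listing in the original post. This is a minimal Python sketch, assuming the five-column NAME/STATE/READ/WRITE/CKSUM layout shown there; leaf_devices is a hypothetical name, and treating rows whose name starts with "mirror" or "raidz" as groupings is a simplification, not a full parser.

```python
GROUPING_PREFIXES = ("mirror", "raidz")


def leaf_devices(status_text):
    """Return (pool_name, [leaf device names]) from zpool-status-style text.

    Assumes the first five-column row after the NAME header is the pool
    itself and that grouping rows start with one of GROUPING_PREFIXES.
    """
    pool = None
    leaves = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True
            continue
        # Ignore everything outside the table and any row that does not
        # have exactly the five expected columns.
        if not in_table or len(fields) != 5:
            continue
        name = fields[0]
        if pool is None:
            pool = name
        elif not name.startswith(GROUPING_PREFIXES):
            leaves.append(name)
    return pool, leaves
```

Fed the listing from the original post, this would report the pool vol01 and the four alias names; a disk whose alias happened to start with "mirror" would be misclassified, which is one more reason to choose alias names carefully.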
Cindy:

Thanks for your reply. These units are located at a remote site 300km away, so you're right that the main issue is being able to map the OS and/or ZFS device to a physical disk. The use of alias devices was one way we thought to make this mapping more intuitive, although of course we'd always attempt to double check via the JBOD LEDs.

I know that there are big improvements in OpenSolaris related to device enumeration (as per Eric Schrock's blog), but we're running Solaris 10 U6, for which:

- fmtopo -V doesn't (yet) produce any output for disks
- fmdump doesn't produce any "human readable" disk ids, only guids which then have to be correlated via a "zdb -c"

Sean

>Date: Tue, 17 Nov 2009 16:18:52 -0700
>From: Cindy Swearingen <Cindy.Swearingen at sun.com>
>Subject: Re: [zfs-discuss] building zpools on device aliases
>To: sean walmsley <sean at fpp.nuclearsafetysolutions.com>
>Cc: zfs-discuss at opensolaris.org
>
>[...]

================================================================
Sean Walmsley        sean at fpp . nuclearsafetysolutions dot com
Nuclear Safety Solutions Ltd.  416-592-4608 (V)  416-592-5528 (F)
700 University Ave M/S H04 J19, Toronto, Ontario, M5G 1X6, CANADA