Chuck Munro
2011-Feb-02 07:06 UTC
[CentOS] RHEL-6 vs. CentOS-5.5 (was: Static assignment of SCSI device names?)
Les Mikesell wrote:
> On 1/30/11 1:37 PM, Chuck Munro wrote:
>> Hello list members,
>>
>> My adventure into udev rules has taken an interesting turn. I did
>> discover a stupid error in the way I was attempting to assign static
>> disk device names on CentOS-5.5, so that's out of the way.
>>
>> But in the process of exploring, I installed a trial copy of RHEL-6 on
>> the new machine to see if anything had changed (since I intend this box
>> to run CentOS-6 anyway).
>>
>> Lots of differences, and it's obvious that RedHat does things a bit
>> differently here and there. My focus has been on figuring out how best
>> to solve my udev challenge, and I found that tools like 'scsi_id' and
>> udev admin/test commands have changed. The udev rules themselves seem
>> to be the same.
>
> Do any of the names under /dev/disk/* work for your static identifiers? You
> should be able to use them directly instead of using udev to map them to
> something else, making it more obvious what you are doing. And are these names
> the same under RHEL6?

I was happy to see that device names (at least for SCSI disks) have not changed. The more I look into the whole problem the more I realize that I've overstated the difficulty, now that I know how to map out the hardware path for any given /dev/sdX I might need to replace. I've never dug as deeply into this before, mostly because I never could find the spare time.

I'm happy with simply writing a little script which accepts a /dev/sdX device name argument and reformats the output of:

    udevadm info --query=path --name=/dev/sdX

to extract the hardware path. It's a bit cleaner than the current RHEL-5/CentOS-5 'udevinfo' command.

Using the numeric path assumes knowledge of how the motherboard sockets are laid out and the order in which I/O controller channels are discovered, of course. It's then not difficult to trace a failed drive by attaching little tags to the SATA cables from the controller cards.
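
The mapping script described above might look something like the following minimal Python sketch. It assumes a udevadm that accepts the 'info --query=path --name=...' form quoted above, and that the sysfs path it prints contains PCI addresses (such as 0000:00:1f.2) and a SCSI host:channel:target:lun address (such as 2:0:0:0). The function names and the example path are hypothetical, and the real path layout varies with the controller and kernel.

```python
"""Sketch: reduce udevadm's sysfs path for /dev/sdX to a hardware path."""
import re
import subprocess

# PCI functions look like 0000:00:1f.2 (domain:bus:slot.function).
PCI_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")
# SCSI addresses look like 2:0:0:0 (host:channel:target:lun).
SCSI_RE = re.compile(r"\d+:\d+:\d+:\d+")


def hardware_path(devpath):
    """Extract the PCI addresses and SCSI address from one sysfs path.

    devpath is the single line printed by
    'udevadm info --query=path --name=/dev/sdX'.
    """
    parts = devpath.strip().split("/")
    pci = [p for p in parts if PCI_RE.fullmatch(p)]
    scsi = [p for p in parts if SCSI_RE.fullmatch(p)]
    return {"pci": pci, "scsi": scsi[-1] if scsi else None}


def query_devpath(device):
    """Ask udevadm for the sysfs path of a device node such as /dev/sda."""
    result = subprocess.run(
        ["udevadm", "info", "--query=path", "--name=" + device],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On a machine with the drive present, something like hardware_path(query_devpath("/dev/sdc")) would return the controller's PCI address chain and the SCSI address, which can then be matched against the tags on the cables.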
The real key is to carefully label each SATA cable and its associated drive. Then the little mapping script can be used to identify the faulty drive which mdadm reports by its device name. It just occurred to me that whenever mdadm sends an email report, it can also run a script which groks out the path info and puts it in the email message. Problem solved :-)

So even though I figured out how to add 'alias' symlink names to each disk drive, I'm not going to bother with it. It was a very useful learning experience, though.

Chuck
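
One way the mdadm idea above could be sketched: mdadm's monitor mode can run a program (the PROGRAM line in mdadm.conf) and passes it the event name, the md device, and, where relevant, the component device. The handler below is a hypothetical example that follows that argument convention and builds a short text report including the sysfs path. How the report reaches a mailbox is left out, since mdadm's own alert mail is presumably sent separately from whatever the program prints.

```python
"""Sketch: handler that mdadm's monitor mode could run on an array event."""
import subprocess


def build_report(event, md_device, component=None, devpath=None):
    """Return a plain-text report for one mdadm event."""
    lines = ["mdadm event : " + event, "array       : " + md_device]
    if component:
        lines.append("component   : " + component)
        lines.append("sysfs path  : " + (devpath or "unavailable"))
    return "\n".join(lines)


def sysfs_path(device):
    """Return udevadm's sysfs path for a device node, or None on failure."""
    try:
        result = subprocess.run(
            ["udevadm", "info", "--query=path", "--name=" + device],
            check=True, capture_output=True, text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def main(argv):
    """argv follows mdadm's convention: event, md device, optional component."""
    event, md_device = argv[1], argv[2]
    component = argv[3] if len(argv) > 3 else None
    devpath = sysfs_path(component) if component else None
    print(build_report(event, md_device, component, devpath))
```

A failed drive may no longer answer udevadm at all, which is why the sketch falls back to "unavailable" rather than raising an error.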
Lamar Owen
2011-Feb-02 16:14 UTC
[CentOS] RHEL-6 vs. CentOS-5.5 (was: Static assignment of SCSI device names?)
On Wednesday, February 02, 2011 02:06:15 am Chuck Munro wrote:
> The real key is to carefully label each SATA cable and its associated
> drive. Then the little mapping script can be used to identify the
> faulty drive which mdadm reports by its device name. It just occurred
> to me that whenever mdadm sends an email report, it can also run a
> script which groks out the path info and puts it in the email message.
> Problem solved :-)

Ok, perhaps I'm dense, but, if this is not a hot-swap bay you're talking about, wouldn't it be easier to have the drive's serial number (or other identifier found on the label) pulled into the e-mail, and compare with the label physically found on the drive, since you're going to have to open the case anyway?

Using something like:

    hdparm -I $DEVICE | grep Serial.Number

works here (the regexp Serial.Number matches the string "Serial Number" without requiring the double quotes....). Use whatever $DEVICE you need to use, as long as it's on a controller compatible with hdparm usage.

I have seen cases with a different Linux distribution where the actual module load order was nondeterministic (modules loaded in parallel); while upstream and the CentOS rebuild try to make things more deterministic, wouldn't it be safer to get a really unique, externally visible identifier from the drive? If the drive has failed to the degree that it won't respond to the query, then query all the good drives in the array for their serial numbers, and use a process of elimination. This, IMO, is more robust than relying on the drive detect order to remain deterministic.

If in a hotswap or coldswap bay, do some data access to the array, and see which LEDs don't blink; that should correspond to the failed drive. If the bay has secondary LEDs, you might be able to blink those, too.
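
The serial-number approach, including the process of elimination for a drive that no longer answers, could be sketched roughly as follows in Python. This assumes 'hdparm -I' prints a line of the form "Serial Number: <value>", as the grep above implies; the function names and the serial numbers in the usage note are made up for illustration.

```python
"""Sketch: identify a failed drive by serial number, by elimination."""
import re
import subprocess


def parse_serial(hdparm_output):
    """Pull the serial number out of 'hdparm -I' text, or return None."""
    match = re.search(r"Serial Number:\s*(\S+)", hdparm_output)
    return match.group(1) if match else None


def read_serial(device):
    """Run 'hdparm -I' on a device; return its serial, or None on failure."""
    try:
        result = subprocess.run(
            ["hdparm", "-I", device],
            check=True, capture_output=True, text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_serial(result.stdout)


def missing_serials(expected, responding):
    """Serials from the drive labels that no responding drive reported."""
    return sorted(set(expected) - set(responding))
```

Given the serials written down from the drive labels, for example ["S1", "S2", "S3"], and the serials read back from the drives that still respond, for example ["S1", "S3"], missing_serials returns the serial of the drive to look for in the case.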