thr3ads.net - CentOS - [CentOS] RHEL-6 vs. CentOS-5.5 (was: Static assignment of, SCSI device names?) [Feb 2011]

Chuck Munro

2011-Feb-02 07:06 UTC

[CentOS] RHEL-6 vs. CentOS-5.5 (was: Static assignment of, SCSI device names?)

Les Mikesell wrote:
> On 1/30/11 1:37 PM, Chuck Munro wrote:
>> Hello list members,
>>
>> My adventure into udev rules has taken an interesting turn.  I did
>> discover a stupid error in the way I was attempting to assign static
>> disk device names on CentOS-5.5, so that's out of the way.
>>
>> But in the process of exploring, I installed a trial copy of RHEL-6 on
>> the new machine to see if anything had changed (since I intend this box
>> to run CentOS-6 anyway).
>>
>> Lots of differences, and it's obvious that RedHat does things a bit
>> differently here and there.  My focus has been on figuring out how best
>> to solve my udev challenge, and I found that tools like 'scsi_id' and
>> udev admin/test commands have changed.  The udev rules themselves seem
>> to be the same.
>
> Do any of the names under /dev/disk/* work for your static identifiers?
> You should be able to use them directly instead of using udev to map them
> to something else, making it more obvious what you are doing.  And are
> these names the same under RHEL6?

I was happy to see that device names (at least for SCSI disks) have not 
changed.  The more I look into the whole problem the more I realize that 
I've overstated the difficulty, now that I know how to map out the 
hardware path for any given /dev/sdX I might need to replace.  I've 
never dug as deeply into this before, mostly because I never could find 
the spare time.

I'm happy with simply writing a little script which accepts a /dev/sdX 
device name argument and reformats the output of:
  'udevadm info --query=path --name=/dev/sdX'
to extract the hardware path.  It's a bit cleaner than the current 
RHEL-5/CentOS-5 'udevinfo' command.
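
A minimal sketch of such a mapping script, assuming the sysfs path that
udevadm prints has the form /devices/<hardware path>/block/sdX.  The
function names are made up for illustration, and the exact path layout
will vary by controller and kernel:

```shell
#!/bin/sh
# Strip the leading "/devices/" and the trailing "/block/<name>" from a
# sysfs path, leaving only the hardware (controller/port) portion.
hw_path_from_sysfs() {
    p=${1#/devices/}
    p=${p%/block/*}
    printf '%s\n' "$p"
}

# Look up the sysfs path for a device node such as /dev/sda and reduce it.
# Needs udevadm (RHEL-6 style); not available under that name on CentOS-5.
disk_hw_path() {
    hw_path_from_sysfs "$(udevadm info --query=path --name="$1")"
}
```

Calling `disk_hw_path /dev/sda` would then print just the controller and
port portion of the path for that disk.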

Using the numeric path assumes knowledge of how the motherboard sockets 
are laid out and the order in which I/O controller channels are 
discovered, of course.  It's then not difficult to trace a failed drive 
by attaching little tags to the SATA cables from the controller cards.

The real key is to carefully label each SATA cable and its associated 
drive.  Then the little mapping script can be used to identify the 
faulty drive which mdadm reports by its device name.  It just occurred 
to me that whenever mdadm sends an email report, it can also run a 
script which groks out the path info and puts it in the email message. 
Problem solved :-)
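
One way this could be wired up, as a sketch only: mdadm's monitor mode
can run a program on each event (the PROGRAM line in mdadm.conf, or the
--program option), passing it the event name, the md device and, for
some events, the affected component device.  The function names and the
message layout below are assumptions, not anything mdadm prescribes:

```shell
#!/bin/sh
# Build the body of a report from an mdadm event and a hardware path.
# Arguments: event name, md device, component device, hardware path.
build_report() {
    printf 'mdadm event:   %s\n' "$1"
    printf 'array:         %s\n' "$2"
    printf 'component:     %s\n' "${3:-unknown}"
    printf 'hardware path: %s\n' "${4:-unknown}"
}

# Handler taking the arguments mdadm passes: EVENT MD_DEVICE [COMPONENT].
# The udevadm lookup is skipped when no component device was given.
handle_event() {
    path=""
    if [ -n "${3:-}" ]; then
        path=$(udevadm info --query=path --name="$3" 2>/dev/null)
    fi
    build_report "$1" "$2" "${3:-}" "$path"
}
```

A script ending in `handle_event "$@" | mail -s "md event" root` could
then be named on the PROGRAM line; the recipient and subject here are
placeholders.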

So even though I figured out how to add 'alias' symlink names to each 
disk drive, I'm not going to bother with it.  It was a very useful 
learning experience, though.

Chuck

Lamar Owen

2011-Feb-02 16:14 UTC

On Wednesday, February 02, 2011 02:06:15 am Chuck Munro wrote:
> The real key is to carefully label each SATA cable and its associated 
> drive.  Then the little mapping script can be used to identify the 
> faulty drive which mdadm reports by its device name.  It just occurred 
> to me that whenever mdadm sends an email report, it can also run a 
> script which groks out the path info and puts it in the email message. 
> Problem solved :-)
Ok, perhaps I'm dense, but, if this is not a hot-swap bay you're talking
about, wouldn't it be easier to have the drive's serial number (or other
identifier found on the label) pulled into the e-mail, and compare with the
label physically found on the drive, since you're going to have to open the
case anyway?  Using something like:

hdparm -I $DEVICE | grep Serial.Number

works here (the '.' in the regexp Serial.Number matches the space in
"Serial Number", so the pattern needs no quoting).  Use whatever $DEVICE
you need to use, as long as it's on a controller compatible with hdparm
usage.
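
To get just the serial number for pasting into a report, the grep could
be swapped for a small awk filter.  This is a sketch that assumes the
usual "Serial Number:  <value>" line in `hdparm -I` output and a serial
that contains no colon; the function names are illustrative:

```shell
#!/bin/sh
# Read `hdparm -I` output on stdin and print only the serial number,
# with the surrounding whitespace trimmed.
serial_from_hdparm() {
    awk -F: '/Serial Number/ {
        sub(/^[ \t]+/, "", $2)
        sub(/[ \t]+$/, "", $2)
        print $2
        exit
    }'
}

# Convenience wrapper: query a device node directly (needs root).
disk_serial() {
    hdparm -I "$1" | serial_from_hdparm
}
```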

I have seen cases with a different Linux distribution where the actual module
load order was nondeterministic (modules loaded in parallel); while upstream and
the CentOS rebuild try to make things more deterministic, wouldn't it be
safer to get a really unique, externally visible identifier from the drive?  If
the drive has failed to the degree that it won't respond to the query, then
query all the good drives in the array for their serial numbers, and use a
process of elimination.  This, IMO, is more robust than relying on the drive
detect order to remain deterministic.
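
The process of elimination can be sketched as a comparison between a
recorded list of the array's serial numbers and the serials that still
answer a query.  The function name and the whitespace-separated list
format are assumptions, and serials are assumed to contain no spaces:

```shell
#!/bin/sh
# Print every serial from the first list that is absent from the second.
# $1: serials recorded when the array was built (whitespace-separated)
# $2: serials read back from the drives that still respond
missing_serials() {
    for want in $1; do
        found=no
        for have in $2; do
            [ "$want" = "$have" ] && found=yes
        done
        [ "$found" = no ] && printf '%s\n' "$want"
    done
    return 0
}
```

Whatever serial is printed belongs to the drive that did not respond,
and can be matched against the label on the drive itself.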

If the drives are in a hot-swap or cold-swap bay, do some data access to
the array, and see which LEDs don't blink; that should correspond to the
failed drive.  If the bay has secondary LEDs, you might be able to blink
those, too.
