McCulloch, Alan
2009-Sep-27 20:08 UTC
[CentOS] SUMMARY : multipath using defaults rather than multipath.conf contents for some devices (?) - why ?
The behaviour described in the original post below turned out to be caused by the device entry in /etc/multipath.conf having been inadvertently appended *after* the devices section, rather than inside it. So we had

#devices {
#    device {
#        blah blah
#    }    (file has a bunch of defaults commented out)
#    etc
#}
#
#
device { our settings }

*rather than*

devices {
    device { our settings }
}

Also, looking more closely at our multipath.conf.defaults, there is an entry for a product pattern HSV2.* That would explain why the multipath settings for the HSV200 looked different from those for the HSV400: the HSV200 was picking up a different set of defaults. However, that was still not exactly what we had specified. (Sort of like CSS behaviour... "cascading multipath.conf defaults"! :-) )

You possibly also need to pay attention to the basic whitespace formatting of this file. For example, I had noticed the above early on and rebooted with it "fixed", only to find that the system came up unable even to recognise the ext3 filesystem, complaining about a bad superblock etc. After recovering from that (see below) I went back a week later and

a) made sure that the whitespace (tabs etc.) looked like the other commented-out defaults, and
b) added a second device entry (from the defaults section, not a device we even have), just in case there was some bug relating to having only one device entry in the "devices" section.

Whether because of those or some other change, the system now comes up fine, with multipath -ll now reporting the correct settings for the HSV400. So issue resolved finally. Touch wood.

Here is a useful tip (not news to gurus, but it was to me): when things turn to custard on reboot and everything including the root filesystem is mounted read-only, so that you can't even restore multipath.conf from the backup you made before rebooting, you can

mount -n -o remount /

to remount / as writeable. Then you can

cd /etc
cp multipath.conf.bu1 multipath.conf

and breathe a big sigh of relief!
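For anyone wanting to catch the misplaced device entry described at the top of this summary before rebooting, here is a rough sketch of a check in Python. It is not how multipath itself parses the file; it just strips "#" comments, tracks brace nesting, and reports any "device {" that is not inside a "devices {" section. The function name find_stray_device_blocks is made up for illustration, and the sketch assumes no "#" or braces appear inside quoted values (true of the config quoted in this post).

```python
import re


def find_stray_device_blocks(conf_text):
    """Return line numbers of 'device {' blocks not nested inside 'devices {'.

    Hypothetical helper: a simple brace counter, not multipath's real parser.
    """
    stack = []   # names of the sections currently open
    stray = []   # line numbers of misplaced 'device {' openings
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        # Drop comments (assumes '#' never appears inside a quoted value).
        line = raw.split("#", 1)[0]
        # Pick out section openings ("name {") and closing braces in order.
        for token in re.findall(r"[A-Za-z_]+\s*\{|\}", line):
            if token == "}":
                if stack:
                    stack.pop()
            else:
                name = token.rstrip("{").strip()
                if name == "device" and "devices" not in stack:
                    stray.append(lineno)
                stack.append(name)
    return stray


# The broken layout from this post: the devices section is commented out
# and the real device entry sits at top level.
broken = """#devices {
#    device {
#    }
#}
device {
    vendor "(COMPAQ|HP)"
}
"""

print(find_stray_device_blocks(broken))  # prints [5]
```

To run it against a real file, read the file into a string first, e.g. find_stray_device_blocks(open("/etc/multipath.conf").read()). An empty list only means that no top-level device block was found, not that the file is otherwise valid.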
If there are any multipath developers reading this: it could be handy if multipath could log some diagnostic info about how it parses the conf file and exactly which entry it ends up using, which would then appear in /var/log/messages and /var/log/dmesg. It seems relatively easy to end up matching a device spec you didn't expect, and the effects on performance are subtle, so this is easy to overlook.

Cheers
AMcC

------------------------ original post ------------------------

hi all

We have a rh linux server connected (via a Qlogic HBA) to two HP SAN controllers, one an HSV200 (on the way out), the other an HSV400 (on the way in).

/etc/multipath.conf contains this:

device {
    vendor "(COMPAQ|HP)"
    product "HSV1[01]1|HSV2[01]0|HSV300|HSV4[05]0"
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout "/sbin/mpath_prio_alua /dev/%n"
    hardware_handler "0"
    path_selector "round-robin 0"
    path_grouping_policy group_by_prio
    failback immediate
    rr_weight uniform
    no_path_retry 18
    rr_min_io 100
    path_checker tur
}

but our actual multipathing, as shown by multipath -ll and multipath -ll -v 3, looks as though for the HSV400 it is using the defaults rather than these settings. The defaults are

#defaults {
#    udev_dir /dev
#    polling_interval 10
#    selector "round-robin 0"
#    path_grouping_policy multibus
#    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
#    prio_callout /bin/true
#    path_checker readsector0
#    rr_min_io 100
#    rr_weight priorities
#    failback immediate
#    no_path_retry fail
#    user_friendly_name yes

and multipath -ll reports:

.
.
[snip other HSV400 paths - all similar]
mpath12 (3600508b40007518f0000900000520000) dm-1 HP,HSV400
[size=150G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:5:9 sdab 65:176 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:3:9 sdn 8:208 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:4:9 sdu 65:64 [active][ready]
mpath11 (3600508b40007518f0000700000370000) dm-6 HP,HSV200
[size=200G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=50][active]
 \_ 0:0:1:7 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 0:0:2:7 sdh 8:112 [active][ready]
.
.
[snip other HSV200 paths - all similar]

multipath -ll -v 3 includes explicit statements that defaults are being used for the HSV400 (long output snipped...):

sdaa: path checker = readsector0 (config file default)
versus
sda: path checker = tur (controller setting)

sdx: getprio = NULL (internal default)
versus
sdd: getprio = /sbin/mpath_prio_alua %n (controller setting)

Furthermore, we see in the log file messages from both readsector0 *and* tur, rather than just tur as we would if the correct settings were used, which also backs that up.

My questions are basically: why is it happening, and how to fix it?

The vendor and product regexps definitely do match "HP" and both "HSV200" and "HSV400" respectively, so it doesn't seem that fiddling with the patterns will work, and I'm sure this config has been tested.

It's not due to this server having to deal with two controllers: we have a second server that only mounts from the HSV400, and its multipath settings appear to be entirely the defaults, and not what we have set. (And conversely, it's not due to the conf file not being read at all, since the server with two controllers is using the correct config for one of them, but not the other.)

thanks for any tips and I will summarise.
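The claim that the patterns match can be checked quickly outside multipath. The sketch below uses Python's re module with an unanchored search, which is an assumption about how multipath applies these patterns and is not the same regex engine. It tests the vendor and product patterns from the device entry above, plus the HSV2.* product pattern that the summary mentions finding in multipath.conf.defaults (the vendor for that built-in entry is not given in the post, so only its product pattern is tested). The helper name matches is made up for illustration.

```python
import re

# Patterns copied from the device entry in the original post.
USER_VENDOR = r"(COMPAQ|HP)"
USER_PRODUCT = r"HSV1[01]1|HSV2[01]0|HSV300|HSV4[05]0"
# Product pattern the summary reports seeing in multipath.conf.defaults.
BUILTIN_PRODUCT = r"HSV2.*"


def matches(pattern, value):
    """True if pattern is found anywhere in value (unanchored, assumed)."""
    return re.search(pattern, value) is not None


print("vendor HP:", matches(USER_VENDOR, "HP"))
for product in ("HSV200", "HSV400"):
    print(product,
          "user pattern:", matches(USER_PRODUCT, product),
          "HSV2.* pattern:", matches(BUILTIN_PRODUCT, product))
```

Under these assumptions both products match the user pattern, but only the HSV200 also matches HSV2.*, which is consistent with the two arrays ending up with different settings when the user's device entry is not picked up.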
Cheers
AMcC