Simon Matter
2023-Mar-01 11:22 UTC
[CentOS] EL9/udev generates wrong device nodes/symlinks with HPE Smart Array controller
Hi,

I see some strange and dangerous things happening on an HPE server with an HPE Smart Array controller, where EL9 ends up with wrong device nodes/symlinks to the attached disks/RAID volumes:

(I didn't touch anything here but at 08:09 some symlinks were changed)

/dev/disk/by-id/:
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  1 08:09 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 07:57 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  1 08:09 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdc2

/dev/disk/by-path/:
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdc2
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  1 08:09 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 07:57 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdd

After rebooting, things are different but also wrong:

(here nothing has changed after boot but symlinks are already wrong)

/dev/disk/by-id/:
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000 -> ../../sdb
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_00000000-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_01000000 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_02000000 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-0HP_LOGICAL_VOLUME_03000000 -> ../../sdc
lrwxrwxrwx 1 root root  9 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar  1 10:56 scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2

/dev/disk/by-path/:
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0 -> ../../sdb
lrwxrwxrwx 1 root root 10 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:1 -> ../../sda
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:2 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  1 10:56 pci-0000:03:00.0-scsi-0:1:0:3 -> ../../sdc

Note that two things are strange:

1) the /dev/sd* nodes are in a random order after every restart.

# lsscsi
[1:0:0:0]    storage HP       P410i            6.64  -
[1:1:0:0]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdb
[1:1:0:1]    disk    HP       LOGICAL VOLUME   6.64  /dev/sda
[1:1:0:2]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdd
[1:1:0:3]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdc

2) some symlinks created by udev are just wrong and therefore very dangerous to use:

scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2

While 1 may be expected(???) I think 2 should really not happen.

I've tried to find out where things go wrong but the whole udev stuff started to hurt my brain :)

I'm quite sure HPE Smart Array based servers are quite common so my big question is: do others see the same?

While it's possible to live with this mess I'd really like to fix it somehow.

Thanks,
Simon
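
The inconsistency described in point 2 can be checked mechanically. What follows is a minimal Python sketch and not something posted in the thread: it assumes sd-style naming, where partition N of /dev/sdX is /dev/sdXN, and the helper names read_links and find_mismatched_partitions are hypothetical. It flags every -partN symlink whose target does not belong to the disk that its base symlink points to. The sample data is copied from the by-id listing taken after the reboot.

```python
import os
import re

# Matches "<base>-part<N>", the naming used for partition symlinks above.
PART_SUFFIX = re.compile(r"^(?P<base>.+)-part(?P<num>\d+)$")

# Subset of the post-reboot /dev/disk/by-id listing from the message above.
SAMPLE_LINKS = {
    "scsi-0HP_LOGICAL_VOLUME_00000000": "../../sdb",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part1": "../../sdb1",
    "scsi-0HP_LOGICAL_VOLUME_00000000-part2": "../../sdb2",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0": "../../sda",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1": "../../sdb1",
    "scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2": "../../sdb2",
}


def read_links(directory="/dev/disk/by-id"):
    """Return {symlink name: raw link target} for one udev symlink directory."""
    links = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            links[name] = os.readlink(path)
    return links


def find_mismatched_partitions(links):
    """Return (partition link, partition target, disk target) tuples for
    partition symlinks that do not point at a partition of the base disk.

    Assumption: sd-style names, so the expected partition target is the
    disk target with the partition number appended.
    """
    mismatches = []
    for name, target in sorted(links.items()):
        match = PART_SUFFIX.match(name)
        if match is None:
            continue  # whole-disk link, nothing to compare
        disk_target = links.get(match.group("base"))
        if disk_target is None:
            continue  # no whole-disk link with the same identifier
        if target != disk_target + match.group("num"):
            mismatches.append((name, target, disk_target))
    return mismatches


if __name__ == "__main__":
    for link, part, disk in find_mismatched_partitions(SAMPLE_LINKS):
        print(f"{link}: points to {part}, but the disk link points to {disk}")
```

On a live system, passing read_links() instead of SAMPLE_LINKS would run the same check against the current symlinks. The comparison works on the raw link text, so it does not need the device nodes to exist.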
d tbsky
2023-Mar-02 04:41 UTC
[CentOS] EL9/udev generates wrong device nodes/symlinks with HPE Smart Array controller
Simon Matter <simon.matter at invoca.ch>
> 2) some symlinks created by udev are just wrong and therefore very
> dangerous to use:
> scsi-SHP_LOGICAL_VOLUME_500143801722C0B0 -> ../../sda
> scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part1 -> ../../sdb1
> scsi-SHP_LOGICAL_VOLUME_500143801722C0B0-part2 -> ../../sdb2

I think it may be caused by the sd driver's asynchronous scanning. I am lucky that I haven't seen this before. NVMe may have similar issues, but NVMe has a boot parameter to avoid it. SUSE has a boot parameter to avoid it. With EL9 we will have to wait until EL 9.3, if we are lucky.

I have reported the issue: https://bugzilla.redhat.com/show_bug.cgi?id=2140017
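
To see whether the /dev/sd* names on a given boot follow the logical volume order at all (point 1 in the original message), the lsscsi output can be parsed and compared. This is only an illustrative Python sketch: the function names are hypothetical, the sample text is the lsscsi output quoted in the first message, and the parser assumes the device node is the last whitespace-separated field on each line.

```python
# lsscsi output as quoted in the first message of this thread.
SAMPLE_LSSCSI = """
[1:0:0:0] storage HP P410i 6.64 -
[1:1:0:0] disk HP LOGICAL VOLUME 6.64 /dev/sdb
[1:1:0:1] disk HP LOGICAL VOLUME 6.64 /dev/sda
[1:1:0:2] disk HP LOGICAL VOLUME 6.64 /dev/sdd
[1:1:0:3] disk HP LOGICAL VOLUME 6.64 /dev/sdc
"""


def parse_lsscsi(text):
    """Return [((host, channel, target, lun), device node)] for disk lines,
    sorted by SCSI address. Non-disk lines (e.g. the controller) are skipped."""
    disks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] != "disk":
            continue
        address = tuple(int(part) for part in fields[0].strip("[]").split(":"))
        disks.append((address, fields[-1]))
    return sorted(disks)


def nodes_follow_address_order(disks):
    """True if the device nodes are in ascending name order when the disks are
    listed by SCSI address. Plain string comparison, so this assumes no more
    than 26 disks (sda..sdz)."""
    nodes = [node for _, node in disks]
    return nodes == sorted(nodes)


if __name__ == "__main__":
    disks = parse_lsscsi(SAMPLE_LSSCSI)
    for address, node in disks:
        print(":".join(str(n) for n in address), node)
    print("nodes follow address order:", nodes_follow_address_order(disks))
```

A False result only shows that kernel names and SCSI addresses are out of step on that boot. It says nothing about the cause, and it does not detect the more serious symlink mismatch from point 2.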