Matthew Bohnsack
2008-Aug-28 15:12 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
Hello. I recently tried out the latest release of Sun HPC Software,
Linux Edition on a small testbed cluster and would like to share my
experiences and give some feedback.

I installed the software on a setup shown in the attached diagram -
TestBed.pdf. Notable aspects of this cluster include:

 * I used a Linux workstation on my desk running Fedora 9 to remotely
   manage the system.
 * There's a Sun X2100 with two Ethernet interfaces that I use as the
   cluster management/head node. One of the node's interfaces connects
   to my corporate network and thereby my workstation, while the other
   connects to the cluster/mgmt network. I prefer that a head node
   like this has three network interfaces, so that cluster mgmt
   traffic (e.g., power control) can be isolated from other cluster
   traffic (e.g., NFS), but two interfaces seem fine for a small
   testbed.
 * There are two compute nodes. The first one is a Sun X6220 AMD blade
   and the second one is a Sun X6250 Intel blade. Both of these blades
   are running inside of a Sun 6000 blade chassis.
 * The blade chassis has a Chassis Management Module (CMM) that can be
   used to access blade Service Processors (SPs). This connectivity
   enables serial console, power control, and other systems management
   functionality.

I used a CentOS 5.2 machine to build a Sun HPC Software, Linux Edition
ISO image as follows:

# wget http://dlc.sun.com/linux_hpc/iso/tools/sunhpc-linux-iso-0.5-sunhpc2.noarch.rpm
# rpm -Uvh sunhpc-linux-iso-0.5-sunhpc2.noarch.rpm
# sunhpc-linux-iso

Numbered notes on the rest of the process follow...

01) The first thing I notice is that there's a tarball fetched that
has no version information in it (sunhpc_base.tgz). I.e.,

Downloading ISO skeleton...
--15:29:09-- http://dlc.sun.com/linux_hpc/iso/base/sunhpc_base.tgz

Without version information, how can different releases create
repeatably similar ISO images? I.e., the same
sunhpc-linux-iso-0.1-sunhpc6.noarch.rpm could end up referring to one
ISO skeleton file today and a different one tomorrow, with no way to
tell them apart.

02) Taking the repeatability theme a little further, how is it
possible to make a repeatable release, such that I can deploy exactly
the same bits that Sun has tested, when yum might download newer or
older RPM packages, depending on when sunhpc-linux-iso is run? This
applies equally to repos containing CentOS RPMs and those containing
Sun RPMs.

03) sunhpc-linux-iso-0.5-sunhpc2.noarch.rpm creates files in
/etc/sunhpc-linux-iso/yum.repos.d/ that have invalid URLs of the form
"http://giraffe.lustre/..."

E.g., /etc/sunhpc-linux-iso/yum.repos.d/sunhpc-centos5.repo:

[sunhpc-centos5]
name=SunHPC Packages for CentOS 5
baseurl=http://giraffe.lustre/dlc_stage/yum/sunhpc/1.0/c5/x86_64/
        http://dlc.sun.com/linux_hpc/yum/sunhpc/1.0/c5/x86_64/
gpgcheck=0
enabled=1

This causes ugly errors like:

Starting download for packages in sunhpc-centos5...
Downloading RPMs with yumdownloader...
sunhpc-centos5-updates 100% |=========================| 951 B 00:00
primary.xml.gz 100% |=========================| 301 kB 00:01
sunhpc-cen: ################################################## 605/605
http://giraffe.lustre/dlc_stage/yum/sunhpc/1.0/c5/x86_64/repodata/repomd.xml:
[Errno 4] IOError: <urlopen error (-2, 'Name or service not known')>

What is this "giraffe.lustre", some kind of internal testing domain?
Is it required in code that's distributed to the public?
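For what it's worth, simply removing the unresolvable internal mirror
line from the repo file works around the errors; this assumes the
dlc.sun.com URL is the intended public location:

[sunhpc-centos5]
name=SunHPC Packages for CentOS 5
baseurl=http://dlc.sun.com/linux_hpc/yum/sunhpc/1.0/c5/x86_64/
gpgcheck=0
enabled=1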
04) Even though the RPM version number was incremented with the recent
release, the filename of the generated ISO was not, so it's hard for
me to tell the difference between resultant releases and documentation
from one RPM revision to another. E.g., I have two files in my home
directory named sun-linux-hpc-1.0-PREVIEW.iso. Which is which? I'm
also puzzled by the naming convention of the documentation file.
E.g., installation_guide_1.0_preview.pdf vs installation_guide_1.0.pdf.
What will the title be in the next release?

05) I installed from the ISO on my Sun X2100 management/head node
machine per installation_guide_1.0.pdf, selecting the Lustre software
option, but not customizing any software packages. This resulted in a
system running 2.6.18-53.1.21.el5.sunhpc2.

06) As is already reported in the release notes, a double reboot was
required after the install, before the system could be used.

07) After installation, I noticed that root's .bash* init files were
missing. I added them, consulting another machine's files. This seems
like a bug.

08) One of the first things I tried to do was ssh to my management
node from my Linux workstation, so that I could access a target
compute blade's VGA console via the web management GUI with a remote
instance of firefox. This didn't work for multiple reasons:

 * A full X11 installation wasn't done.
 * Ssh didn't support setting up $DISPLAY.
 * There were two conflicting instances of firefox installed (32 and
   64-bit). For the purposes of running Java applets required by the
   web GUI, the 32-bit firefox is needed.
 * The Java plugin for 32-bit firefox wasn't installed.

I can understand that some people might not want/need X11 or firefox,
but for a management node it's a requirement for me, and in most of my
installations, it's also required on the compute nodes for totalview,
among other things.

Here's what I had to do to fix things:

a. I found that I couldn't install X11 until I did a yum update.
b. To do a yum update, I had to exclude various RPMs in /etc/yum.conf:
   exclude=kernel* opensm* infiniband*
c. yum update
d. yum groupinstall -y "X Window System"
e. rpm -e firefox.x86_64
f. rpm -e firefox.i386
g. install firefox.i386
h. I installed Sun JDK 1.5.0_15 in /opt/java/jdk1.5.0_15 and then
   created scripts in /etc/profile.d, such that...
   JAVA_HOME=/opt/java/jdk1.5.0_15
   PATH=$JAVA_HOME/bin:$PATH
k. cd /usr/lib/mozilla/plugins && ln -s /opt/java/jdk1.5.0_15/jre/plugin/i386/ns7/libjavaplugin_oji.so

09) The documentation's step #3 lists a table of nodes, IP addresses,
and MAC addresses. While this kind of information is needed, and
someone experienced with implementing and administering Linux clusters
will be familiar with what's going on, there's little context to let a
new user know why this information is important to them. If you're
going to have a sample table, I suggest providing a lot more context.
E.g., a picture and description of a sample cluster. On my clusters,
I would like cluster IP addresses to be first put in a hosts file like
/etc/hosts and then optionally automatically turned into DNS
configuration files.

My testbed /etc/hosts file follows:

== BEGIN /etc/hosts ==
127.0.0.1 localhost.localdomain localhost

# Management/Head Node
# ====================
# Has two Ethernet interfaces:
# eth0: connected to the corporate network
# eth1: connected to the cluster/management network
10.0.0.101 cdsytestbed # eth0
192.168.1.254 cdsytestbed-mgmt # eth1

# Management Devices
# ====================
192.168.1.200 cmm1
192.168.1.1 b01-sp
192.168.1.2 b02-sp

# Compute Nodes
# ====================
192.168.1.201 b01
192.168.1.202 b02
== END /etc/hosts ==

10) Step 4.2 seems innocuous, but I see this as a critically
important, missing piece of functionality in the current release.
IMHO, its absence is a showstopper on a large system. That is,

"Set up management interfaces, such as the sun Integrated Lights Out
Manager (ILOM) service sprocessors (SPs), to prepare to provision the
systems."

is simple enough to do manually on a small test cluster, but is
untenable on a large system. I've developed tools (mostly Perl scripts
that "expect" various SMASH CLI interfaces) to:

a. Get the MAC address of the CMM via a serial connection.
b. Access the CMM over IP, set its IP address, and make other CMM
   settings.
c. Update CMM firmware over IP (assumes a TFTP server).
d. Verify CMM firmware versions.
e. Set IP addresses of SPs.
f. Turn off blades.
g. Update blade and SP firmware (CPLD and REM is still TODO).
h. Turn on blades.
i. Verify blade firmware versions.
j. HCA flash to enable boot-over-IB (for a different Infiniband
   machine I'm working with).
k. Collect IB link layer host IDs (for a different Infiniband machine
   I'm working with).

Is there any interest in putting this stuff into the release or
something like it? I'd like to see a toolkit to do this stuff in a
pluggably general way. E.g., tools that enable X6220 and X6250 blades
to be updated/configured with similar commands, but there would be
underlying scripting that takes care of different commands, firmware,
etc. for the different platforms.

11) Going on to Step 5 in the Sun Documentation (Create configurations
for all nodes in the cluster)...

12) It might be nice to include a link to the actual OneSIS
documentation. This whole process is rather mysterious without it.

13) I changed my /etc/sunhpc.conf as follows. Note that this file
currently defaults to an old kernel. It needs to be manually changed
to the new one. Also, in my case, DHCPD_IF needed to be changed.

DISTRO=centos-5.1
ONESIS_IMAGE=/var/lib/oneSIS/image/centos-5.1
DHCPD_IF=eth1
KERNEL_VERSION=2.6.18-53.1.21.el5.sunhpc2
UNUSED_PKGS="conman powerman ganglia-gmetad sunhpc-configuration"

14) Step 5.2, where you're asked to put MAC addresses in
/etc/dhcp_hostlist, is another step that could be automated for
various supported machines. E.g., SMASH commands like
"show /SYS/MB/NET0 fru_serial_number" for X6220s and
"show /SYS/NICInfo0 MacAddress1" on X6250s. Any plans for this? I
think it's critical on large machines. (A rough sketch of the kind of
automation I mean follows item 15 below.) My resultant file is:

host b01 {hardware ethernet 00:14:4f:82:1e:02; fixed-address 192.168.1.201;}
host b02 {hardware ethernet 00:1e:68:57:77:58; fixed-address 192.168.1.202;}

15) I see that dhcpd is configured in /etc/dhcpd_sunhpc.conf. Why not
the default dhcpd.conf? This is very confusing.
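Here's that sketch (untested, and the parsing is a guess at the SMASH
CLI's output format): pull each blade's MAC from its SP over ssh and
emit /etc/dhcp_hostlist entries. It assumes the X6220-style target;
X6250s would need "show /SYS/NICInfo0 MacAddress1" instead.

#!/bin/sh
# Collect blade MACs from the SPs and print dhcp_hostlist entries.
i=201
for node in b01 b02; do
    mac=`ssh root@${node}-sp "show /SYS/MB/NET0 fru_serial_number" \
        | grep -i fru_serial_number | awk '{print $NF}' | tr 'A-F' 'a-f'`
    echo "host $node {hardware ethernet $mac; fixed-address 192.168.1.$i;}"
    i=`expr $i + 1`
done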
16) When running /usr/sbin/sunhpc_setup -i without a NETWORK directive
in /etc/sysconfig/network-scripts/ifcfg-eth0, there's poor grammar in
an error message that you might want to fix up:

incompleted network information, please
check /etc/sysconfig/network-scripts/ifcfg-${DHCPD_IF}"

I.e., s/incompleted/incomplete/

17) Possibly because of the "yum update" I did to get X11 working,
sunhpc_setup results in OneSIS giving an error message about a patch
reject in /etc/rc.d/rc.sysinit. I fixed it up manually - I think.
More testing is required:

oneSIS: Warning! Patch failed or was only partially successful.
* * * patching file etc/rc.d/rc.sysinit
Hunk #1 FAILED at 702.
Hunk #2 succeeded at 736 (offset 24 lines).
Hunk #4 succeeded at 883 (offset 24 lines).
Hunk #6 succeeded at 968 (offset 24 lines).
1 out of 6 hunks FAILED -- saving rejects to file /tmp/rejects
* * * The patch may have failed because it was updated, outdated,
* * * or just defective. Support is available on the oneSIS mailing
* * * list, onesis-users at lists.sourceforge.net

18) It might be a good idea to explain exactly what
/usr/sbin/sunhpc_setup -i is doing, especially since there's no
reference to the OneSIS documentation. What's the '-i' for? Do I have
to provide that argument if I'm generating a second image? I don't
think there's a way for me to figure this out without reading the
script line-by-line.

19) Why does the documentation's step 5.4.b call for turning off
iptables? If something doesn't work with iptables on, what is it?
Surely it's possible to add some exceptions and keep this excellent
security mechanism in place. On most of my deployments, iptables is a
requirement on nodes that face the external network. On my testbed, I
simply kept it up on the eth0 interface of my head node and then
allowed all traffic on eth1, which faced the rest of the cluster. If
I were responsible for the documentation, I'd be careful about
advocating the removal of a security feature without further
clarification and explanation.

20) I changed /tftpboot/pxelinux.cfg as follows. The most important
differences are a change to the default serial port for X6220s and
X6250s and the kernel version:

PROMPT 1
TIMEOUT 20
IPAPPEND 2
DEFAULT linux
label linux
  kernel vmlinuz-2.6.18-53.1.21.el5.sunhpc2
  append root=/dev/nfs console=ttyS1,9600 initrd=initrd-2.6.18-53.1.21.el5.sunhpc2.img selinux=0
label rescue
  kernel vmlinuz-2.6.18-53.1.21.el5.sunhpc2
  append root=/dev/nfs console=ttyS1,9600 rescue initrd=initrd-2.6.18-53.1.21.el5.sunhpc2.img selinux=0

21) I much prefer using the pxelinux config technique that allows
pxelinux to dynamically select its configuration file rather than
hardcoding it in dhcpd.conf. See "Next, it will search for the config
file using its own IP address..." in http://syslinux.zytor.com/pxe.php
(A sketch of what this enables follows item 22 below.)

As things currently stand, you have to either edit a PXE config file
in a strange way or edit dhcpd.conf and restart the dhcpd server to
make a node boot an alternative image. This doesn't seem like a very
scalable practice. I want to be able to change the image that a node
boots (including the possibility for a local HD image) simply on the
command line for any combination of n nodes and m images - without
needing to restart dhcp server(s). As things are currently set up,
this isn't possible.

22) Which version of pxelinux are we running? I find this difficult
to determine:

# rpm -qf /tftpboot/pxelinux.0
file /tftpboot/pxelinux.0 is not owned by any package
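To illustrate item 21: pxelinux looks for a config file named after
the booting NIC's MAC address (01-<mac, dash-separated>), then after
the node's IP address in upper-case hex with progressively shorter
prefixes, before falling back to "default". So, assuming you keep a
per-image config file named "rescue", switching b01 (192.168.1.201,
which is C0A801C9 in hex) to the rescue image is just a symlink, with
no dhcpd restart:

# cd /tftpboot/pxelinux.cfg
# ln -sf rescue C0A801C9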
23) Regarding step 6 - Boot up the client nodes:

a. The documentation calls for things like:

ipmipower -W endianseq -h b01-sp -u root -p changeme --stat
ipmipower -W endianseq -h b01-sp -u root -p changeme --off
ipmipower -W endianseq -h b01-sp -u root -p changeme --on

This works for X6220s, but not for X6250s. For X6250s you need:

ipmipower -W authcap -h b02-sp -u root -p changeme --stat
ipmipower -W authcap -h b02-sp -u root -p changeme --off
ipmipower -W authcap -h b02-sp -u root -p changeme --on

That is, a different IPMI workaround is required for Intel/ELOM nodes.

b. I suggest setting up powerman at this point before going any
further and not confusing matters with ipmipower. To get powerman
working with the mixed IPMI workaround, I had to create an
/etc/powerman/powerman.conf like:

include "/etc/powerman/ipmipower.dev"

alias "all" "b01,b02"

device "pow0" "ipmipower" "/usr/sbin/ipmipower -h b01-sp --config /etc/ipmipower_amd.conf |&"
node "b01" "pow0" "b01-sp"

device "pow1" "ipmipower" "/usr/sbin/ipmipower -h b02-sp --config /etc/ipmipower_intel.conf |&"
node "b02" "pow1" "b02-sp"

And then /etc/ipmipower_amd.conf like:

hostname b01-sp
username root
password changeme
workaround-flags "endianseq"
on-if-off enable
wait-until-on enable
wait-until-off enable

And finally /etc/ipmipower_intel.conf like:

hostname b02-sp
username root
password changeme
workaround-flags "authcap"
on-if-off enable
wait-until-on enable
wait-until-off enable

24) Step 7: Build SSH keys is ripe for a small shell script. No?
Something like the sketch below.
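(Untested, and the image path is just my ONESIS_IMAGE from item 13;
adjust to taste.)

#!/bin/sh
# Generate a root keypair on the head node (if one doesn't already
# exist) and push the public key into the oneSIS image so root on the
# head node can log into every diskless client.
IMAGE=/var/lib/oneSIS/image/centos-5.1
test -f /root/.ssh/id_rsa || ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
mkdir -p $IMAGE/root/.ssh && chmod 700 $IMAGE/root/.ssh
cat /root/.ssh/id_rsa.pub >> $IMAGE/root/.ssh/authorized_keys
chmod 600 $IMAGE/root/.ssh/authorized_keys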
25) I think that many users will want to get ConMan set up before
actually booting compute nodes, so you might want to move the
documentation's section on ConMan up a few sections. To get ConMan to
work in a useful way, I needed to make some edits to /etc/conman.conf
that could maybe be there by default (e.g., server
logdir="/var/log/conman"). In addition, I had to make an expect script
to make things work with the unsupported X6250 nodes. Using
sun-ilom.exp as a base, I re-worked it to work with X6250s. It's not
too well tested, but it's attached as sun-elom.exp and seems to work.

26) After ConMan was set up correctly, I was able to get OneSIS images
booted, except the images weren't set up to have inittab start serial
gettys after boot, so it was impossible to log in via the serial
console. See http://www.vanemery.com/Linux/Serial/serial-console.html
The documentation might want to mention something about this.

27) I'm going to need diskful installs. When will this functionality
be available?

28) How does the verification suite work? The documentation mentions
it, but I couldn't figure out how to run it.

29) I'd like to see a parallel ping capability included (e.g., fping:
http://fping.sourceforge.net/)

That's it for now. Let me know if you need any clarification. I'm
going to be testing all of this in a boot-over-IB environment
including Lustre in the next week or so, so that experience will
undoubtedly generate some more feedback.

-Matthew

-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestBed.pdf
Type: application/pdf
Size: 122524 bytes
Desc: not available
Url : http://lists.lustre.org/pipermail/linux_hpc_swstack/attachments/20080828/9d15b584/attachment-0001.pdf
-------------- next part --------------
#!/usr/bin/expect -f
###############################################################################
# $Id: sun-ilom.exp 749 2007-05-19 07:25:15Z dun $
###############################################################################
# Copyright (C) 2001-2007 The Regents of the University of California.
# Produced at Lawrence Livermore National Laboratory.
# Written by Chris Dunlap <cdunlap at llnl.gov>.
# UCRL-CODE-2002-009.
#
# This file is part of ConMan: The Console Manager.
# For details, see <http://home.gna.org/conman/>.
###############################################################################
# This script connects to a console managed by the Integrated Lights Out
# Manager (ILOM) on a Sun server using the SSH protocol.
#
# This script can be specified in "conman.conf" in the following manner:
#
#   console name="zot" dev="/path/to/sun-ilom.exp HOST USER PSWD"
#
# HOST is the hostname of the Sun ILOM server.
# USER is the username being authenticated.
# PSWD is the corresponding password.
#
# Since this command-line will persist in the process listing for the duration
# of the connection, passing sensitive information like PSWD in this manner is
# not recommended. Instead, consider using either a command-line argument
# default or the password database (see below).
###############################################################################

##
# Set "exp_internal" to 1 to print diagnostics describing internal operations.
# This is helpful in diagnosing pattern-match failures.
##
exp_internal 0

##
# Set "log_user" to 1 to show the underlying dialogue establishing a
# connection to the console.
##
log_user 1

##
# The "timeout" specifies the number of seconds before the connection attempt
# times-out and terminates the connection.
##
set timeout 10

##
# If "session_override" is set, an existing connection to this console session
# will be terminated and the new connection will be established. Subsequent
# attempts to steal this console session will be thwarted.
# Otherwise, an existing connection to this console session will cause the new
# connection to fail with "console session already in use". If the console
# session is not in use and the connection succeeds, the console session may
# subsequently be stolen thereby causing this connection to terminate with
# "console session stolen".
##
set session_override 1

##
# The "password_db" specifies the location of the password database.
# This avoids exposing sensitive information on the command-line without
# needing to modify this script.
# Whitespace and lines beginning with '#' are ignored. The file format is:
#   <host-regex> : <user> : <pswd>
##
set password_db "/etc/conman.pswd"

##
# Command-line argument defaults can be specified here. This avoids exposing
# sensitive information on the command-line.
##
# set user "root"
# set pswd "changeme"

###############################################################################

set env(PATH) "/usr/bin:/bin"

proc get_password {host user index} {
    global password_db
    set db_pswd {}
    if {! [info exists password_db]} {
        return
    }
    if {[catch {open $password_db} input]} {
        return
    }
    while {[gets $input line] != -1} {
        if {[regexp {^[ \t]*#} $line]} {
            continue
        }
        set record [split $line ":"]
        set db_host [string trim [lindex $record 0]]
        if {[catch {regexp "^$db_host$" $host} got_host_match]} {
            continue
        }
        if {! $got_host_match && [string length $db_host]} {
            continue
        }
        set db_user [string trim [lindex $record 1]]
        if {[string compare $db_user $user]} {
            continue
        }
        set db_pswd [string trim [lindex $record $index]]
        break
    }
    close $input
    return $db_pswd
}

if {! $argc} {
    set prog [lindex [split $argv0 "/"] end]
    send_user "Usage: $prog <host> <user> <pswd>\r\n"
    exit 1
}
if {$argc > 0} {
    set host [lindex $argv 0]
}
if {$argc > 1} {
    set user [lindex $argv 1]
}
if {$argc > 2} {
    set pswd [lindex $argv 2]
}
set authenticated 0
set connected 0

if {! [info exists host]} {
    send_user "Error: Unspecified hostname.\r\n"
    exit 1
}
if {! [info exists user]} {
    send_user "Error: Unspecified username.\r\n"
    exit 1
}
if {! [info exists pswd]} {
    set pswd [get_password $host $user 2]
    if {! [string length $pswd]} {
        send_user "Error: Unspecified password.\r\n"
        exit 1
    }
}
if {! [info exists session_override]} {
    set session_override 0
}
set override $session_override

if {[catch "spawn ssh -a -e \& -l $user -p 22 -x $host" spawn_result]} {
    send_user "Error: $spawn_result.\r\n"
    exit 1
}
expect {
    -re "console activate successful\[^\r]*\r+\n" {
        ;
    }
    -gl "Permission denied" {
        send_user "Error: Permission denied.\r\n"
        exit 1
    }
    -gl "Serial console is in use" {
        send_user "Error: Console session already in use.\r\n"
        exit 1
    }
    -re "Invalid command ('\[^']*')" {
        send_user "Error: Invalid ILOM command $expect_out(1,string).\r\n"
        exit 1
    }
    -re "^ssh: (\[^\r]*)\r+\n" {
        send_user "Error: $expect_out(1,string).\r\n"
        exit 1
    }
    eof {
        send_user "Error: Connection closed by remote host.\r\n"
        exit 1
    }
    timeout {
        send_user "Error: Timed-out.\r\n"
        exit 1
    }
    -nocase -gl "Are you sure you want to continue connecting (yes/no)? \$" {
        send "yes\r"
        exp_continue -continue_timer
    }
    -nocase -gl "Password: \$" {
        if {$authenticated == 0} {
            send "$pswd\r"
            incr authenticated
            exp_continue -continue_timer
        } else {
            send_user "Error: Permission denied.\r\n"
            exit 1
        }
    }
    -nocase -gl "-> \$" {
        if {$connected != 0} {
            send_user "Error: Unexpected ILOM response.\r\n"
            exit 1
        } elseif {$override != 0} {
            send "stop /SP/AgentInfo/Console\r"
            set override 0
        } else {
            send "start /SP/AgentInfo/Console\r"
            incr connected
        }
        exp_continue -continue_timer
    }
    -re "\[^\r]*\r+\n" {
        exp_continue -continue_timer
    }
}
send_user "Connection established via ssh (pid $spawn_result).\r\n"

set timeout 2
interact {
    # Replace "&B" with serial-break.
    "&B" {
        send "\033B"
    }
    # Match subsequent patterns against spawned process, not user's keystrokes.
    -o
    # Disable "ESC (" sequence for stopping console and returning to ILOM prompt.
    -re "\r\nSerial console stopped.\r\n\r\n-> \$" {
        send "start /SP/AgentInfo/Console\r"
        expect -re "\r\nconsole activate successful\[^\r]*\r+\n"
    }
    # Prevent theft of console if "session_override" is enabled; o/w, exit.
    -re "\r\n-> \$" {
        if {$session_override == 0} {
            send_user "\r\nConsole session stolen.\r\n"
            exit 1
        }
        send "start -script /SP/console\r"
        expect {
            -re "\r\nSerial console is in use.\r\n" {
                send_user "\r\nConsole session stolen.\r\n"
                exit 1
            }
            -re "\r\nconsole activate successful\[^\r]*\r+\n"
        }
    }
}
Makia Minich
2008-Sep-01 22:13 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
First off, thanks for the comments; they'll be quite helpful. My
comments will be strewn throughout. (Zhiqi, please look for bug notes
on things to put into bugzilla.)

Matthew Bohnsack wrote:
> Hello. I recently tried out the latest release of Sun HPC Software,
> Linux Edition on a small testbed cluster and would like to share my
> experiences and give some feedback.
>
> I installed the software on a setup shown in the attached diagram -
> TestBed.pdf. Notable aspects of this cluster include:
>
>  * I used a Linux workstation on my desk running Fedora 9 to
>    remotely manage the system.
>  * There's a Sun X2100 with two Ethernet interfaces that I use as
>    the cluster management/head node. One of the node's interfaces
>    connects to my corporate network and thereby my workstation,
>    while the other connects to the cluster/mgmt network. I prefer
>    that a head node like this has three network interfaces, so that
>    cluster mgmt traffic (e.g., power control) can be isolated from
>    other cluster traffic (e.g., NFS), but two interfaces seem fine
>    for a small testbed.
>  * There are two compute nodes. The first one is a Sun X6220 AMD
>    blade and the second one is a Sun X6250 Intel blade. Both of
>    these blades are running inside of a Sun 6000 blade chassis.
>  * The blade chassis has a Chassis Management Module (CMM) that can
>    be used to access blade Service Processors (SPs). This
>    connectivity enables serial console, power control, and other
>    systems management functionality.
>
> I used a CentOS 5.2 machine to build a Sun HPC Software, Linux
> Edition ISO image as follows:
>
> # wget http://dlc.sun.com/linux_hpc/iso/tools/sunhpc-linux-iso-0.5-sunhpc2.noarch.rpm
> # rpm -Uvh sunhpc-linux-iso-0.5-sunhpc2.noarch.rpm
> # sunhpc-linux-iso
>
> Numbered notes on the rest of the process follow...
>
> 01) The first thing I notice is that there's a tarball fetched that
> has no version information in it (sunhpc_base.tgz). I.e.,
>
> Downloading ISO skeleton...
> --15:29:09-- http://dlc.sun.com/linux_hpc/iso/base/sunhpc_base.tgz
>
> Without version information, how can different releases create
> repeatably similar ISO images? I.e., the same
> sunhpc-linux-iso-0.1-sunhpc6.noarch.rpm could end up referring to
> one ISO skeleton file today and a different one tomorrow, with no
> way to tell them apart.

We'll take a closer look at the versioning and make sure we get some
kind of standard across releases. The feeling is that we want to make
sure that the base of the ISO is always at the latest version (since
this does not necessarily affect what is on the ISO itself, but does
make sure that the ISO is functioning). There is some versioning that
can be done in the base as well as time stamps that tell us which
version is used.

> 02) Taking the repeatability theme a little further, how is it
> possible to make a repeatable release, such that I can deploy
> exactly the same bits that Sun has tested, when yum might download
> newer or older RPM packages, depending on when sunhpc-linux-iso is
> run? This applies equally to repos containing CentOS RPMs and those
> containing Sun RPMs.

The repo file used with the sunhpc-linux-iso script looks at the
updates directory for CentOS releases but only the 1.0 releases for
SunHPC tools. So, in this instance, if you create an ISO it is made up
of all the pieces from our stack and includes all the newest updates
from the CentOS base. Therefore, we know that the tools we provide are
at levels we've tested, and we rely on the CentOS teams to test their
own packages. With this idea, every ISO should have all the latest
security fixes as released via CentOS. We could do some extra work to
make sure that this is better known.

> 03) sunhpc-linux-iso-0.5-sunhpc2.noarch.rpm creates files in
> /etc/sunhpc-linux-iso/yum.repos.d/ that have invalid URLs of the
> form "http://giraffe.lustre/..."
>
> E.g., /etc/sunhpc-linux-iso/yum.repos.d/sunhpc-centos5.repo:
>
> [sunhpc-centos5]
> name=SunHPC Packages for CentOS 5
> baseurl=http://giraffe.lustre/dlc_stage/yum/sunhpc/1.0/c5/x86_64/
>         http://dlc.sun.com/linux_hpc/yum/sunhpc/1.0/c5/x86_64/
> gpgcheck=0
> enabled=1
>
> This causes ugly errors like:
>
> Starting download for packages in sunhpc-centos5...
> Downloading RPMs with yumdownloader...
> sunhpc-centos5-updates 100% |=========================| 951 B 00:00
> primary.xml.gz 100% |=========================| 301 kB 00:01
> sunhpc-cen: ################################################## 605/605
> http://giraffe.lustre/dlc_stage/yum/sunhpc/1.0/c5/x86_64/repodata/repomd.xml:
> [Errno 4] IOError: <urlopen error (-2, 'Name or service not known')>
>
> What is this "giraffe.lustre", some kind of internal testing domain?
> Is it required in code that's distributed to the public?

This will get cleaned up. (Giraffe bug to be filed.)

> 04) Even though the RPM version number was incremented with the
> recent release, the filename of the generated ISO was not, so it's
> hard for me to tell the difference between resultant releases and
> documentation from one RPM revision to another. E.g., I have two
> files in my home directory named sun-linux-hpc-1.0-PREVIEW.iso.
> Which is which? I'm also puzzled by the naming convention of the
> documentation file. E.g., installation_guide_1.0_preview.pdf vs
> installation_guide_1.0.pdf. What will the title be in the next
> release?

Documentation changes are being worked on. We'll determine a standard
release name for them. As for the ISO names, thanks for the input,
we'll look into it.

> 05) I installed from the ISO on my Sun X2100 management/head node
> machine per installation_guide_1.0.pdf, selecting the Lustre
> software option, but not customizing any software packages. This
> resulted in a system running 2.6.18-53.1.21.el5.sunhpc2.
>
> 06) As is already reported in the release notes, a double reboot was
> required after the install, before the system could be used.
>
> 07) After installation, I noticed that root's .bash* init files were
> missing. I added them, consulting another machine's files. This
> seems like a bug.

Not necessarily sure what's wrong here. The root account should be
created via the standard installer's method, so /etc/skel should have
been copied correctly. We'll take a look and see how to fix this
(Giraffe bug to be filed).

> 08) One of the first things I tried to do was ssh to my management
> node from my Linux workstation, so that I could access a target
> compute blade's VGA console via the web management GUI with a remote
> instance of firefox. This didn't work for multiple reasons:
>
>  * A full X11 installation wasn't done.
>  * Ssh didn't support setting up $DISPLAY.
>  * There were two conflicting instances of firefox installed (32 and
>    64-bit). For the purposes of running Java applets required by the
>    web GUI, the 32-bit firefox is needed.
>  * The Java plugin for 32-bit firefox wasn't installed.
>
> I can understand that some people might not want/need X11 or
> firefox, but for a management node it's a requirement for me, and in
> most of my installations, it's also required on the compute nodes
> for totalview, among other things.
>
> Here's what I had to do to fix things:
>
> a. I found that I couldn't install X11 until I did a yum update.
> b. To do a yum update, I had to exclude various RPMs in
>    /etc/yum.conf: exclude=kernel* opensm* infiniband*
> c. yum update
> d. yum groupinstall -y "X Window System"
> e. rpm -e firefox.x86_64
> f. rpm -e firefox.i386
> g. install firefox.i386
> h. I installed Sun JDK 1.5.0_15 in /opt/java/jdk1.5.0_15 and then
>    created scripts in /etc/profile.d, such that...
>    JAVA_HOME=/opt/java/jdk1.5.0_15
>    PATH=$JAVA_HOME/bin:$PATH
> k. cd /usr/lib/mozilla/plugins && ln -s /opt/java/jdk1.5.0_15/jre/plugin/i386/ns7/libjavaplugin_oji.so

We will visit this and look at what is needed on our end to make sure
this works better.

> 09) The documentation's step #3 lists a table of nodes, IP
> addresses, and MAC addresses. While this kind of information is
> needed, and someone experienced with implementing and administering
> Linux clusters will be familiar with what's going on, there's little
> context to let a new user know why this information is important to
> them. If you're going to have a sample table, I suggest providing a
> lot more context. E.g., a picture and description of a sample
> cluster. On my clusters, I would like cluster IP addresses to be
> first put in a hosts file like /etc/hosts and then optionally
> automatically turned into DNS configuration files.
>
> My testbed /etc/hosts file follows:
>
> == BEGIN /etc/hosts ==
> 127.0.0.1 localhost.localdomain localhost
>
> # Management/Head Node
> # ====================
> # Has two Ethernet interfaces:
> # eth0: connected to the corporate network
> # eth1: connected to the cluster/management network
> 10.0.0.101 cdsytestbed # eth0
> 192.168.1.254 cdsytestbed-mgmt # eth1
>
> # Management Devices
> # ====================
> 192.168.1.200 cmm1
> 192.168.1.1 b01-sp
> 192.168.1.2 b02-sp
>
> # Compute Nodes
> # ====================
> 192.168.1.201 b01
> 192.168.1.202 b02
> == END /etc/hosts ==

We will do some documentation reviews.

> 10) Step 4.2 seems innocuous, but I see this as a critically
> important, missing piece of functionality in the current release.
> IMHO, its absence is a showstopper on a large system. That is,
>
> "Set up management interfaces, such as the sun Integrated Lights Out
> Manager (ILOM) service sprocessors (SPs), to prepare to provision
> the systems."
>
> is simple enough to do manually on a small test cluster, but is
> untenable on a large system. I've developed tools (mostly Perl
> scripts that "expect" various SMASH CLI interfaces) to:
>
> a. Get the MAC address of the CMM via a serial connection.
> b. Access the CMM over IP, set its IP address, and make other CMM
>    settings.
> c. Update CMM firmware over IP (assumes a TFTP server).
> d. Verify CMM firmware versions.
> e. Set IP addresses of SPs.
> f. Turn off blades.
> g. Update blade and SP firmware (CPLD and REM is still TODO).
> h. Turn on blades.
> i. Verify blade firmware versions.
> j. HCA flash to enable boot-over-IB (for a different Infiniband
>    machine I'm working with).
> k. Collect IB link layer host IDs (for a different Infiniband
>    machine I'm working with).
>
> Is there any interest in putting this stuff into the release or
> something like it? I'd like to see a toolkit to do this stuff in a
> pluggably general way. E.g., tools that enable X6220 and X6250
> blades to be updated/configured with similar commands, but there
> would be underlying scripting that takes care of different commands,
> firmware, etc. for the different platforms.

Thanks for the input.

> 11) Going on to Step 5 in the Sun Documentation (Create
> configurations for all nodes in the cluster)...
>
> 12) It might be nice to include a link to the actual OneSIS
> documentation. This whole process is rather mysterious without it.
>
> 13) I changed my /etc/sunhpc.conf as follows. Note that this file
> currently defaults to an old kernel. It needs to be manually changed
> to the new one. Also, in my case, DHCPD_IF needed to be changed.
>
> DISTRO=centos-5.1
> ONESIS_IMAGE=/var/lib/oneSIS/image/centos-5.1
> DHCPD_IF=eth1
> KERNEL_VERSION=2.6.18-53.1.21.el5.sunhpc2
> UNUSED_PKGS="conman powerman ganglia-gmetad sunhpc-configuration"
>
> 14) Step 5.2, where you're asked to put MAC addresses in
> /etc/dhcp_hostlist, is another step that could be automated for
> various supported machines. E.g., SMASH commands like
> "show /SYS/MB/NET0 fru_serial_number" for X6220s and
> "show /SYS/NICInfo0 MacAddress1" on X6250s. Any plans for this? I
> think it's critical on large machines. My resultant file is:
>
> host b01 {hardware ethernet 00:14:4f:82:1e:02; fixed-address 192.168.1.201;}
> host b02 {hardware ethernet 00:1e:68:57:77:58; fixed-address 192.168.1.202;}

We will be visiting ideas on how better to approach this in upcoming
releases.

> 15) I see that dhcpd is configured in /etc/dhcpd_sunhpc.conf. Why
> not the default dhcpd.conf? This is very confusing.

Noted, we will investigate this.

> 16) When running /usr/sbin/sunhpc_setup -i without a NETWORK
> directive in /etc/sysconfig/network-scripts/ifcfg-eth0, there's poor
> grammar in an error message that you might want to fix up:
>
> incompleted network information, please
> check /etc/sysconfig/network-scripts/ifcfg-${DHCPD_IF}"
>
> I.e., s/incompleted/incomplete/

We'll get this fixed (giraffe bug to be filed).

> 17) Possibly because of the "yum update" I did to get X11 working,
> sunhpc_setup results in OneSIS giving an error message about a patch
> reject in /etc/rc.d/rc.sysinit. I fixed it up manually - I think.
> More testing is required:
>
> oneSIS: Warning! Patch failed or was only partially successful.
> * * * patching file etc/rc.d/rc.sysinit
> Hunk #1 FAILED at 702.
> Hunk #2 succeeded at 736 (offset 24 lines).
> Hunk #4 succeeded at 883 (offset 24 lines).
> Hunk #6 succeeded at 968 (offset 24 lines).
> 1 out of 6 hunks FAILED -- saving rejects to file /tmp/rejects
> * * * The patch may have failed because it was updated, outdated,
> * * * or just defective. Support is available on the oneSIS mailing
> * * * list, onesis-users at lists.sourceforge.net

We'll need to investigate which patch is failing and why. This is
probably due to CentOS 5.2 being in place. (Giraffe bug to be filed.)

> 18) It might be a good idea to explain exactly what
> /usr/sbin/sunhpc_setup -i is doing, especially since there's no
> reference to the OneSIS documentation. What's the '-i' for? Do I
> have to provide that argument if I'm generating a second image? I
> don't think there's a way for me to figure this out without reading
> the script line-by-line.

Noted.

> 19) Why does the documentation's step 5.4.b call for turning off
> iptables? If something doesn't work with iptables on, what is it?
> Surely it's possible to add some exceptions and keep this excellent
> security mechanism in place. On most of my deployments, iptables is
> a requirement on nodes that face the external network. On my
> testbed, I simply kept it up on the eth0 interface of my head node
> and then allowed all traffic on eth1, which faced the rest of the
> cluster. If I were responsible for the documentation, I'd be careful
> about advocating the removal of a security feature without further
> clarification and explanation.

In general we are approaching the system as fully trusted internally,
and for this reason, we suggested turning off iptables. But you are
correct that perhaps the documentation saying "turn it all off" might
be a bit aggressive. We'll change the verbiage to make sure we only
recommend turning off inwardly focused network interfaces.

> 20) I changed /tftpboot/pxelinux.cfg as follows. The most important
> differences are a change to the default serial port for X6220s and
> X6250s and the kernel version:
>
> PROMPT 1
> TIMEOUT 20
> IPAPPEND 2
> DEFAULT linux
> label linux
>   kernel vmlinuz-2.6.18-53.1.21.el5.sunhpc2
>   append root=/dev/nfs console=ttyS1,9600 initrd=initrd-2.6.18-53.1.21.el5.sunhpc2.img selinux=0
> label rescue
>   kernel vmlinuz-2.6.18-53.1.21.el5.sunhpc2
>   append root=/dev/nfs console=ttyS1,9600 rescue initrd=initrd-2.6.18-53.1.21.el5.sunhpc2.img selinux=0

With more automation, this will be easier to take care of in the
future. Perhaps we'll need some extra documentation until then.

> 21) I much prefer using the pxelinux config technique that allows
> pxelinux to dynamically select its configuration file rather than
> hardcoding it in dhcpd.conf. See "Next, it will search for the
> config file using its own IP address..." in
> http://syslinux.zytor.com/pxe.php
>
> As things currently stand, you have to either edit a PXE config file
> in a strange way or edit dhcpd.conf and restart the dhcpd server to
> make a node boot an alternative image. This doesn't seem like a very
> scalable practice. I want to be able to change the image that a node
> boots (including the possibility for a local HD image) simply on the
> command line for any combination of n nodes and m images - without
> needing to restart dhcp server(s). As things are currently set up,
> this isn't possible.

There is a debate on how best to handle this (especially when you
start talking 1000+ nodes). Having 1000 files (perhaps x2 if you
symlink between a MAC address and a hostname) can sometimes lead to
many accidental mistakes. We'll look into some possible schemes for
handling this (you are correct, restarting dhcpd isn't necessarily a
good thing, but since this only controls the booting of a system, it's
not necessarily a bad thing either).

> 22) Which version of pxelinux are we running? I find this difficult
> to determine:
>
> # rpm -qf /tftpboot/pxelinux.0
> file /tftpboot/pxelinux.0 is not owned by any package

That's a difficulty with just the way the syslinux people release
their packages in general. pxelinux.0 is supplied by onesis.

> 23) Regarding step 6 - Boot up the client nodes:
>
> a. The documentation calls for things like:
>
> ipmipower -W endianseq -h b01-sp -u root -p changeme --stat
> ipmipower -W endianseq -h b01-sp -u root -p changeme --off
> ipmipower -W endianseq -h b01-sp -u root -p changeme --on
>
> This works for X6220s, but not for X6250s. For X6250s you need:
>
> ipmipower -W authcap -h b02-sp -u root -p changeme --stat
> ipmipower -W authcap -h b02-sp -u root -p changeme --off
> ipmipower -W authcap -h b02-sp -u root -p changeme --on
>
> That is, a different IPMI workaround is required for Intel/ELOM
> nodes.

We'll probably need to be more generic in this information, as this
would be powerman and node specific.

> b. I suggest setting up powerman at this point before going any
> further and not confusing matters with ipmipower. To get powerman
> working with the mixed IPMI workaround, I had to create an
> /etc/powerman/powerman.conf like:
>
> include "/etc/powerman/ipmipower.dev"
>
> alias "all" "b01,b02"
>
> device "pow0" "ipmipower" "/usr/sbin/ipmipower -h b01-sp --config /etc/ipmipower_amd.conf |&"
> node "b01" "pow0" "b01-sp"
>
> device "pow1" "ipmipower" "/usr/sbin/ipmipower -h b02-sp --config /etc/ipmipower_intel.conf |&"
> node "b02" "pow1" "b02-sp"
>
> And then /etc/ipmipower_amd.conf like:
>
> hostname b01-sp
> username root
> password changeme
> workaround-flags "endianseq"
> on-if-off enable
> wait-until-on enable
> wait-until-off enable
>
> And finally /etc/ipmipower_intel.conf like:
>
> hostname b02-sp
> username root
> password changeme
> workaround-flags "authcap"
> on-if-off enable
> wait-until-on enable
> wait-until-off enable

You're probably right, putting power- and conman steps earlier would
be a good idea. These powerman changes should be pushed into powerman
itself (giraffe bug to be filed).

> 24) Step 7: Build SSH keys is ripe for a small shell script. No?

Part of auto-configuration steps coming in future revs.

> 25) I think that many users will want to get ConMan set up before
> actually booting compute nodes, so you might want to move the
> documentation's section on ConMan up a few sections. To get ConMan
> to work in a useful way, I needed to make some edits to
> /etc/conman.conf that could maybe be there by default (e.g., server
> logdir="/var/log/conman"). In addition, I had to make an expect
> script to make things work with the unsupported X6250 nodes. Using
> sun-ilom.exp as a base, I re-worked it to work with X6250s. It's not
> too well tested, but it's attached as sun-elom.exp and seems to
> work.

Again, you're right. With automation, we'll start providing default
conman.conf files. The elom version should be pushed back into conman
(giraffe bug to be filed).

> 26) After ConMan was set up correctly, I was able to get OneSIS
> images booted, except the images weren't set up to have inittab
> start serial gettys after boot, so it was impossible to log in via
> the serial console. See
> http://www.vanemery.com/Linux/Serial/serial-console.html The
> documentation might want to mention something about this.

Two pieces are missing here: one is adding an s0 line to /etc/inittab.
Generally kudzu should take care of this once it recognizes serial
output, but we should take steps to force it (giraffe bug to be
filed). Also, we should just check that ttyS0 is in the securetty
file.

> 27) I'm going to need diskful installs. When will this functionality
> be available?

Cobbler is being worked into our system for the next versions. This
will provide diskful installs.

> 28) How does the verification suite work? The documentation mentions
> it, but I couldn't figure out how to run it.

This is a work in progress. More information when it becomes
available.

> 29) I'd like to see a parallel ping capability included (e.g.,
> fping: http://fping.sourceforge.net/)

We'll look into this. In the past, I've seen less usefulness from a
parallel ping than just doing a "pdsh -a uptime | dshbak -c", which
will hit all the nodes and actually run a command (which tells you
that the system is online).

> That's it for now. Let me know if you need any clarification. I'm
> going to be testing all of this in a boot-over-IB environment
> including Lustre in the next week or so, so that experience will
> undoubtedly generate some more feedback.
>
> -Matthew
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Linux_hpc_swstack mailing list
> Linux_hpc_swstack at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack

--
"A simile is not a lie, unless it is a bad simile."
- Christopher John Francis Boone
Zhiqi Tao
2008-Sep-02 05:59 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
I lodged the following entries in the giraffe bugzilla database.

16864 Build CentOS ISO using local ISO/DVD [This is an existing bug
      entry associated with the giraffe.lustre issue]
16963 Add proper revision information on each stack release
16964 Missing root's .bash* init files after installation
16965 Improve out-of-box X11 support of software stack
16966 Grammar error in the error message of /usr/sbin/sunhpc_setup
16967 Patch failing in oneSIS
16968 Set up power- and conman before ipmipower
16969 Set up conman.conf before actually booting compute nodes
16970 inittab fails to start serial tty

Thanks a lot for your valuable feedback!

P.S. You are welcome to participate further. Simply create an account
at https://bugzilla.lustre.org/.

Best Regards,
Zhiqi
There is some versioning that > can be done in the base as well as time stamps that tell us which > version is used. > >> 02) Taking the repeatability theme a little further, how is it possible >> to make a repeatable release, such that I can deploy exactly the same >> bits that Sun has tested, when yum might download newer or older RPM >> packages, depending on when sunhpc-linux-iso is run? This applies >> equally to repos containing CentOS RPMs and those containing Sun RPMS. > > The repo file used with the sunhpc-linux-iso script looks at the updates > directory for CentOS releases but only the 1.0 releases for SunHPC > tools. So, in this instance, if you create an ISO it is made up of all > the pieces from our stack and includes all the newest updates from the > CentOS base. Therefore, we know that the tools we provide are at levels > we''ve tested, and we rely on the CentOS teams to test their own > packages. With this idea, every ISO should have all the latest security > significant as released via CentOS. We could do some extra work to make > sure that this is more known. > >> 03) sunhpc-linux-iso-0.5-sunhpc2.noarch.rpm creates files >> in /etc/sunhpc-linux-iso/yum.repos.d/ that have invalid URLs of the form >> "http://giraffe.lustre/..." >> >> E.g., /etc/sunhpc-linux-iso/yum.repos.d/runhpc-centos5.repo: >> >> [sunhpc-centos5] >> name=SunHPC Packages for CentOS 5 >> baseurl=http://giraffe.lustre/dlc_stage/yum/sunhpc/1.0/c5/x86_64/ >> http://dlc.sun.com/linux_hpc/yum/sunhpc/1.0/c5/x86_64/ >> gpgcheck=0 >> enabled=1 >> >> This causes ugly errors like: >> >> Starting download for packages in sunhpc-centos5... >> Downloading RPMs with yumdownloader... >> sunhpc-centos5-updates 100% |=========================| 951 B >> 00:00 >> primary.xml.gz 100% |=========================| 301 kB >> 00:01 >> sunhpc-cen: ################################################## 605/605 >> >> http://giraffe.lustre/dlc_stage/yum/sunhpc/1.0/c5/x86_64/repodata/repomd.xml: >> [Errno 4] IOError: <urlopen error (-2, ''Name or service not known'')> >> >> What is this "giraffe.lustre", some kind of internal testing domain? Is >> it required in code that''s distributed to the public? > > This will get cleaned up. (Giraffe bug to be filed.) > >> 04) Even though the RPM version number was incremented with the recent >> release, the filename of the generated ISO was not, so it''s hard for me >> to tell the difference between resultant releases and documentation from >> one RPM revision to another. E.g., I have two files in my home >> directory named sun-linux-hpc-1.0-PREVIEW.iso. Which is which? I''m >> also puzzled by the naming convention of the documentation file. E.g., >> installation_guide_1.0_preview.pdf vs installation_guide_1.0.pdf. What >> will the title be in the next release? > > Documentation changes are being worked on. We''ll determine a standard > release name for them. As for the ISO names, thanks for the input, > we''ll look into it. > >> 05) I installed from the ISO on my Sun X2100 management/head node >> machine per installation_guide_1.0.pdf, selecting the Lustre software >> option, but not customizing any software packages. This resulted in a >> system running 2.6.18-53.1.21.el5.sunhpc2. >> >> 06) As is already reported in the release notes, a double reboot was >> required after the install, before the system could be used. >> >> 07) After installation, I noticed that root''s .bash* ini files were >> missing. I added them, consulting another machine''s files. This seems >> like bug. 
> > Not necessarily sure what''s wrong here. The root account should be > created via the standard installer''s method, so /etc/skel should have > been copied correctly. We''ll take a look and see how to fix this > (Giraffe bug to be filed). > >> 08) One of the first things I tried to do was ssh to my management node >> from my Linux workstation, so that I could access a target compute >> blade''s VGA console via the web management GUI with a remote instance of >> firefox. This didn''t work for multiple reasons: >> >> * A full X11 installation wasn''t done. >> * Ssh didn''t support setting up $DISPLAY. >> * There were two conflicting instances of firefox installed (32 >> and 64-bit). For the purposes of running Java applets required >> by the web GUI, the 32-bit firefox is needed. >> * The Java plugin for 32-bit firefox wasn''t installed. >> >> I can understand that some people might not want/need X11 or firefox, >> but for a management node it''s a requirement for me, and in most of my >> installations, it''s also required on the compute nodes for totalview, >> among other things. >> >> Here''s what I had to do to fix things: >> >> a. I found that I couldn''t install X11, until I did a yum update. >> b. To do a yum update, I had to exclude various RPMs in /etc/yum.conf: >> exclude=kernel* opensm* infiniband* >> c. yum update >> d. yum groupinstall -y "X Window System" >> e. rpm -e firefox.x86_64 >> f. rpm -e firefox.i386 >> g. install firefox.i386 >> h. I installed Sun JDK 1.5.0_15 in /opt/java/jdk1.5.0_15 and then >> created scripts in /etc/profile.d, such that... >> JAVA_HOME=/opt/java/jdk1.5.0_15 >> PATH=$JAVA_HOME/bin:$PATH >> k. cd /usr/lib/mozilla/plugns && ln >> -s /opt/java/jdk1.5.0_15/jre/plugin/i386/ns7/libjavaplugin_oji.so > > We will visit this and look at what is needed on our end to make sure > this works better. > >> 09) The documentation''s step #3 lists a table of nodes, IP addresses, >> and MAC addresses. While this kind of information is needed, and >> someone experienced with implementing and administering Linux clusters >> will be familiar with what''s going on, there''s little context to let a >> new user know why this information is important to them. If you''re >> going to have a sample table, I suggest, providing a lot more context. >> E.g., a picture and description of a sample cluster. On my clusters, I >> would like cluster IP addresses to be first put in a hosts file >> like /etc/hosts and then optionally automatically turned into DNS >> configuration files. >> >> My testbed /etc/hosts file follows: >> >> == BEGIN /etc/hosts =>> 127.0.0.1 localhost.localdomain localhost >> >> # Management/Head Node >> # ====================>> # Has two Ethernet interfaces: >> # eth0: connected to the corporate network >> # eth1: connected to the cluster/management network >> 10.0.0.101 cdsytestbed # eth0 >> 192.168.1.254 cdsytestbed-mgmt # eth1 >> >> # Management Devices >> # ====================>> 192.168.1.200 cmm1 >> 192.168.1.1 b01-sp >> 192.168.1.2 b02-sp >> >> # Compute Nodes >> # ====================>> 192.168.1.201 b01 >> 192.168.1.202 b02 >> == END /etc/hosts => > We will do some documentation reviews. > >> 10) Step 4, 2 seems innocuous, but I see this as a critically important, >> missing piece of functionality in the current release. IMHO, its >> absence is a showstopper on a large system. >> >> That is, >> >> "Set up management interfaces, such as the sun Integrated Lights Out >> Manager (ILOM) service sprocessors (SPs), to prepare to provision the >> systems." 
>> >> is simple enough to do manually on a small test cluster, but is >> untenable on a large system. I''ve developed tools (mostly Perl scripts >> that "expect" various SMASH CLI interfaces) to: >> >> a. Get the MAC address of the CMM via a serial connection. >> b. Access CMM over IP, set IP address and make other CMM settings. >> c. Update CMM firmware over IP (Assumes a TFTP server). >> d. Verify CMM firmware versions. >> e. Set IP addresses of SPs. >> f. Turn off blades. >> g. Update blade and SP firmware (CPLD and REM is still TODO). >> h. Turn on blades. >> i. Verify blade firmware versions. >> j. HCA flash to enable boot-over-IB (for a different Infiniband >> machine I''m working with). >> k. Collect IB link layer host IDs (for a different Infiniband machine >> I''m working with). >> >> Is there any interest is putting this stuff into the release or >> something like it? I''d like to see a toolkit to do this stuff in a >> pluggably general way. E.g., Tools that enable X6220 and X6250 blades >> to be updated/configured with similar commands, but there would be >> underlying scripting that takes care of different commands, firmware, >> etc. for the different platforms. > > Thanks for the input. > >> 11) Going on the Step 5 in the Sun Documentation (Create configurations >> for all nodes in the cluster)... >> >> 12) It might be nice to include a link to the actual OneSIS >> documentation. This whole process is rather mysterious without it. >> >> 13) I changed my /etc/sunhpc.conf as follows. Note that this file >> currently defaults to an old kernel. It needs to be manually changed to >> the new one. Also, in my case, DHCPD_IF needed to be changed. >> >> DISTRO=centos-5.1 >> ONESIS_IMAGE=/var/lib/oneSIS/image/centos-5.1 >> DHCPD_IF=eth1 >> KERNEL_VERSION=2.6.18-53.1.21.el5.sunhpc2 >> UNUSED_PKGS="conman powerman ganglia-gmetad sunhpc-configuration" >> >> 14) Step 5.2 when you''re asked to put MAC addresses >> in /etc/dhcp_hostlist is another step that could be automated for >> various supported machines. E.g., SMASH commands like >> "show /SYS/MB/NET0 fru_serial_number" for X6220s and "show /SYS/NICInfo0 >> MacAddress1" on X6250s. Any plans for this? I think its critical on >> large machines. My resultant file is: >> >> host b01 {hardware ethernet 00:14:4f:82:1e:02; fixed-address >> 192.168.1.201;} >> host b02 {hardware ethernet 00:1e:68:57:77:58; fixed-address >> 192.168.1.202;} > > We will be visiting ideas on how better to approach this in upcoming > releases. > >> 15) I see that dhcpd is configured in /etc/dhcpd_sunhpc.conf. Why not >> the default dhcpd.conf? This is very confusing. > > Noted, we will investigate this. > >> 16) When running /usr/sbin/sunhpc_setup -i without a NETWORK directive >> in /etc/sysconfig/network-scripts/ifcfg-eth0 there''s poor grammar in an >> error message, that you might want to fixup: >> >> incompleted network information, please >> check /etc/sysconfig/network-scripts/ifcfg-${DHCPD_IF}" >> >> I.e., s/incompleted/incomplete/ > > We''ll get this fixed (giraffe bug to be filed). > >> 17) Possibly because of the "yum update" I did to get X11 working, >> sunhpc_setup results in OneSIS giving an error message about a patch >> reject in /etc/rc.d/rc.sysinit. I fixed it up manually - I think. More >> testing is required: >> >> oneSIS: Warning! Patch failed or was only partially successful. >> * * * patching file etc/rc.d/rc.sysinit >> Hunk #1 FAILED at 702. >> Hunk #2 succeeded at 736 (offset 24 lines). >> Hunk #4 succeeded at 883 (offset 24 lines). 
>> 15) I see that dhcpd is configured in /etc/dhcpd_sunhpc.conf. Why not
>> the default dhcpd.conf? This is very confusing.
>
> Noted, we will investigate this.
>
>> 16) When running /usr/sbin/sunhpc_setup -i without a NETWORK directive
>> in /etc/sysconfig/network-scripts/ifcfg-eth0, there's poor grammar in an
>> error message that you might want to fix up:
>>
>> incompleted network information, please
>> check /etc/sysconfig/network-scripts/ifcfg-${DHCPD_IF}"
>>
>> I.e., s/incompleted/incomplete/
>
> We'll get this fixed (giraffe bug to be filed).
>
>> 17) Possibly because of the "yum update" I did to get X11 working,
>> sunhpc_setup results in OneSIS giving an error message about a patch
>> reject in /etc/rc.d/rc.sysinit. I fixed it up manually - I think. More
>> testing is required:
>>
>> oneSIS: Warning! Patch failed or was only partially successful.
>> * * * patching file etc/rc.d/rc.sysinit
>> Hunk #1 FAILED at 702.
>> Hunk #2 succeeded at 736 (offset 24 lines).
>> Hunk #4 succeeded at 883 (offset 24 lines).
>> Hunk #6 succeeded at 968 (offset 24 lines).
>> 1 out of 6 hunks FAILED -- saving rejects to file /tmp/rejects
>> * * * The patch may have failed because it was updated, outdated,
>> * * * or just defective. Support is available on the oneSIS mailing
>> * * * list, onesis-users at lists.sourceforge.net
>
> We'll need to investigate which patch is failing and why. This is
> probably due to CentOS 5.2 being in place. (Giraffe bug to be filed.)
>
>> 18) It might be a good idea to explain exactly
>> what /usr/sbin/sunhpc_setup -i is doing, especially since there's no
>> reference to the OneSIS documentation. What's the '-i' for? Do I have
>> to provide that argument if I'm generating a second image? I don't
>> think there's a way for me to figure this out without reading the script
>> line-by-line.
>
> Noted.
>
>> 19) Why does the documentation's step 5.4.b call for turning off
>> iptables? If something doesn't work with iptables on, what is it?
>> Surely it's possible to add some exceptions and keep this excellent
>> security mechanism in place. On most of my deployments, iptables is a
>> requirement on nodes that face the external network. On my testbed, I
>> simply kept it up on the eth0 interface of my head node and then allowed
>> all traffic on eth1, which faced the rest of the cluster. If I were
>> responsible for the documentation, I'd be careful about advocating the
>> removal of a security feature without further clarification and
>> explanation.
>
> In general we are approaching the system as fully trusted internally,
> and for this reason we suggested turning off iptables. But you are
> correct that the documentation saying "turn it all off" might be a bit
> aggressive. We'll change the verbiage to make sure we only recommend
> turning off inwardly focused network interfaces.
>
>> 20) I changed /tftpboot/pxelinux.cfg as follows. The most important
>> differences are a change to the default serial port for X6220s and
>> X6250s and the kernel version:
>>
>> PROMPT 1
>> TIMEOUT 20
>> IPAPPEND 2
>> DEFAULT linux
>> label linux
>> kernel vmlinuz-2.6.18-53.1.21.el5.sunhpc2
>> append root=/dev/nfs console=ttyS1,9600
>> initrd=initrd-2.6.18-53.1.21.el5.sunhpc2.img selinux=0
>> label rescue
>> kernel vmlinuz-2.6.18-53.1.21.el5.sunhpc2
>> append root=/dev/nfs console=ttyS1,9600 rescue
>> initrd=initrd-2.6.18-53.1.21.el5.sunhpc2.img selinux=0
>
> With more automation, this will be easier to take care of in the future.
> Perhaps we'll need some extra documentation until then.
>
>> 21) I much prefer using the pxelinux config technique that allows
>> pxelinux to dynamically select its configuration file rather than
>> hardcoding it in dhcpd.conf. See "Next, it will search for the config
>> file using its own IP address..." in http://syslinux.zytor.com/pxe.php
>>
>> As things currently stand, you have to either edit a PXE config file in
>> a strange way or edit dhcpd.conf and restart the dhcpd server to make a
>> node boot an alternative image. This doesn't seem like a very scalable
>> practice. I want to be able to change the image that a node boots
>> (including the possibility of a local HD image) simply on the command
>> line, for any combination of n nodes and m images, without needing to
>> restart dhcp server(s). As things are currently set up, this isn't
>> possible.
>
> There is a debate on how best to handle this (especially when you start
> talking 1000+ nodes). Having 1000 files (perhaps x2 if you symlink
> between a MAC address and a hostname) can sometimes lead to many
> accidental mistakes. We'll look into some possible schemes for handling
> this (you are correct, restarting dhcpd isn't necessarily a good thing,
> but since this only controls the booting of a system, it's not
> necessarily a bad thing either).
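>
> For reference, the technique in the syslinux docs comes down to
> per-node files under /tftpboot/pxelinux.cfg/ named after the client IP
> in uppercase hex. A minimal sketch of the command-line control being
> asked for above (the script name and per-image config names are
> hypothetical; the gethostip utility ships with syslinux):
>
> #!/bin/sh
> # Usage: setboot <image-config> <node>...
> # Re-points each node's pxelinux config at the named image via a
> # symlink; no dhcpd edit or restart involved.
> cfg=$1; shift
> for node in "$@"; do
>     hexip=$(gethostip -x "$node")   # e.g., b01 -> C0A801C9
>     ln -sf "$cfg" "/tftpboot/pxelinux.cfg/$hexip"
> done
>
> E.g., "setboot linux-2.6.18-53.1.21 b01 b02", with the per-image
> configs dropped into /tftpboot/pxelinux.cfg/ alongside the links.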
>> 22) Which version of pxelinux are we running? I find this difficult to
>> determine:
>>
>> # rpm -qf /tftpboot/pxelinux.0
>> file /tftpboot/pxelinux.0 is not owned by any package
>
> That's a difficulty with the way the syslinux people release their
> packages in general. pxelinux.0 is supplied by onesis.
>
>> 23) Regarding step 6 - Boot up the client nodes:
>>
>> a. The documentation calls for things like:
>>
>> ipmipower -W endianseq -h b01-sp -u root -p changeme --stat
>> ipmipower -W endianseq -h b01-sp -u root -p changeme --off
>> ipmipower -W endianseq -h b01-sp -u root -p changeme --on
>>
>> This works for X6220s, but not for X6250s. For X6250s you need:
>>
>> ipmipower -W authcap -h b02-sp -u root -p changeme --stat
>> ipmipower -W authcap -h b02-sp -u root -p changeme --off
>> ipmipower -W authcap -h b02-sp -u root -p changeme --on
>>
>> That is, a different IPMI workaround is required for Intel/ELOM
>> nodes.
>
> We'll probably need to be more generic in this information, as this
> would be powerman and node specific.
>
>> b. I suggest setting up powerman at this point, before going any
>> further, and not confusing matters with ipmipower. To get powerman
>> working with the mixed IPMI workarounds, I had to create
>> an /etc/powerman/powerman.conf like:
>>
>> include "/etc/powerman/ipmipower.dev"
>>
>> alias "all" "b01,b02"
>>
>> device "pow0" "ipmipower" "/usr/sbin/ipmipower -h b01-sp
>> --config /etc/ipmipower_amd.conf |&"
>> node "b01" "pow0" "b01-sp"
>>
>> device "pow1" "ipmipower" "/usr/sbin/ipmipower -h b02-sp
>> --config /etc/ipmipower_intel.conf |&"
>> node "b02" "pow1" "b02-sp"
>>
>> And then /etc/ipmipower_amd.conf like:
>>
>> hostname b01-sp
>> username root
>> password changeme
>> workaround-flags "endianseq"
>> on-if-off enable
>> wait-until-on enable
>> wait-until-off enable
>>
>> And finally /etc/ipmipower_intel.conf like:
>>
>> hostname b02-sp
>> username root
>> password changeme
>> workaround-flags "authcap"
>> on-if-off enable
>> wait-until-on enable
>> wait-until-off enable
>
> You're probably right, putting power- and conman steps earlier would be
> a good idea. These powerman changes should be pushed into powerman
> itself (giraffe bug to be filed).
>
>> 24) Step 7: Build SSH keys is ripe for a small shell script. No?
>
> Part of auto-configuration steps coming in future revs.
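>
> Regarding the script idea in item 24, a minimal sketch of what such an
> auto-configuration step might do for a OneSIS image (the image path
> matches /etc/sunhpc.conf above; the details are illustrative only):
>
> #!/bin/sh
> # Sketch: give root on the head node passwordless ssh into the
> # shared compute-node image.
> IMAGE=/var/lib/oneSIS/image/centos-5.1
> mkdir -p /root/.ssh
> [ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
> mkdir -p "$IMAGE/root/.ssh" && chmod 700 "$IMAGE/root/.ssh"
> cat /root/.ssh/id_rsa.pub >> "$IMAGE/root/.ssh/authorized_keys"
> chmod 600 "$IMAGE/root/.ssh/authorized_keys"
> # Seed known_hosts so the first ssh/pdsh pass isn't interactive.
> ssh-keyscan -t rsa b01 b02 >> /root/.ssh/known_hosts 2>/dev/null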
>
>> 25) I think that many users will want to get ConMan set up before
>> actually booting compute nodes, so you might want to move the
>> documentation's section on ConMan up a few sections. To get ConMan to
>> work in a useful way, I needed to make some edits to /etc/conman.conf
>> that could maybe be there by default (e.g., server
>> logdir="/var/log/conman"). In addition, I had to make an expect script
>> to make things work with the unsupported X6250 nodes. Using
>> sun-ilom.exp as a base, I re-worked it to work with X6250s. It's not
>> too well tested, but it's attached as sun-elom.exp and seems to work.
>
> Again, you're right. With automation, we'll start providing default
> conman.conf files. The elom version should be pushed back into conman
> (giraffe bug to be filed).
>
>> 26) After ConMan is set up correctly, I was able to get OneSIS images
>> booted, except the images weren't set up to have inittab start serial
>> gettys after boot, so it was impossible to log in via the serial console.
>> See http://www.vanemery.com/Linux/Serial/serial-console.html - the
>> documentation might want to mention something about this.
>
> Two pieces are missing here; one is adding an s0 line to /etc/inittab.
> Generally kudzu should take care of this once it recognizes serial
> output, but we should take steps to force it (giraffe bug to be filed).
> Also, we should just check that ttyS0 is in the securetty file.
>
>> 27) I'm going to need diskfull installs. When will this functionality
>> be available?
>
> Cobbler is being worked into our system for the next versions. This
> will provide diskfull installs.
>
>> 28) How does the verification suite work? The documentation mentions
>> it, but I couldn't figure out how to run it.
>
> This is a work in progress. More information when it becomes available.
>
>> 29) I'd like to see a parallel ping capability included (e.g., fping:
>> http://fping.sourceforge.net/)
>
> We'll look into this. In the past, I've seen less usefulness from a
> parallel ping than from just doing a "pdsh -a uptime | dshbak -c", which
> will hit all the nodes and actually run a command (which tells you that
> the system is online).
>
>> That's it for now. Let me know if you need any clarification. I'm
>> going to be testing all of this in a boot-over-IB environment including
>> Lustre in the next week or so, so that experience will undoubtedly
>> generate some more feedback.
>>
>> -Matthew
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Linux_hpc_swstack mailing list
>> Linux_hpc_swstack at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack
>
Matthew Bohnsack
2008-Sep-02 14:05 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
Thank you, Zhiqi. I've created a Bugzilla user for myself to enable
further collaboration.

-Matthew

On Tue, 2008-09-02 at 15:59 +1000, Zhiqi Tao wrote:
> I lodged the following entries in the giraffe bugzilla database.
>
> 16864 Build CentOS ISO using local ISO/DVD [This is an existing bug
> entry associated with the giraffe.lustre issue]
>
> 16963 Add proper revision information on each stack release
>
> 16964 Missing root's .bash* ini files after installation
>
> 16965 Improve out-of-box X11 support of software stack
>
> 16966 Grammar error in the error message of /usr/sbin/sunhpc_setup
>
> 16967 Patch failing in oneSIS
>
> 16968 Set up power- and conman before ipmipower
>
> 16969 Set up conman.conf before actually booting compute nodes
>
> 16970 inittab fails to start serial tty
>
> Thanks a lot for your valuable feedback!
>
> P.S. You are welcome to participate further. Simply create an account at
> https://bugzilla.lustre.org/.
>
> Best Regards,
> Zhiqi
Matthew Bohnsack
2008-Sep-02 14:51 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
Thank you very much for the follow-up, Makia.

You don't say much about #s 10 & 14. Are these to be considered as
targets that, e.g., will be tracked as enhancement requests in Bugzilla?
Or are they being considered as only nice things to have sometime far in
the future?

-Matthew

On Mon, 2008-09-01 at 16:13 -0600, Makia Minich wrote:
> [...]
Makia Minich
2008-Sep-02 15:06 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
These were lightly mentioned because they are something that needs to
be planned out better with the overall structure. You're right that
they should be turned into trackers (Zhiqi, can you take care of this?).

"A simile is not a lie, unless it is a bad simile."
- Christopher John Francis Boone

On Sep 2, 2008, at 8:51 AM, Matthew Bohnsack <bohnsack at cdsinc.com> wrote:

> You don't say much about #s 10 & 14. Are these to be considered as
> targets that, e.g., will be tracked as enhancement requests in Bugzilla?
> Or are they being considered as only nice things to have sometime far in
> the future?
>
> [...]
Zhiqi Tao
2008-Sep-03 07:47 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
I lodged two more entries at bugzilla.

16983 Request tools to automatically put MAC addresses
16982 Request tools to setup management interfaces

Best Regards,
Zhiqi

Makia Minich wrote:
> These were lightly mentioned because they are something that needs to
> be planned out better with the overall structure. You're right that
> they should be turned into trackers (Zhiqi, can you take care of this?).
>
> "A simile is not a lie, unless it is a bad simile."
> - Christopher John Francis Boone
>
> On Sep 2, 2008, at 8:51 AM, Matthew Bohnsack <bohnsack at cdsinc.com>
> wrote:
>
>> Thank you very much for the follow up Makia.
>>
>> You don't say much about #s 10 & 14. Are these to be considered as
>> targets that, e.g., will be tracked as enhancement requests in
>> Bugzilla? Or are they being considered as only nice things to have
>> sometime far in the future?
>>
>> -Matthew
>>
>> On Mon, 2008-09-01 at 16:13 -0600, Makia Minich wrote:
>>> First off, thanks for the comments; they'll be quite helpful. My
>>> comments will be strewn throughout. (Zhiqi, please look for bug
>>> notes on things to put into bugzilla.)
>>>
>>> [...]
>
> _______________________________________________
> Linux_hpc_swstack mailing list
> Linux_hpc_swstack at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack
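For the tooling requested in 16983 (point 14 in the original posting), a first cut might be a loop like the one below. It is only a sketch: it assumes an SP will answer the "show /SYS/NICInfo0 MacAddress1" command over a plain ssh session and print a "MacAddress1 = <mac>" line, whereas real ELOMs/ILOMs usually need an expect wrapper like the ones Matthew describes:

    #!/bin/sh
    # Hypothetical /etc/dhcp_hostlist generator: ask each service
    # processor for its blade's first NIC MAC and emit a dhcpd host
    # stanza with sequential fixed addresses.
    ip=201
    for sp in b01-sp b02-sp; do
        node=${sp%-sp}
        # Field layout of the SP's reply is an assumption here.
        mac=$(ssh root@"$sp" 'show /SYS/NICInfo0 MacAddress1' \
              | awk '/MacAddress1/ {print $NF}')
        echo "host $node {hardware ethernet $mac; fixed-address 192.168.1.$ip;}"
        ip=$((ip + 1))
    done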
Matthew Bohnsack
2008-Sep-05 16:16 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
On Mon, 2008-09-01 at 16:13 -0600, Makia Minich wrote:
>> 29) I'd like to see a parallel ping capability included (e.g., fping:
>> http://fping.sourceforge.net/)
>
> We'll look into this. In the past, I've seen less usefulness from a
> parallel ping than just doing a "pdsh -a uptime | dshbak -c", which will
> hit all the nodes and actually run a command (which tells you that the
> system is online).

I'd like to make an enhancement request for fping. I use fping
extensively, because it allows for a level of diagnostics that pdsh does
not. I.e., the difference between a running IP stack that enables ping
and a running sshd that enables pdsh can be very important when
diagnosing a large system's state. We can obviously install fping if it
doesn't come with the stack out-of-the-box, but having it there by
default would be nice.

-Matthew
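To make the distinction concrete, a two-step check along the lines Matthew describes could look like this (b01/b02 are the testbed nodes; fping's -a/-q flags print only the hosts that answer):

    # Layer 1: is the IP stack up? (no sshd needed)
    fping -a -q b01 b02
    # Layer 2: is sshd up and the node actually usable?
    pdsh -w b01,b02 uptime | dshbak -c

A node that answers fping but not pdsh points at sshd or authentication rather than the network or a hung kernel.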
Makia Minich
2008-Sep-07 08:18 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
Matthew Bohnsack wrote:
> On Mon, 2008-09-01 at 16:13 -0600, Makia Minich wrote:
>
>>> 29) I'd like to see a parallel ping capability included (e.g., fping:
>>> http://fping.sourceforge.net/)
>>
>> We'll look into this. In the past, I've seen less usefulness from a
>> parallel ping than just doing a "pdsh -a uptime | dshbak -c", which will
>> hit all the nodes and actually run a command (which tells you that the
>> system is online).
>
> I'd like to make an enhancement request for fping. I use fping
> extensively, because it allows for a level of diagnostics that pdsh does
> not. I.e., the difference between a running IP stack that enables ping
> and a running sshd that enables pdsh can be very important when
> diagnosing a large system's state. We can obviously install fping if it
> doesn't come with the stack out-of-the-box, but having it there by
> default would be nice.

Noted. (Zhiqi, can you please log this request?)

--
"A simile is not a lie, unless it is a bad simile."
- Christopher John Francis Boone
Zhiqi Tao
2008-Sep-07 12:52 UTC
[Linux_hpc_swstack] Feedback on latest release of Sun HPC Software, Linux Edition
Dear Matthew,

Thanks for your feedback! I added one Bugzilla entry for this one.

Bug 17020 [Feature Request] fping

Best Regards,
Zhiqi

Makia Minich wrote:
> Matthew Bohnsack wrote:
>> On Mon, 2008-09-01 at 16:13 -0600, Makia Minich wrote:
>>
>>>> 29) I'd like to see a parallel ping capability included (e.g., fping:
>>>> http://fping.sourceforge.net/)
>>>
>>> We'll look into this. In the past, I've seen less usefulness from a
>>> parallel ping than just doing a "pdsh -a uptime | dshbak -c", which will
>>> hit all the nodes and actually run a command (which tells you that the
>>> system is online).
>>
>> I'd like to make an enhancement request for fping. I use fping
>> extensively, because it allows for a level of diagnostics that pdsh does
>> not. I.e., the difference between a running IP stack that enables ping
>> and a running sshd that enables pdsh can be very important when
>> diagnosing a large system's state. We can obviously install fping if it
>> doesn't come with the stack out-of-the-box, but having it there by
>> default would be nice.
>
> Noted. (Zhiqi, can you please log this request?)