Matthew Bohnsack
2008-Sep-08 20:03 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
Hello,

I would like to make a feature request for a future release of Sun(TM) HPC Software, Linux Edition.

In a nutshell, we need the following for a large system that will be implemented over the course of the next few months:

  A. Linux commandline utilities that can be used to read and write the BIOS images of Sun 6000-series blades.
  B. Improved (or simply existent ;)) release notes for BIOS and other firmware for Sun 6000-series blades.

Some might consider this request more appropriate for Sun hardware engineering than for Sun Linux cluster software development. However, I'm asking here because I believe the requirement for this functionality flows very naturally from a Linux cluster software stack. That is, IMHO, without this functionality, a Linux cluster software stack is incomplete and the robust implementation of a large-scale production system is very difficult.

Details of my request...

Requirement A. Linux commandline utilities for BIOS read/write
===============================================================

Description:

We need a Linux commandline utility that:

  * Can read a BIOS image from a node's EEPROM, including firmware revision and settings, and then write this image to a file.
  * Can write a previously created BIOS image to a node's EEPROM.
  * Will not overwrite node VPD. E.g., if I save a BIOS image from node1 to disk and then write that image to node2, node1 and node2 should retain unique VPD information such as serial numbers that are viewable from dmidecode, IPMI utilities, etc.
  * Works on current Harpertown-based X6250 blades
  * Works on future Gainestown-based X???? blades

Note that while this utility would enable BIOS revision updates, this type of update is currently possible via a SMASH/TFTP process that can be easily automated. I.e., making BIOS settings changes is today's key missing piece of functionality.

Justification:

For implementation-specific technical reasons and because of HPC-and-scale-related manageability requirements, we need this functionality to effectively implement and manage the Linux cluster I'm currently working on. Some details:

  * We have blades that will only have an InfiniBand HCA to boot from - there will be no Ethernet or hard drives. These blades will be unusable unless certain BIOS settings are made that don't ship as the default. E.g., PCI expansion ROM functions need to be enabled. Because of this requirement and the scale of our system, we require an automated solution. Manually making these BIOS settings changes at our scale is not feasible.
  * It would be nice to have the required BIOS settings shipped from the factory, but this would not be enough, because I have seen BIOS settings revert to a state useless for Boot-over-IB with certain combinations of power cycle and unseat/reseat with the X6250-based system I'm currently working on. In other words, we need these tools to implement a robust break/fix process.
  * We're going to be running tightly-coupled parallel codes that will only go as fast as the slowest node. Because BIOS revisions and settings can have important effects on single node performance, and we are sure to identify beneficial BIOS settings changes in the future that aren't known today, a facility to automate BIOS settings changes has to exist for us to get the maximum benefit from our machine.
  * We need a way to automatically validate a consistent BIOS state, in terms of revision and settings, because:
      * A BIOS setting could be accidentally changed by a system administrator, and we want to find any issues caused by this kind of thing ASAP.
      * Replacement hardware could ship from the factory with an unknown state, and we must have a consistent process for taking it to a known state and validating that state.
      * The BIOS's writable EEPROM could represent a security risk, unless automated tools can be written to validate EEPROM state.

Preferred Implementation:

We prefer an OpenSource solution running from Linux userspace, based on coreboot's flashrom. See notes about tests performed with this tool at the end of this note and the website: http://www.coreboot.org/Flashrom

Other Less Desirable Possibilities:

  * Sun-proprietary Linux tools that implement the required functionality could possibly work as well as the coreboot tools.
  * SP/SMASH-CLI-based tools that implement the required functionality would enable "Expect"-like scripting tools to be developed.
  * DOS-based utilities that implement the required functionality would enable tools based on network-bootable DOS images to be developed. This would be lame, but it could be made to work.
  * "Expect"-like scripting tools could be developed that "screen-scrape" the BIOS setup screens over the serial console. This solution would be very brittle and the least desirable of all the options.

Requirement B. Improved changelogs for BIOS and other firmware
===============================================================

Description:

Firmware updates for Sun 6000 and 6048-based systems are currently available for download at:

    http://www.sun.com/servers/blades/downloads.jsp

While updates for some hardware (e.g., Sun Blade X6220 Server Module Software 2.0) have changelog detail, other updates that I'm interested in (e.g., Sun Blade X6250 Server Module 1.3.3, 1.3.2, and 1.3.1) contain no change information whatsoever. I need changelog information for firmware that applies to X6250 machines today and to the blades based on Intel Gainestown CPUs when they are released.

Justification:

Updates and settings changes made to BIOS, systems management, and other firmware can have a significant effect on a system, in terms of system performance, stability, and manageability. All of these factors are of critical importance on a large-scale HPC system, because seemingly small firmware improvements/tweaks can have major impacts at scale. It follows that HPC system managers are often eager to apply new firmware updates on their HPC systems. However, the application of firmware updates can be costly on a large system. E.g., disrupting production jobs on thousands of nodes to do a firmware update has a serious impact in terms of lost throughput. Therefore, an HPC system manager must balance the cost of firmware updates against the possible improvements that the firmware updates could provide. This analysis is required, but it is not possible when proper changelogs describing new firmware releases do not exist.

Other Notes
===============================================================

1) I've previously made similar requests on the Sun Hardware blade forum:

  * http://forums.sun.com/thread.jspa?threadID=5316894&tstart=0
  * http://forums.sun.com/thread.jspa?threadID=5316461&tstart=0

2) Details on required BIOS setting changes (among other things):

    http://forums.sun.com/thread.jspa?threadID=5316472&tstart=0

3) I tested the latest FlashROM from SVN on a Harpertown-based X6250 (doesn't work) and an AMD-based X6220 machine (seems to work):

    # svn info
    Path: .
    URL: svn://coreboot.org/repos/trunk/util/flashrom
    Repository Root: svn://coreboot.org/repos
    Repository UUID: 2b7e53f0-3cfb-0310-b3e9-8179ed1497e1
    Revision: 3535
    Node Kind: directory
    Schedule: normal
    Last Changed Author: hailfinger
    Last Changed Rev: 3532
    Last Changed Date: 2008-08-20 14:31:41 -0600 (Wed, 20 Aug 2008)

    harpertown-box# ./flashrom
    Calibrating delay loop... OK.
    No coreboot table found.
    WARNING: No chipset found. Flash detection will most likely fail.
    No EEPROM/flash device found.
    If you know which flash chip you have, and if this version of flashrom
    supports a similar flash chip, you can try to force read your chip. Run:
    flashrom -f -r -c similar_supported_flash_chip filename

    Note: flashrom can never write when the flash chip isn't found automatically.

    amd-box# ./flashrom
    Calibrating delay loop... OK.
    No coreboot table found.
    Found chipset "NVIDIA CK804", enabling flash write... OK.
    Found chip "ST M50FLW080A" (1024 KB) at physical address 0xfff00000.
    No operations were specified.
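
The consistency check asked for under Requirement A (finding nodes whose BIOS state differs from the rest of the cluster) could be prototyped roughly as follows. This is only a sketch under stated assumptions: the function name `find_bios_outliers`, the node names, and the sample values are all hypothetical, and it assumes the per-node facts have already been collected by some other means (for example by running `dmidecode -s bios-version` on each node). It covers only what such tools can report, so it does not address the BIOS settings themselves, which is the gap this request is about.

```python
from collections import Counter


def find_bios_outliers(reports):
    """Return {node: {field: (actual, expected)}} for nodes that deviate.

    `reports` maps a node name to a dict of BIOS facts collected elsewhere.
    The expected value of each field is simply the most common value seen;
    a real deployment would more likely compare against a fixed baseline.
    """
    fields = sorted({f for facts in reports.values() for f in facts})

    # Majority vote per field decides what "consistent" means here.
    expected = {}
    for field in fields:
        counts = Counter(facts.get(field) for facts in reports.values())
        expected[field] = counts.most_common(1)[0][0]

    outliers = {}
    for node, facts in reports.items():
        diffs = {
            f: (facts.get(f), expected[f])
            for f in fields
            if facts.get(f) != expected[f]
        }
        if diffs:
            outliers[node] = diffs
    return outliers


if __name__ == "__main__":
    # Hypothetical sample data; real values would come from each node.
    sample = {
        "node1": {"bios-version": "rev-A"},
        "node2": {"bios-version": "rev-A"},
        "node3": {"bios-version": "rev-B"},
    }
    for node, diffs in find_bios_outliers(sample).items():
        print(node, diffs)
```

Using a majority vote keeps the sketch self-contained, but it would hide the case where most of the cluster has drifted, which is one reason a known-good baseline would be preferable in practice.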
Zhiqi Tao
2008-Sep-10 07:01 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
Dear Matthew,

I completely understand the inconvenience of not being able to automate BIOS updates in a large-scale server environment. I have created a Bugzilla ticket for this request:

    Bug 17051 Feature Request: BIOS Tools and Improved BIOS/FW Release Notes

I appreciate your effort in compiling such a detailed specification. I will investigate the best approach to address this feature.

Best Regards,
Zhiqi
Zhiqi Tao
2008-Sep-11 07:48 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
Dear Matthew,

I investigated the BIOS update function on Sun x86 servers. At the moment we offer firmware updates through both the web interface and the command line. I guess you have probably tried them. What has your experience been?

Instead of using a third-party tool, I think a simpler solution would be to develop a shell script that connects over ssh to the server SP or CMM and updates the firmware with the following command:

    load -source tftp://10.15.11.200:ilom.X4100M2-2.0.2.5-r30859-bios79.ima

Regards,
Zhiqi
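
The scripted approach suggested in this message could start from something like the sketch below, which only builds (and by default just prints) one ssh invocation per service processor. The host names, the `root` user, and the image URL in the example are hypothetical placeholders. It also assumes that the SP accepts a command passed on the ssh command line and that `load` runs without interactive confirmation; neither point is established in this thread.

```python
import shlex
import subprocess


def build_load_commands(sp_hosts, image_url, user="root"):
    """Return one ssh argument list per service processor.

    The remote command is the `load -source <url>` form quoted in the
    message above; everything else here is an assumption.
    """
    remote = "load -source " + image_url
    return [["ssh", f"{user}@{host}", remote] for host in sp_hosts]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    results = []
    for cmd in commands:
        printable = " ".join(shlex.quote(arg) for arg in cmd)
        print(printable)
        if not dry_run:
            # check=False so that one failing SP does not abort the loop.
            proc = subprocess.run(cmd, check=False)
            results.append((cmd[1], proc.returncode))
    return results


if __name__ == "__main__":
    # Hypothetical SP names and image location.
    cmds = build_load_commands(
        ["sp-node1", "sp-node2"], "tftp://192.0.2.1/firmware.ima"
    )
    run_commands(cmds, dry_run=True)
```

Keeping the default as a dry run makes it possible to review the generated commands before touching any firmware. As noted in the original request, this kind of procedure updates the firmware revision only and does not change BIOS settings.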
Makia Minich
2008-Sep-11 08:09 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
Zhiqi,

While this method will work (setting aside concerns about the reliability and scalability of the solution), one major piece is missing. We also need some way to influence BIOS settings: these flash methods change the firmware, but they do not change the settings.

The reason for approaching this with something like FlashROM is that we are looking for a method that pulls not only a copy of a system's BIOS, but also the settings along with it. With such a method, you have an image that lets you quickly flash multiple nodes with the settings you prefer.

A harder, yet preferred, solution would be to expose the BIOS settings to either the service processor or to Linux itself (though in some instances you can't get to Linux until you make changes to the BIOS). As it stands, the method is to use the serial console in broadcast mode across the system (or to script it with expect), but this proves quite unreliable when scaling up.

In general, we need both a software and a hardware solution.
>>> >>> Other Notes >>> ===============================================================>>> >>> 1) I''ve previously made similar requests on the Sun Hardware blade >>> forum: >>> >>> * http://forums.sun.com/thread.jspa?threadID=5316894&tstart=0 >>> * http://forums.sun.com/thread.jspa?threadID=5316461&tstart=0 >>> >>> 2) Details on required BIOS setting changes (among other things): >>> http://forums.sun.com/thread.jspa?threadID=5316472&tstart=0 >>> >>> 3) I tested the latest FlashROM from SVN on a Harpertown-based X6250 >>> (doesn''t work) and a AMD-based x6220 machine (seems to work): >>> >>> # svn info >>> Path: . >>> URL: svn://coreboot.org/repos/trunk/util/flashrom >>> Repository Root: svn://coreboot.org/repos >>> Repository UUID: 2b7e53f0-3cfb-0310-b3e9-8179ed1497e1 >>> Revision: 3535 >>> Node Kind: directory >>> Schedule: normal >>> Last Changed Author: hailfinger >>> Last Changed Rev: 3532 >>> Last Changed Date: 2008-08-20 14:31:41 -0600 (Wed, 20 Aug 2008) >>> >>> harpertown-box# ./flashrom >>> Calibrating delay loop... OK. >>> No coreboot table found. >>> WARNING: No chipset found. Flash detection will most likely >>> fail. >>> No EEPROM/flash device found. >>> If you know which flash chip you have, and if this version of >>> flashrom >>> supports a similar flash chip, you can try to force read your >>> chip. Run: >>> flashrom -f -r -c similar_supported_flash_chip filename >>> >>> Note: flashrom can never write when the flash chip isn''t found >>> automatically. >>> >>> amd-box# ./flashrom >>> Calibrating delay loop... OK. >>> No coreboot table found. >>> Found chipset "NVIDIA CK804", enabling flash write... OK. >>> Found chip "ST M50FLW080A" (1024 KB) at physical address >>> 0xfff00000. >>> No operations were specified. 
>>> >>> _______________________________________________ >>> Linux_hpc_swstack mailing list >>> Linux_hpc_swstack at lists.lustre.org >>> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack >> _______________________________________________ >> Linux_hpc_swstack mailing list >> Linux_hpc_swstack at lists.lustre.org >> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack > _______________________________________________ > Linux_hpc_swstack mailing list > Linux_hpc_swstack at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack-- "A simile is not a lie, unless it is a bad simile." - Christopher John Francis Boone
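
The original request in this thread asks for a way to automatically validate a consistent BIOS state across nodes. As a rough sketch of that idea only: assume each node's BIOS image has already been dumped to bytes by some tool (for example a working `flashrom -r`), then group nodes by a hash of the image and report the ones that differ from the majority. The function names and the sample node names below are hypothetical, not part of any Sun or coreboot tool.

```python
import hashlib
from collections import defaultdict


def group_by_digest(images):
    """Map SHA-256 hex digest -> sorted list of node names with that image.

    `images` is a dict of node name -> raw image bytes (assumed to have
    been read from each node beforehand; this sketch does no hardware I/O).
    """
    groups = defaultdict(list)
    for node, data in images.items():
        groups[hashlib.sha256(data).hexdigest()].append(node)
    return {digest: sorted(nodes) for digest, nodes in groups.items()}


def outliers(images):
    """Return the nodes whose image differs from the most common image."""
    groups = group_by_digest(images)
    if not groups:
        return []
    majority = max(groups.values(), key=len)
    return sorted(
        node
        for nodes in groups.values()
        if nodes is not majority
        for node in nodes
    )


if __name__ == "__main__":
    # Fake image contents standing in for real dumps.
    dumps = {
        "node1": b"\x00" * 16,
        "node2": b"\x00" * 16,
        "node3": b"\xff" * 16,
    }
    print(outliers(dumps))  # prints ['node3']
```

One caveat: if the dumped image embeds per-node VPD such as serial numbers, whole-image hashes would differ on every node, so a real check would probably have to hash only the regions that are expected to be identical.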
Atul Vidwansa
2008-Sep-11 08:15 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
How about using Sun Installation Assistant? It already supports a number of Sun servers and has an interface with ILOM.

http://www.sun.com/systemmanagement/sia.jsp

Cheers,
_Atul

Zhiqi Tao wrote:
> Dear Matthew,
>
> I investigated bios updating function on Sun x86 server. At the moment
> we offer firmware update through both web interface and command line. I
> guess you have probably tried them. What's your experience on that?
>
> Instead of using third party, I think it would be a simpler solution of
> developing a shell script to ssh server sp or CMM and update firmware
> via the following procedure.
>
> load -source tftp://10.15.11.200:ilom.X4100M2-2.0.2.5-r30859-bios79.ima
>
> Regards,
> Zhiqi
>
> Zhiqi Tao wrote:
>> Dear Matthew,
>>
>> I completely understand the inconvenience of not being able to automate
>> BIOS update in a large scale server environment. I created one ticket in
>> bugzilla for this request.
>>
>> Bug 17051 Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
>>
>> I appreciate your effort of compiling such a detailed specification. I
>> will investigate what would be the best approach to address this feature.
>>
>> Best Regards,
>> Zhiqi

-- 
Atul Vidwansa
Sun Microsystems Australia Pty Ltd
Web: http://blogs.sun.com/atulvid
Email: Atul.Vidwansa at Sun.COM
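
Zhiqi's suggestion of a script that logs in to each SP or CMM and issues the `load -source` command could start from something like the sketch below. It only builds the per-host command list and deliberately leaves out the ssh transport, authentication, and any prompts the SP may show. The function name and the SP hostnames are hypothetical; the source URL is passed through exactly as given, so nothing here assumes a particular URL layout.

```python
def plan_updates(sp_hosts, source_url):
    """Return (host, command) pairs, one per unique SP host, in sorted order.

    `source_url` is used verbatim after `load -source`; the caller is
    responsible for supplying a URL the SP accepts.
    """
    command = f"load -source {source_url}"
    return [(host, command) for host in sorted(set(sp_hosts))]


if __name__ == "__main__":
    # Hypothetical SP hostnames; the URL is the one quoted in the thread.
    url = "tftp://10.15.11.200:ilom.X4100M2-2.0.2.5-r30859-bios79.ima"
    for host, command in plan_updates(["blade02-sp", "blade01-sp"], url):
        print(f"{host}: {command}")
```

Keeping the planning step separate from execution makes it possible to review or log exactly what would be sent to thousands of SPs before anything is run.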
Makia Minich
2008-Sep-11 08:31 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
Looking at the manual, it talks about setting BIOS settings by physically watching for the BIOS screen and going into the BIOS menus to make changes (http://docs.sun.com/source/820-3357-14/SIA_Thumbdrv_Appdx.html#0_pgfId-1004992). Do you happen to know of any Sun tools that can actually change BIOS settings (we need to start thinking hundreds of nodes and not five)?

Atul Vidwansa wrote:
> How about using Sun Installation Assistant? It already supports a number
> of Sun servers and has an interface with ILOM.
>
> http://www.sun.com/systemmanagement/sia.jsp
>
> Cheers,
> _Atul

-- 
"A simile is not a lie, unless it is a bad simile." - Christopher John Francis Boone
Matthew Bohnsack
2008-Sep-11 14:30 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
All,

Let's please up the ante beyond five or a few hundred nodes and start thinking about *thousands*. A situation with multiple thousands of nodes is where I'm coming from and the scale of implementation that I need to support. I want to change the BIOS settings of thousands of nodes in a 100% consistent manner in 10-15 minutes. Something like the flashrom solution gives me this capability.

To clarify - I'm not too concerned about updating BIOS firmware revisions. I have already written expect-based tools that leverage the SP, SMASH, TFTP etc. to do this kind of update (i.e., automating the process described here: http://docs.sun.com/source/820-1253-14/cli_com.html#0_30193). I don't think there will be any problems making this work.

The issue is automating BIOS settings changes. E.g., say I want to change the serial console BIOS setting from ANSI to vt100. How do I do this? Can Sun Installation Assistant make BIOS settings changes?

Thanks,

-Matthew

On Thu, 2008-09-11 at 02:31 -0600, Makia Minich wrote:
> Looking at the manual, it talks about setting BIOS settings by
> physically watching for the BIOS screen and going into the BIOS menus to
> make changes
> (http://docs.sun.com/source/820-3357-14/SIA_Thumbdrv_Appdx.html#0_pgfId-1004992).
> Do you happen to know of any Sun Tools that can actually change BIOS
> (we need to start thinking hundreds of nodes and not five)?
>
> Atul Vidwansa wrote:
> > How about using Sun Installation Assistant? It already supports number
> > of Sun serves and has an interface with ILOM.
> >
> > http://www.sun.com/systemmanagement/sia.jsp
> >
> > Cheers,
> > _Atul
> >
> > Zhiqi Tao wrote:
> >> Dear Matthew,
> >>
> >> I investigated bios updating function on Sun x86 server. At the moment
> >> we offer firmware update through both web interface and command line. I
> >> guess you have probably tried them. What's your experience on that?
> >> > >> Instead of using third party, I think it would be a simpler solution of > >> developing a shell script to ssh server sp or CMM and update firmware > >> via the following procedure. > >> > >> load -source tftp://10.15.11.200:ilom.X4100M2-2.0.2.5-r30859-bios79.ima > >> > >> Regards, > >> Zhiqi > >> > >> > >> Zhiqi Tao wrote: > >> > >>> Dear Matthew, > >>> > >>> I completely understand the inconvenience of not being able to automate > >>> BIOS update in a large scale server environment. I created one ticket in > >>> bugzilla for this request. > >>> > >>> Bug 17051 Feature Request: BIOS Tools and Improved BIOS/FW Release Notes > >>> > >>> I appreciate your effort of compiling such a detailed specification. I > >>> will investigate what would be the best approach to address this feature. > >>> > >>> Best Regards, > >>> Zhiqi > >>> > >>> > >>> Matthew Bohnsack wrote: > >>> > >>>> Hello, > >>>> > >>>> I would like to make a feature request for a future release of Sun(TM) > >>>> HPC Software, Linux Edition. > >>>> > >>>> In a nutshell, we need the following for a large system that will be > >>>> implemented over the course of the next few months: > >>>> > >>>> A. Linux commandline utilities that can be used to read and write > >>>> the BIOS images of Sun 6000-series blades. > >>>> B. Improved (or simply existent ;)) release notes for BIOS and > >>>> other firmware for Sun 6000-series blades. > >>>> > >>>> Some might consider this request more appropriate for Sun hardware > >>>> engineering than it is for Sun Linux cluster software development. > >>>> However, I''m asking here, because I believe the requirement for this > >>>> functionality flows very naturally from a Linux cluster software stack. > >>>> That is, IMHO, without this functionality, a Linux cluster software > >>>> stack is incomplete and the robust implementation of a large-scale > >>>> production system is very difficult. > >>>> > >>>> Details of my request... > >>>> > >>>> Requirement A. 
Linux commandline utilities for BIOS read/write > >>>> ===============================================================> >>>> > >>>> Description: > >>>> > >>>> We need a Linux commandline utility that: > >>>> > >>>> * Can read a BIOS image from a node''s EEPROM, including firmware > >>>> revision and settings and then write this image to a file. > >>>> * Can write a previously created BIOS image to a node''s EEPROM. > >>>> * Will not overwrite node VPD. E.g., if I save a BIOS image from > >>>> node1 to disk and then write that image to node2, node1 and > >>>> node2 should retain unique VPD information such as serial > >>>> numbers that are viewable from dmidecode, IMPI utilities, etc. > >>>> * Works on current Harpertown-based X6250 blades > >>>> * Works on future Gainestown-based X???? blades > >>>> > >>>> Note that while this utility would enable BIOS revision updates, this > >>>> type of update is currently possible via a SMASH/TFTP process that can > >>>> be easily automated. I.e., it''s making the BIOS settings changes that''s > >>>> today''s key missing piece of functionality. > >>>> > >>>> Justification: > >>>> > >>>> For implementation-specific technical reasons and because of > >>>> HPC-and-scale-related manageability requirements, we need this > >>>> functionality to effectively implement and manage the Linux cluster I''m > >>>> currently working on. Some details: > >>>> > >>>> * We have blades that will only have an Infiniband HCA to boot > >>>> from - there will be no Ethernet or hard drives. These blades > >>>> will be unusable unless certain BIOS settings are made that > >>>> don''t ship as the default. E.g., PCI expansion ROM functions > >>>> need to be enabled. Because of this requirement and the scale > >>>> of our system, we require an automated solution. Manually > >>>> making these BIOS settings changes at our scale is not feasible. 
> >>>> * It would be nice to have the required BIOS settings shipped from > >>>> the factory, but this would not be enough, because I have seen > >>>> BIOS settings revert to a state useless for Boot-over-IB with > >>>> certain combinations of power cycle and unseat/reseat with the > >>>> X6250-based system I''m currently working on. In other words, we > >>>> need these tools to implement a robust break/fix process. > >>>> * We''re going to be running tightly-coupled parallel codes that > >>>> will only go as fast as the slowest node. Because BIOS > >>>> revisions and settings can have important effects on single node > >>>> performance, and we are sure to identify beneficial BIOS > >>>> settings changes in the future that aren''t known today, a > >>>> facility to automate BIOS settings changes has to exist for us > >>>> to get the maximum benefit from our machine. > >>>> * We need a way to automatically validate a consistent BIOS state, > >>>> in terms of revision and settings because: > >>>> * A BIOS setting could be accidentally changed by a system > >>>> administrator, and we want to find any issues caused by > >>>> this kind of thing ASAP. > >>>> * Replacement hardware could ship from the factory with an > >>>> unknown state, and we must have a consistent process for > >>>> taking it to a known state and validating that state. > >>>> * The BIOS''s writable EEPROM could represent a security > >>>> risk, unless automated tools can be written to validate > >>>> EEPROM state. > >>>> > >>>> > >>>> Preferred Implementation: > >>>> > >>>> We prefer an OpenSource solution running from Linux userspace, based on > >>>> coreboot''s flashrom. See notes about tests performed with this tool at > >>>> the end of this note and the website: http://www.coreboot.org/Flashrom > >>>> > >>>> Other Less Desirable Possibilities: > >>>> > >>>> * Sun-proprietary Linux tools that implement the required > >>>> functionality could possibly work as well as the coreboot tools. 
> >>>> * SP/SMASH-CLI-based tools that implement the required > >>>> functionality would enable "Expect"-like scripting tools to be > >>>> developed. > >>>> * DOS-based utilities that that implement the required > >>>> functionality would enable tools based on network-bootable DOS > >>>> images to be developed. This would be lame, but it could be > >>>> made to work. > >>>> * "Expect"-like scripting tools could be developed that > >>>> "screen-scrape" the BIOS setup screens over the serial console. > >>>> This solution would be very brittle and least desirable of all > >>>> the options. > >>>> > >>>> > >>>> Requirement B. Improved changelogs for BIOS and other firmware > >>>> ===============================================================> >>>> > >>>> Description: > >>>> > >>>> Firmware updates for Sun 6000 and 6048-based systems are currently > >>>> available for download at: > >>>> > >>>> http://www.sun.com/servers/blades/downloads.jsp > >>>> > >>>> While updates for some hardware (e.g., Sun Blade X6220 Server Module > >>>> Software 2.0) has changelog detail, other updates that I''m interested in > >>>> (e.g., Sun Blade X6250 Server Module 1.3.3, 1.3.2, and 1.3.1) contain no > >>>> change information whatsoever. I need changelog information for > >>>> firmware that applies to X6250 machines today and the blades based on > >>>> Intel Gainestown CPUs when they are released. > >>>> > >>>> Justification: > >>>> > >>>> Updates and settings changes made to BIOS, systems management, and other > >>>> firmware can have a significant effect on a system, in terms of system > >>>> performance, stability, and manageability. All of these factors are of > >>>> critical importance on a large-scale HPC system, because seemingly small > >>>> firmware improvements/tweaks can have major impacts at scale. It > >>>> follows that HPC system managers are often eager to apply new firmware > >>>> updates on their HPC systems. 
However, the application of firmware > >>>> updates can be costly on a large system. E.g, disrupting production > >>>> jobs on thousands of nodes to do a firmware update has a serious impact > >>>> in terms of lost throughput. Therefore, an HPC system manager must > >>>> balance the cost of firmware updates with the possible improvements that > >>>> the firmware updates could provide. This analysis is required, but not > >>>> possible when proper changelogs describing new firmware releases do not > >>>> exist. > >>>> > >>>> Other Notes > >>>> ===============================================================> >>>> > >>>> 1) I''ve previously made similar requests on the Sun Hardware blade > >>>> forum: > >>>> > >>>> * http://forums.sun.com/thread.jspa?threadID=5316894&tstart=0 > >>>> * http://forums.sun.com/thread.jspa?threadID=5316461&tstart=0 > >>>> > >>>> 2) Details on required BIOS setting changes (among other things): > >>>> http://forums.sun.com/thread.jspa?threadID=5316472&tstart=0 > >>>> > >>>> 3) I tested the latest FlashROM from SVN on a Harpertown-based X6250 > >>>> (doesn''t work) and a AMD-based x6220 machine (seems to work): > >>>> > >>>> # svn info > >>>> Path: . > >>>> URL: svn://coreboot.org/repos/trunk/util/flashrom > >>>> Repository Root: svn://coreboot.org/repos > >>>> Repository UUID: 2b7e53f0-3cfb-0310-b3e9-8179ed1497e1 > >>>> Revision: 3535 > >>>> Node Kind: directory > >>>> Schedule: normal > >>>> Last Changed Author: hailfinger > >>>> Last Changed Rev: 3532 > >>>> Last Changed Date: 2008-08-20 14:31:41 -0600 (Wed, 20 Aug 2008) > >>>> > >>>> harpertown-box# ./flashrom > >>>> Calibrating delay loop... OK. > >>>> No coreboot table found. > >>>> WARNING: No chipset found. Flash detection will most likely > >>>> fail. > >>>> No EEPROM/flash device found. > >>>> If you know which flash chip you have, and if this version of > >>>> flashrom > >>>> supports a similar flash chip, you can try to force read your > >>>> chip. 
Run: > >>>> flashrom -f -r -c similar_supported_flash_chip filename > >>>> > >>>> Note: flashrom can never write when the flash chip isn''t found > >>>> automatically. > >>>> > >>>> amd-box# ./flashrom > >>>> Calibrating delay loop... OK. > >>>> No coreboot table found. > >>>> Found chipset "NVIDIA CK804", enabling flash write... OK. > >>>> Found chip "ST M50FLW080A" (1024 KB) at physical address > >>>> 0xfff00000. > >>>> No operations were specified. > >>>> > >>>> _______________________________________________ > >>>> Linux_hpc_swstack mailing list > >>>> Linux_hpc_swstack at lists.lustre.org > >>>> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack > >>>> > >>> _______________________________________________ > >>> Linux_hpc_swstack mailing list > >>> Linux_hpc_swstack at lists.lustre.org > >>> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack > >>> > >> _______________________________________________ > >> Linux_hpc_swstack mailing list > >> Linux_hpc_swstack at lists.lustre.org > >> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack > >> > > > > > > -- > "A simile is not a lie, unless it is a bad simile." > - Christopher John Francis Boone > _______________________________________________ > Linux_hpc_swstack mailing list > Linux_hpc_swstack at lists.lustre.org > http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack
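The requirement discussed above (thousands of nodes in a provably consistent BIOS state) could be checked by comparing checksums of per-node image dumps. What follows is a minimal Python sketch, not a tested tool. It assumes each node's image has already been read into a byte string by some means (the flashrom test in this thread did not yet work on the X6250), and that per-node data such as VPD sits at known offsets that can be masked out before hashing. The function names and the `skip_ranges` offsets are hypothetical and do not come from any Sun or coreboot tool.

```python
import hashlib
from collections import defaultdict


def image_digest(image, skip_ranges=()):
    """Return the SHA-256 hex digest of a BIOS image.

    Bytes inside each (start, end) pair in skip_ranges are zeroed first,
    so that per-node data (hypothetically: VPD such as serial numbers)
    does not make otherwise identical images look different.
    """
    buf = bytearray(image)
    for start, end in skip_ranges:
        # Replace the region with zeros of the same length.
        buf[start:end] = bytes(len(buf[start:end]))
    return hashlib.sha256(bytes(buf)).hexdigest()


def group_by_digest(images, skip_ranges=()):
    """Map digest -> list of node names whose masked images share it."""
    groups = defaultdict(list)
    for node, image in images.items():
        groups[image_digest(image, skip_ranges)].append(node)
    return dict(groups)


def find_outliers(images, skip_ranges=()):
    """Return the nodes whose image differs from the most common one."""
    groups = group_by_digest(images, skip_ranges)
    if not groups:
        return []
    majority = max(groups.values(), key=len)
    return sorted(
        node
        for nodes in groups.values()
        if nodes is not majority
        for node in nodes
    )
```

Grouping by digest rather than comparing against a single golden image also shows how many distinct states exist in the machine, which helps when replacement hardware arrives in an unknown state. Whether settings and VPD can really be separated by byte offset on these blades is an open question that this sketch does not answer.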
Matthew Bohnsack
2008-Sep-11 14:33 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
Makia,

You have very nicely clarified my perspective and requirements here.
Thank you!

-Matthew

On Thu, 2008-09-11 at 02:09 -0600, Makia Minich wrote:
> Zhiqi,
>
> While this method will work (concerns with reliability and scalability
> of the solution put to the side), one major piece is missing. We need to
> find some way to influence BIOS settings as well (these flash methods
> will change the firmware, but not the settings).
>
> The reason for approaching this with something like FlashROM is that we
> are looking for a method that will allow pulling not only a copy of a
> system's BIOS, but also the settings along with it. With this method,
> you have an image/solution that will allow you to quickly flash multiple
> nodes with the settings that you prefer.
>
> A harder, yet more preferred, solution would be to expose the BIOS
> settings to either the service processor or to Linux itself (though in
> some instances you can't get to Linux until you make changes to the
> BIOS). As it stands, the method is to use the serial console in a
> broadcast mode across the system (or with expect via scripting), but
> this method proves to be quite unreliable when scaling up.
>
> In general, we need to get a software and hardware solution.
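Makia's point about reliability at scale suggests that any per-node operation should be run with bounded parallelism and explicit per-node failure reporting, rather than broadcast blindly. The sketch below shows one way that could look in Python. It is only an illustration under assumptions: `fan_out` and the `action` callable are hypothetical names, and `action` stands in for whatever per-node step turns out to work (an SP command, an image write, and so on).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out(nodes, action, max_workers=64):
    """Run action(node) for every node with bounded parallelism.

    Returns (ok, failed): a sorted list of nodes that completed, and a
    dict mapping each failed node to the text of the error it raised.
    """
    ok, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which future belongs to which node.
        futures = {pool.submit(action, node): node for node in nodes}
        for future in as_completed(futures):
            node = futures[future]
            try:
                future.result()
                ok.append(node)
            except Exception as exc:
                # Record the failure and keep going with the other nodes.
                failed[node] = str(exc)
    return sorted(ok), failed
```

The returned `failed` dict gives an exact list of nodes to retry or pull for break/fix, which is the part that a serial-console broadcast does not provide.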
Frank Leers
2008-Sep-11 14:58 UTC
[Linux_hpc_swstack] Feature Request: BIOS Tools and Improved BIOS/FW Release Notes
On Sep 11, 2008, at 7:30 AM, Matthew Bohnsack wrote:> All, > > Let''s please up the ante beyond five or a few hundreds of nodes and > start thinking about *thousands*. A situation with multiple thousands > of nodes is where I''m coming from and the scale of implementation > that I > need to support. I want to change the BIOS settings of thousands of > nodes in a 100% consistent manner in 10-15 minutes. Something like > the > flashrom solution gives me this capability. > > To clarify - I''m not too concerned about updating BIOS firmware > revisions. I have already written expect-based tools that leverage > the > SP, SMASH, TFTP etc. to do this kind of update (i.e., automating the > process described here: > http://docs.sun.com/source/820-1253-14/cli_com.html#0_30193). I don''t > think there will be any problems making this work. The issue is > automating BIOS settings changes. E.g., say I want to change the > serial > console BIOS setting from ANSI to vt100. How do I do this? Can Sun > Installation Assistant make BIOS settings changes?No it can''t. The only official product from Sun to date that can claim to offer this functionality is suncfg, and it is extremely limited. http://docs.sun.com/source/820-1120-18/suncfg1.html Matt, I think we have the proper folks engaged, from our off-list exchange. Let''s see what progress can be made using that venue. Makia, Zhiki, Atul, if you are interested in details please contact me privately. -frank> Thanks, > > -Matthew > > On Thu, 2008-09-11 at 02:31 -0600, Makia Minich wrote: >> Looking at the manual, it talks about setting BIOS settings by >> physically watching for the BIOS screen and going into the BIOS >> menus to >> make changes >> (http://docs.sun.com/source/820-3357-14/SIA_Thumbdrv_Appdx.html#0_pgfId-1004992 >> ). >> Do you happen to know of any Sun Tools that can actually change BIOS >> (we need to start thinking hundreds of nodes and not five)? 
>> >> Atul Vidwansa wrote: >>> How about using Sun Installation Assistant? It already supports >>> number >>> of Sun serves and has an interface with ILOM. >>> >>> http://www.sun.com/systemmanagement/sia.jsp >>> >>> Cheers, >>> _Atul >>> >>> Zhiqi Tao wrote: >>>> Dear Matthew, >>>> >>>> I investigated bios updating function on Sun x86 server. At the >>>> moment >>>> we offer firmware update through both web interface and command >>>> line. I >>>> guess you have probably tried them. What''s your experience on that? >>>> >>>> Instead of using third party, I think it would be a simpler >>>> solution of >>>> developing a shell script to ssh server sp or CMM and update >>>> firmware >>>> via the following procedure. >>>> >>>> load -source tftp://10.15.11.200:ilom.X4100M2-2.0.2.5-r30859- >>>> bios79.ima >>>> >>>> Regards, >>>> Zhiqi >>>> >>>> >>>> Zhiqi Tao wrote: >>>> >>>>> Dear Matthew, >>>>> >>>>> I completely understand the inconvenience of not being able to >>>>> automate >>>>> BIOS update in a large scale server environment. I created one >>>>> ticket in >>>>> bugzilla for this request. >>>>> >>>>> Bug 17051 Feature Request: BIOS Tools and Improved BIOS/FW >>>>> Release Notes >>>>> >>>>> I appreciate your effort of compiling such a detailed >>>>> specification. I >>>>> will investigate what would be the best approach to address this >>>>> feature. >>>>> >>>>> Best Regards, >>>>> Zhiqi >>>>> >>>>> >>>>> Matthew Bohnsack wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I would like to make a feature request for a future release of >>>>>> Sun(TM) >>>>>> HPC Software, Linux Edition. >>>>>> >>>>>> In a nutshell, we need the following for a large system that >>>>>> will be >>>>>> implemented over the course of the next few months: >>>>>> >>>>>> A. Linux commandline utilities that can be used to read and >>>>>> write >>>>>> the BIOS images of Sun 6000-series blades. >>>>>> B. 
Improved (or simply existent ;)) release notes for BIOS >>>>>> and >>>>>> other firmware for Sun 6000-series blades. >>>>>> >>>>>> Some might consider this request more appropriate for Sun >>>>>> hardware >>>>>> engineering than it is for Sun Linux cluster software >>>>>> development. >>>>>> However, I''m asking here, because I believe the requirement for >>>>>> this >>>>>> functionality flows very naturally from a Linux cluster >>>>>> software stack. >>>>>> That is, IMHO, without this functionality, a Linux cluster >>>>>> software >>>>>> stack is incomplete and the robust implementation of a large- >>>>>> scale >>>>>> production system is very difficult. >>>>>> >>>>>> Details of my request... >>>>>> >>>>>> Requirement A. Linux commandline utilities for BIOS read/write >>>>>> ===============================================================>>>>>> >>>>>> Description: >>>>>> >>>>>> We need a Linux commandline utility that: >>>>>> >>>>>> * Can read a BIOS image from a node''s EEPROM, including >>>>>> firmware >>>>>> revision and settings and then write this image to a file. >>>>>> * Can write a previously created BIOS image to a node''s >>>>>> EEPROM. >>>>>> * Will not overwrite node VPD. E.g., if I save a BIOS >>>>>> image from >>>>>> node1 to disk and then write that image to node2, node1 >>>>>> and >>>>>> node2 should retain unique VPD information such as serial >>>>>> numbers that are viewable from dmidecode, IMPI >>>>>> utilities, etc. >>>>>> * Works on current Harpertown-based X6250 blades >>>>>> * Works on future Gainestown-based X???? blades >>>>>> >>>>>> Note that while this utility would enable BIOS revision >>>>>> updates, this >>>>>> type of update is currently possible via a SMASH/TFTP process >>>>>> that can >>>>>> be easily automated. I.e., it''s making the BIOS settings >>>>>> changes that''s >>>>>> today''s key missing piece of functionality. 
>>>>>> >>>>>> Justification: >>>>>> >>>>>> For implementation-specific technical reasons and because of >>>>>> HPC-and-scale-related manageability requirements, we need this >>>>>> functionality to effectively implement and manage the Linux >>>>>> cluster I''m >>>>>> currently working on. Some details: >>>>>> >>>>>> * We have blades that will only have an Infiniband HCA to >>>>>> boot >>>>>> from - there will be no Ethernet or hard drives. These >>>>>> blades >>>>>> will be unusable unless certain BIOS settings are made >>>>>> that >>>>>> don''t ship as the default. E.g., PCI expansion ROM >>>>>> functions >>>>>> need to be enabled. Because of this requirement and the >>>>>> scale >>>>>> of our system, we require an automated solution. Manually >>>>>> making these BIOS settings changes at our scale is not >>>>>> feasible. >>>>>> * It would be nice to have the required BIOS settings >>>>>> shipped from >>>>>> the factory, but this would not be enough, because I >>>>>> have seen >>>>>> BIOS settings revert to a state useless for Boot-over-IB >>>>>> with >>>>>> certain combinations of power cycle and unseat/reseat >>>>>> with the >>>>>> X6250-based system I''m currently working on. In other >>>>>> words, we >>>>>> need these tools to implement a robust break/fix process. >>>>>> * We''re going to be running tightly-coupled parallel codes >>>>>> that >>>>>> will only go as fast as the slowest node. Because BIOS >>>>>> revisions and settings can have important effects on >>>>>> single node >>>>>> performance, and we are sure to identify beneficial BIOS >>>>>> settings changes in the future that aren''t known today, a >>>>>> facility to automate BIOS settings changes has to exist >>>>>> for us >>>>>> to get the maximum benefit from our machine. 
>>>>>> * We need a way to automatically validate a consistent >>>>>> BIOS state, >>>>>> in terms of revision and settings because: >>>>>> * A BIOS setting could be accidentally changed by >>>>>> a system >>>>>> administrator, and we want to find any issues >>>>>> caused by >>>>>> this kind of thing ASAP. >>>>>> * Replacement hardware could ship from the factory >>>>>> with an >>>>>> unknown state, and we must have a consistent >>>>>> process for >>>>>> taking it to a known state and validating that >>>>>> state. >>>>>> * The BIOS''s writable EEPROM could represent a >>>>>> security >>>>>> risk, unless automated tools can be written to >>>>>> validate >>>>>> EEPROM state. >>>>>> >>>>>> >>>>>> Preferred Implementation: >>>>>> >>>>>> We prefer an OpenSource solution running from Linux userspace, >>>>>> based on >>>>>> coreboot''s flashrom. See notes about tests performed with this >>>>>> tool at >>>>>> the end of this note and the website: http://www.coreboot.org/Flashrom >>>>>> >>>>>> Other Less Desirable Possibilities: >>>>>> >>>>>> * Sun-proprietary Linux tools that implement the required >>>>>> functionality could possibly work as well as the >>>>>> coreboot tools. >>>>>> * SP/SMASH-CLI-based tools that implement the required >>>>>> functionality would enable "Expect"-like scripting tools >>>>>> to be >>>>>> developed. >>>>>> * DOS-based utilities that that implement the required >>>>>> functionality would enable tools based on network- >>>>>> bootable DOS >>>>>> images to be developed. This would be lame, but it >>>>>> could be >>>>>> made to work. >>>>>> * "Expect"-like scripting tools could be developed that >>>>>> "screen-scrape" the BIOS setup screens over the serial >>>>>> console. >>>>>> This solution would be very brittle and least desirable >>>>>> of all >>>>>> the options. >>>>>> >>>>>> >>>>>> Requirement B. 
Improved changelogs for BIOS and other firmware >>>>>> ===============================================================>>>>>> >>>>>> Description: >>>>>> >>>>>> Firmware updates for Sun 6000 and 6048-based systems are >>>>>> currently >>>>>> available for download at: >>>>>> >>>>>> http://www.sun.com/servers/blades/downloads.jsp >>>>>> >>>>>> While updates for some hardware (e.g., Sun Blade X6220 Server >>>>>> Module >>>>>> Software 2.0) has changelog detail, other updates that I''m >>>>>> interested in >>>>>> (e.g., Sun Blade X6250 Server Module 1.3.3, 1.3.2, and 1.3.1) >>>>>> contain no >>>>>> change information whatsoever. I need changelog information for >>>>>> firmware that applies to X6250 machines today and the blades >>>>>> based on >>>>>> Intel Gainestown CPUs when they are released. >>>>>> >>>>>> Justification: >>>>>> >>>>>> Updates and settings changes made to BIOS, systems management, >>>>>> and other >>>>>> firmware can have a significant effect on a system, in terms of >>>>>> system >>>>>> performance, stability, and manageability. All of these >>>>>> factors are of >>>>>> critical importance on a large-scale HPC system, because >>>>>> seemingly small >>>>>> firmware improvements/tweaks can have major impacts at scale. It >>>>>> follows that HPC system managers are often eager to apply new >>>>>> firmware >>>>>> updates on their HPC systems. However, the application of >>>>>> firmware >>>>>> updates can be costly on a large system. E.g, disrupting >>>>>> production >>>>>> jobs on thousands of nodes to do a firmware update has a >>>>>> serious impact >>>>>> in terms of lost throughput. Therefore, an HPC system manager >>>>>> must >>>>>> balance the cost of firmware updates with the possible >>>>>> improvements that >>>>>> the firmware updates could provide. This analysis is required, >>>>>> but not >>>>>> possible when proper changelogs describing new firmware >>>>>> releases do not >>>>>> exist. 
>>>>>>
>>>>>> Other Notes
>>>>>> ===============================================================
>>>>>>
>>>>>> 1) I've previously made similar requests on the Sun Hardware blade
>>>>>>    forum:
>>>>>>
>>>>>>    * http://forums.sun.com/thread.jspa?threadID=5316894&tstart=0
>>>>>>    * http://forums.sun.com/thread.jspa?threadID=5316461&tstart=0
>>>>>>
>>>>>> 2) Details on required BIOS setting changes (among other things):
>>>>>>    http://forums.sun.com/thread.jspa?threadID=5316472&tstart=0
>>>>>>
>>>>>> 3) I tested the latest flashrom from SVN on a Harpertown-based
>>>>>>    X6250 (doesn't work) and an AMD-based X6220 machine (seems to
>>>>>>    work):
>>>>>>
>>>>>>    # svn info
>>>>>>    Path: .
>>>>>>    URL: svn://coreboot.org/repos/trunk/util/flashrom
>>>>>>    Repository Root: svn://coreboot.org/repos
>>>>>>    Repository UUID: 2b7e53f0-3cfb-0310-b3e9-8179ed1497e1
>>>>>>    Revision: 3535
>>>>>>    Node Kind: directory
>>>>>>    Schedule: normal
>>>>>>    Last Changed Author: hailfinger
>>>>>>    Last Changed Rev: 3532
>>>>>>    Last Changed Date: 2008-08-20 14:31:41 -0600 (Wed, 20 Aug 2008)
>>>>>>
>>>>>>    harpertown-box# ./flashrom
>>>>>>    Calibrating delay loop... OK.
>>>>>>    No coreboot table found.
>>>>>>    WARNING: No chipset found. Flash detection will most likely fail.
>>>>>>    No EEPROM/flash device found.
>>>>>>    If you know which flash chip you have, and if this version of flashrom
>>>>>>    supports a similar flash chip, you can try to force read your chip. Run:
>>>>>>    flashrom -f -r -c similar_supported_flash_chip filename
>>>>>>
>>>>>>    Note: flashrom can never write when the flash chip isn't found
>>>>>>    automatically.
>>>>>>
>>>>>>    amd-box# ./flashrom
>>>>>>    Calibrating delay loop... OK.
>>>>>>    No coreboot table found.
>>>>>>    Found chipset "NVIDIA CK804", enabling flash write... OK.
>>>>>>    Found chip "ST M50FLW080A" (1024 KB) at physical address 0xfff00000.
>>>>>>    No operations were specified.
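
The two flashrom probe results quoted above suggest a simple way to survey which node types the tool can handle before relying on it cluster-wide. What follows is only a rough sketch, not part of the original request: it assumes the probe output has already been captured as text from each node, and it matches only the message wording shown in the r3535 output above (other flashrom versions may word these lines differently). The names ProbeResult and parse_probe are hypothetical helpers invented for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns copied from the wording of the probe output quoted above.
# Assumption: other flashrom revisions may phrase these lines differently.
CHIPSET_RE = re.compile(r'Found chipset "([^"]+)"')
CHIP_RE = re.compile(r'Found chip "([^"]+)" \((\d+) KB\)')


@dataclass
class ProbeResult:
    """Hypothetical container for what a flashrom probe reported."""
    chipset: Optional[str]
    chip: Optional[str]
    size_kb: Optional[int]

    @property
    def usable(self) -> bool:
        # Both the chipset and the flash chip must have been detected.
        return self.chipset is not None and self.chip is not None


def parse_probe(output: str) -> ProbeResult:
    """Extract chipset and flash chip details from captured probe text."""
    # Collapse whitespace so that mail-wrapped lines still match.
    text = " ".join(output.split())
    chipset = CHIPSET_RE.search(text)
    chip = CHIP_RE.search(text)
    return ProbeResult(
        chipset=chipset.group(1) if chipset else None,
        chip=chip.group(1) if chip else None,
        size_kb=int(chip.group(2)) if chip else None,
    )


HARPERTOWN_OUTPUT = """\
Calibrating delay loop... OK.
No coreboot table found.
WARNING: No chipset found. Flash detection will most likely fail.
No EEPROM/flash device found.
"""

AMD_OUTPUT = """\
Calibrating delay loop... OK.
No coreboot table found.
Found chipset "NVIDIA CK804", enabling flash write... OK.
Found chip "ST M50FLW080A" (1024 KB) at physical address
0xfff00000.
No operations were specified.
"""

if __name__ == "__main__":
    for name, out in (("harpertown-box", HARPERTOWN_OUTPUT),
                      ("amd-box", AMD_OUTPUT)):
        result = parse_probe(out)
        print(name, "usable" if result.usable else "not usable", result)
```

Collecting the probe text from every node (for example through whatever parallel shell the site already uses) and feeding it through a parser like this would give a quick list of hardware on which a flashrom-based approach is not yet viable.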
>>>>>>
>>>>>> _______________________________________________
>>>>>> Linux_hpc_swstack mailing list
>>>>>> Linux_hpc_swstack at lists.lustre.org
>>>>>> http://lists.lustre.org/mailman/listinfo/linux_hpc_swstack