Greg Mason
2010-May-10 21:59 UTC
[Lustre-discuss] Lustre 1.6.6 to 1.8.3 upgrade + MDT hardware migration
I'm in the process of upgrading from Lustre 1.6.6 to Lustre 1.8.3, and there are quite a few goals I have when it's all said and done (in no particular order):

1. Upgrade to Lustre 1.8.3 (duh!)
2. Migrate MDT to better-suited hardware (from internal disk on an older system to a 2540 array hanging off a newer system). This is also moving from SLES 10 SP2 to RHEL 5.x
3. Upgrade X4500 OSSes from SLES 10 SP2 to RHEL 5.x
4. Upgrade the MGS to Lustre 1.8.3 (most likely leaving on SLES 10 SP2)

We will be upgrading the clients in a few months, when we update our compute node OS image. The clients are running Lustre 1.6.7 currently.

I've been going through the Lustre manual, and it looks like the proper order of events for a 1.6.x to 1.8.x upgrade is:

1. Shut down the filesystem
2. Upgrade the MGS to 1.8, and mount it
3. Upgrade the OSSes, and mount all OSTs
4. Upgrade the MDS and mount the MDT
5. Mount the filesystem on the clients

I've also looked at operating mixed servers, and it looks like I can migrate and upgrade the MDT to 1.8 and upgrade the MGS to 1.8, while leaving the OSTs at 1.6 and everything will work fine. The idea here would be to eventually upgrade the OSTs at a later time, probably one at a time. The only problem is that this would mean I'm upgrading the OSTs after the MDT, which looks to be a no-no.

Some other misc. issues:

1. With the migration of the MDT to new hardware, do I need to do a writeconf if hostnames/IPs are staying the same? The new MDS hardware will be assuming the name and IPs of the old MDS hardware.
2. I've been asked if I can leave the filesystem mounted on all the clients during the whole upgrade. I could see this working for the OSSes, but not the MDT migration and MDT/MGS upgrade. Am I correct in this assumption?

Does anybody have any advice on how to best accomplish the goal of Lustre 1.8.3 servers with Lustre 1.6.7 clients?

Thanks,
-Greg

--
Greg Mason
System Administrator
Michigan State University
High Performance Computing Center
web: www.hpcc.msu.edu
email: gmason at msu.edu
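The ordering concern in this post (OSTs ending up upgraded after the MDT) can be checked mechanically. The following is only a minimal sketch: `MANUAL_ORDER` encodes the sequence as Greg summarizes it from the manual, and the function name and component labels are made up for illustration, not part of any Lustre tool.

```python
# Order of upgrade-and-mount steps as summarized in the post above.
MANUAL_ORDER = ["MGS", "OSS", "MDS", "clients"]


def order_violations(plan):
    """Return (should_be_first, was_first) pairs where the plan breaks the order.

    `plan` is a list of component names in the order they would be
    upgraded and mounted. Names not in MANUAL_ORDER are ignored.
    """
    rank = {name: i for i, name in enumerate(MANUAL_ORDER)}
    known = [name for name in plan if name in rank]
    violations = []
    for i, first in enumerate(known):
        for second in known[i + 1:]:
            if rank[first] > rank[second]:
                # `second` should have been handled before `first`.
                violations.append((second, first))
    return violations


if __name__ == "__main__":
    # The manual's order as quoted in the post.
    print(order_violations(["MGS", "OSS", "MDS", "clients"]))
    # The mixed-server idea: MDS upgraded before the OSSes.
    print(order_violations(["MGS", "MDS", "OSS", "clients"]))
```

The second call flags the OSS/MDS pair, which is the "no-no" the post describes.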
Andreas Dilger
2010-May-11 07:56 UTC
[Lustre-discuss] Lustre 1.6.6 to 1.8.3 upgrade + MDT hardware migration
On 2010-05-10, at 15:59, Greg Mason wrote:
> I'm in the process of upgrading from Lustre 1.6.6 to Lustre 1.8.3, and there's quite a few goals I have when it's all said and done (in no particular order):
>
> 1. Upgrade to Lustre 1.8.3 (duh!)
> 2. Migrate MDT to better-suited hardware (from internal disk on an older system to a 2540 array hanging off a newer system). This is also moving from SLES 10 SP2 to RHEL 5.x
> 3. Upgrade X4500 OSSes from SLES 10 SP2 to RHEL 5.x
> 4. Upgrade the MGS to lustre 1.8.3 (most likely leaving on SLES 10 SP2)

Any significant motivation for sticking with SLES10 instead of RHEL 5 for the MGS?

> We will be upgrading the clients in a few months, when we update our compute node OS image. The clients are running Lustre 1.6.7 currently.
>
> I've been going through the Lustre manual, and it looks like the proper order of events for a 1.6.x to 1.8.x upgrade is:
>
> 1. Shut down the filesystem
> 2. Upgrade the MGS to 1.8, and mount it
> 3. Upgrade the OSSes, and mount all OSTs
> 4. Upgrade the MDS and mount the MDT
> 5. Mount the filesystem on the clients
>
> I've also looked at operating mixed servers, and it looks like I can migrate and upgrade the MDT to 1.8 and upgrade the MGS to 1.8, while leaving the OSTs at 1.6 and everything will work fine. The idea here would be to eventually upgrade the OSTs at a later time, probably 1 at a time. The only problem is that this would mean I'm upgrading the OSTs after the MDT, which looks to be a no-no.

What is the motivation for not upgrading the OSTs at the same time? I don't think it will significantly increase the outage window, and you will be running code that has been tested together a lot more.

> Some other misc. issues:
> 1. With the migration of the MDT to new hardware, do I need to do a writeconf if hostnames/IPs are staying the same? The new MDS hardware will be assuming the name and IPs of the old MDS hardware.

If you are just doing a block-device copy of the MDT filesystem from the old disks to the new disks, then the hardware replacement itself should not require a writeconf. However, it may be that the 1.6->1.8 upgrade needs a writeconf (I admit I don't know the answer to this offhand).

> 2. I've been asked if I can leave the filesystem mounted on all the clients during the whole upgrade. I could see this working for the OSSes, but not the MDT migration and MDT/MGS upgrade. Am I correct in this assumption?

You can leave it mounted on all the clients during the upgrade. They will hang if they access the filesystem, waiting for the servers to return. If the MDS and MGS are not changing IP addresses then no problems should be anticipated.

> Does anybody have any advice on how to best accomplish the goal of Lustre 1.8.3 servers with Lustre 1.6.7 clients?

I know a few sites are currently going through the same process on the servers, and I expect they have to run with clients at 1.6 for at least a short time before they upgrade to 1.8 due to complex environments that don't allow upgrading everything at the same time. Hopefully they will chime in here.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
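If a writeconf does turn out to be needed, it is run per target with `tunefs.lustre --writeconf <device>`. The sketch below only builds the command lines and does not execute anything. The device paths are hypothetical, and the MDT-first ordering with all targets unmounted is an assumption that should be checked against the manual for the version in use.

```python
def writeconf_commands(mdt_device, ost_devices):
    """Build (but do not run) tunefs.lustre --writeconf command lines.

    Assumption: the MDT is handled first, then each OST, and every
    target is unmounted beforehand. Verify this against the manual.
    """
    targets = [mdt_device] + list(ost_devices)
    return [["tunefs.lustre", "--writeconf", dev] for dev in targets]


if __name__ == "__main__":
    # Hypothetical device paths, for illustration only.
    for cmd in writeconf_commands("/dev/mdt_new", ["/dev/ost0", "/dev/ost1"]):
        print(" ".join(cmd))
```

Returning argument lists rather than running the commands keeps the sketch safe to try on any machine; each list could later be passed to `subprocess.run` on the relevant server.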
Danny Sternkopf
2010-May-11 08:38 UTC
[Lustre-discuss] Lustre 1.6.6 to 1.8.3 upgrade + MDT hardware migration
Hi,

one further question regarding the last point. How about upgrading the Lustre clients to 1.8.2 first while the Lustre servers still run 1.6.7.2?

We currently have such a configuration, and we sometimes see strange things when failover and recovery happen. During a manual failover on the OSS side, almost all the clients got disconnected and were not able to connect again. We could only solve this by re-mounting Lustre on the client or rebooting the client, which was a disaster due to the loss of all running programs accessing Lustre at the time.

Any comments on this? Does it basically work to have 1.8 clients and 1.6 servers?

Best regards,
Danny
Greg Mason
2010-May-11 13:57 UTC
[Lustre-discuss] Lustre 1.6.6 to 1.8.3 upgrade + MDT hardware migration
> Any significant motivation for sticking with SLES10 instead of RHEL 5 for the MGS?

Our MGS sits in a VMware stack, and I imagine it would be easier to simply install the 1.8.3 RPMs and mount up the MGS rather than make the switch to RHEL. Would it be recommended to stick with RHEL across the board for all Lustre servers?

> What is the motivation for not upgrading the OSTs at the same time? I don't think it will significantly increase the outage window, and you will be running code that has been tested together a lot more.

The motivation was reducing the downtime, and potential for problems. It sounds like the best way to prevent complications is to upgrade everything all at once...

> If you are just doing a block-device copy of the MDT filesystem from the old disks to the new disks, then the hardware replacement itself should not require a writeconf. However, it may be that the 1.6->1.8 upgrade needs a writeconf (I admit I don't know the answer to this offhand).

I'm doing file-level (tar and getfattr) backup and restore, as I'm taking advantage of the larger MDT filesystem to create more inodes in the process.

> You can leave it mounted on all the clients during the upgrade. They will hang if they access the filesystem, waiting for the servers to return. If the MDS and MGS are not changing IP addresses then no problems should be anticipated.

This is good to hear, and will greatly simplify things. Thanks for the info Andreas.

--
Greg Mason
System Administrator
Michigan State University
High Performance Computing Center
web: www.hpcc.msu.edu
email: gmason at msu.edu
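The file-level (tar and getfattr) approach mentioned above can be sketched as a pair of command builders. This is an illustration under stated assumptions, not the procedure actually used in this thread: it assumes the MDT is mounted as ldiskfs and that the commands run from the root of that mount. The archive path and `ea.bak` file name are placeholders, and any further restore steps the manual lists for a given version are left out.

```python
import shlex


def mdt_backup_commands(archive, ea_file="ea.bak"):
    """Shell commands to run from the root of the old MDT (mounted as ldiskfs)."""
    ea, tgz = shlex.quote(ea_file), shlex.quote(archive)
    return [
        # Dump all extended attributes recursively, hex-encoded,
        # without following symlinks.
        f"getfattr -R -d -m '.*' -e hex -P . > {ea}",
        # Archive the file tree (including the EA dump), keeping sparse files sparse.
        f"tar czf {tgz} --sparse .",
    ]


def mdt_restore_commands(archive, ea_file="ea.bak"):
    """Shell commands to run from the root of the new, freshly formatted MDT."""
    ea, tgz = shlex.quote(ea_file), shlex.quote(archive)
    return [
        # Unpack the tree, preserving permissions.
        f"tar xzpf {tgz}",
        # Re-apply the extended attributes saved during backup.
        f"setfattr --restore={ea}",
    ]


if __name__ == "__main__":
    # Hypothetical archive location, outside the tree being archived.
    for line in mdt_backup_commands("/backup/mdt.tgz"):
        print(line)
    for line in mdt_restore_commands("/backup/mdt.tgz"):
        print(line)
```

The extended attributes are dumped separately because they carry the Lustre-specific metadata that must be re-applied with `setfattr` after the files are unpacked.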
Andreas Dilger
2010-May-11 21:47 UTC
[Lustre-discuss] Lustre 1.6.6 to 1.8.3 upgrade + MDT hardware migration
On 2010-05-11, at 02:38, Danny Sternkopf wrote:
> one further question regarding the last point. How about upgrading the Lustre clients to 1.8.2 first while the Lustre servers still run 1.6.7.2?

Any reason not to go to 1.8.3, if you are just starting your update process?

> We currently have such a configuration and we sometimes see strange things when failover and recovery is going to happen. We saw during a manual failover on the OSS side that almost all the clients got disconnected and were not able to connect again. We could only solve this issue with re-mounting Lustre on the client or rebooting the client which was a disaster due to the loss of all running programs accessing Lustre at this time.

There have been a number of recovery improvements in 1.8, but I couldn't say offhand whether they are more on the client side or on the server, or both.

> Any comments to this? Does it basically work to have 1.8 Clients and 1.6 Servers?

We do test 1.8.latest with 1.6.latest whenever we make a release.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.