Dan Langille
2024-Nov-23 02:50 UTC
[Nut-upsuser] Shutdown the servers first, keep the network running
I have an idea for my shutdown process at home. My goal: maximize the network run-time. At present, the UPS has a run-time of about 57 minutes.

This is my idea:

* shut down the servers after 15 minutes of downtime (for me, that's when battery.runtime hits 40)
* leave the network gear (switches, firewall, wifi) running so I can continue with Internet access

Optionally:

* when we get down to 10 minutes, let everything else shut down

The goal: I can keep working from my home office - there's a separate UPS up there.

Thinking about the plan:

* the firewall runs nut and monitors the UPS
* the servers can take action and shut themselves off - they run nut
* the firewall will be the only nut instance still running after the services go down

The Eaton 5PX UPS has some configurable items in it. I may be able to use them as well. Looking at my notes[1] from way-back-when, I was unable to get FINALDELAY to be observed (why is not clear).

I'd be happy to hear suggestions and ideas based on your experience, please.

I'm running FreeBSD 14.1 and nut 2.8.2.

[1] - https://dan.langille.org/2020/09/13/nut-testing-shutdown-and-startup/ - I need to redo that timing - none of these servers are still here, and the new ones are not catered for. I hope to be replacing the batteries soon - plenty of opportunity to do that work then.

--
Dan Langille
dan at langille.org
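The staged plan in this message can be sketched as a small decision function. This is only an illustration of the thresholds described, not something NUT provides: the names `Action` and `decide` are hypothetical, and it assumes the "40" and "10" figures are minutes, converted to seconds because NUT reports battery.runtime in seconds.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    SHUTDOWN_SERVERS = "shutdown-servers"
    SHUTDOWN_ALL = "shutdown-all"


# Thresholds taken from the plan in the message (assumed to be minutes).
SERVER_OUTAGE_LIMIT_S = 15 * 60    # servers go down after 15 minutes on battery
SERVER_RUNTIME_FLOOR_S = 40 * 60   # ...or once estimated runtime falls to 40 minutes
FINAL_RUNTIME_FLOOR_S = 10 * 60    # everything else goes down at 10 minutes left


def decide(on_battery: bool, seconds_on_battery: float, runtime_s: float) -> Action:
    """Return the most severe action the current readings call for."""
    if not on_battery:
        return Action.NONE
    if runtime_s <= FINAL_RUNTIME_FLOOR_S:
        return Action.SHUTDOWN_ALL
    if (seconds_on_battery >= SERVER_OUTAGE_LIMIT_S
            or runtime_s <= SERVER_RUNTIME_FLOOR_S):
        return Action.SHUTDOWN_SERVERS
    return Action.NONE
```

In this sketch the servers would act on SHUTDOWN_SERVERS and the firewall only on SHUTDOWN_ALL, which matches the idea of the firewall being the last nut instance running.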
Kelly Byrd
2024-Nov-23 03:12 UTC
[Nut-upsuser] Shutdown the servers first, keep the network running
I do something sort of similar, but I decided on 5 minutes without mains AC as the limit. History from the last several years tells me that in my area, if the mains aren't back on in 5 minutes, it's going to be a while.

For stupid reasons, my NUT "source of truth" is a Raspberry Pi. I'll move it to my router when the router gets a new enough version of NUT. Then the NAS and other non-essential things monitor that and shut down 5 minutes after an outage starts.
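The "shut down 5 minutes after an outage starts" rule amounts to a timer that starts when the UPS goes on battery and is cancelled if mains returns first. upssched offers START-TIMER and CANCEL-TIMER for this; the Python below is only a sketch of the same logic, with a hypothetical class name and caller-supplied timestamps, not how any poster actually implemented it.

```python
from typing import Optional

GRACE_S = 5 * 60  # the 5-minute limit described above


class OutageTimer:
    """Track one outage and report when the grace period has expired."""

    def __init__(self, grace_s: float = GRACE_S) -> None:
        self.grace_s = grace_s
        self.outage_started: Optional[float] = None

    def on_battery(self, now: float) -> None:
        # Keep the original start time if we are already on battery.
        if self.outage_started is None:
            self.outage_started = now

    def on_line(self) -> None:
        # Mains is back: cancel the pending shutdown.
        self.outage_started = None

    def should_shut_down(self, now: float) -> bool:
        return (self.outage_started is not None
                and now - self.outage_started >= self.grace_s)
```

Note that this version resets on every return of mains, so a run of short on/off flaps never accumulates toward the limit; whether that is desirable depends on local outage patterns.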
Greg Troxel
2024-Nov-23 13:13 UTC
[Nut-upsuser] Shutdown the servers first, keep the network running
Dan Langille via Nut-upsuser <nut-upsuser at alioth-lists.debian.net> writes:

> I have an idea for my shutdown process at home. My goal: maximize the network run-time. At present, the UPS has a run-time of about 57 minutes.
>
> This is my idea:
>
> * shutdown the servers after 15 minutes of downtime (for me, that's when battery.runtime hits 40)
> * leave the network gear (switches, firewall, wifi) running so I can continue with Internet access
>
> Optionally:
> * when we get down to 10 minutes, let everything else shutdown
>
> The goal: I can keep working from my home office - there's a separate UPS up there.

I live in a town with a high ratio of trees that could fall on a line to electric meters, and thus we have a fairly large number of outages, even though our power company does a great job. I will second Kelly's point that once power has been out 5 minutes it is unlikely to be back soon. I have been keeping track, and basically:

- There are a lot of 3-5s outages. I believe these are faults that clear themselves (squirrel stops conducting :-( or branch falls the rest of the way) or have cleared by the time the recloser recloses. Sometimes it is 5s out, 2s on, 5s out, 2s on, 5s out, back on. Sometimes that and then just out. Often just out cleanly, and sometimes pretty messy.
- There was one outage recently of just about 3 minutes. I don't understand what happened, but it was apparently substation-wide. I am guessing some protection tripped and because they were there it could be brought up again fast. I have no memory of this ever happening before.
- There was a scheduled outage at 0300, when it was well below freezing, to remove branches from a 115 kV line, that lasted only 13m. A huge round of applause for the guy in the bucket truck who did not damage the transmission line! Also, I had a 19m outage, which I suspect was also planned (if not announced) as part of restoring others after damage.
- After that, I think the fastest was 28m, and 40-70m typical, for things that were "minor". There have been changes to distribution protection and these are rarer; I think reclosers are effective at ensuring that close-to-fault protection devices open, enabling the rest to stay on.
- Plus some longer ones (multiple hours to small numbers of days), resulting from more serious damage, from trees on wires to broken poles.

So aside from the single 3m outage, I would have told you that once power has been out for 30s, it's going to be 30m at least, maybe more. But given that, the 5m guidance sounds good.

The other thing is that I more or less believe that running the UPS all the way out is probably rougher on batteries than shutting down when it claims 10m. But I also believe that UPS service is really tough on batteries and they seem to be reliably in need of replacement at 4 years. And almost every battery I have pulled from a UPS (which I do when it becomes troubled) has been messed up, usually a shorted cell or a very weak cell. Whereas batteries proactively pulled after 5y from a FiOS ONT are often ok. So I am not at all sure that trying to be nice to the batteries is a good strategy.

So I would recommend:

- shut down servers after 5 minutes of outage
- shut down firewall and killpower when runtime <10m (or maybe 5m)
- have some way to start servers, such as a switch controllable by the firewall
- once you have a way to bring servers back hands off, consider server shutdown at 30s of outage, and restore after 15m of no outage
- go over all your non-server stuff and think about whether you can reduce usage
- log outages and also transfer-to-battery events; log remaining runtime vs time so you can see what the mapping is from reported runtime to actual runtime

Your outage patterns may be different, so I may be off about the precise timings. I suspect though, that there is a gulf between "protection device restores power in seconds" and "truck roll".
28m to drive to the fault, visually inspect, decide it's cleared, and replace the fuse is amazing; it only happens if the people are in the office next to the truck, and even then it needs more luck.
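The last recommendation in the list (log remaining runtime vs time to see how reported runtime maps to actual runtime) could be sketched roughly as below. It assumes the text comes from `upsc`, which prints one "name: value" pair per line; the function names and the CSV column choice are hypothetical, and "actual" runtime is only known for an outage where the battery really was run to the end or cut off.

```python
from typing import Dict, List, Tuple


def parse_upsc(text: str) -> Dict[str, str]:
    """Parse `upsc` output (one "name: value" pair per line) into a dict."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def log_row(timestamp_s: int, values: Dict[str, str]) -> str:
    """Format one CSV row: time, status, reported runtime (s), charge (%)."""
    return ",".join([
        str(timestamp_s),
        values.get("ups.status", ""),
        values.get("battery.runtime", ""),
        values.get("battery.charge", ""),
    ])


def reported_vs_actual(samples: List[Tuple[int, int]],
                       end_s: int) -> List[Tuple[int, int]]:
    """Pair each reported runtime with the time that actually remained.

    samples: (timestamp_s, reported_runtime_s) taken while on battery.
    end_s:   timestamp at which the UPS actually ran out or was cut off.
    """
    return [(reported, end_s - t) for t, reported in samples]
```

For example, feeding it the output of something like `upsc myups@localhost` (the UPS name is a placeholder) once a minute during an outage would produce a log from which the mapping can be read off afterwards.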
Jim Klimov
2024-Nov-23 23:13 UTC
[Nut-upsuser] Shutdown the servers first, keep the network running
To me, the trick would be not about shutdown, but about how to start those servers (which would fully power off to save battery time for the network gear) if wall power comes back early.

If automation for that is needed (e.g. you don't want to depend on manual labor to power the main servers back on), that would need scripting (upssched?) to either reboot the remaining NUT server and the UPS if power comes back while the main servers are off, or to make use of wake-on-LAN/IPMI/... if available. Or use ePDUs to power the servers and power-cycle each relevant socket when suitable.

Jim
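As one illustration of the wake-on-LAN option mentioned above, a script on the surviving NUT host could send a magic packet once power is back. This is a minimal sketch under assumptions: the servers have wake-on-LAN enabled in firmware, they sit on the same broadcast domain, and the function names, default broadcast address and port 9 are choices made for the example rather than anything from the thread.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

A script run by upssched when the UPS is back on line could call `wake()` for each server's MAC address, possibly after a delay so the battery has recovered some charge first.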