Martin Bene
2004-Jan-05 09:18 UTC
[Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
Hi Richard,

> Load balancers have some added value, but those that have had to deal
> with a problem where a single system within the cluster is up but not
> processing data would probably argue their actual value.

I've done quite a lot of work with clustered/HA Linux configurations. I usually try to keep additional boxes/hardware to an absolute minimum; otherwise the newly introduced points of (hardware) failure tend to make the whole exercise pointless. A solution I found to work quite well: a software load balancer (using LVS) run as an HA service (ldirectord) on two of the servers. This allows the use of quite specific probes for the real servers being balanced, so a server that's not correctly processing requests can be removed from the active list quite reliably. Since the director script is Perl, adding probes for protocols not supported in the default install is fairly straightforward.

> If any proposed design actually involved a different MAC address,
> obviously all local sip phones would die since the arp cache timeout
> within the phones would preclude a failover. (Not cool.)

ARP cache timeouts usually don't come into this: when moving a cluster IP address to a different NIC (probably on a different machine) you can broadcast gratuitous ARP packets on the affected Ethernet segment; this updates the ARP caches of all connected devices and allows failovers far faster than the ARP cache timeout. Notable exception: some firewalls can be quite paranoid with regard to ARP updates and will NOT accept gratuitous ARP packets. I've run into this with a cluster installation at one of my customers.

> Technology now supports 100 meg layer-2 pipes throughout a city at a
> reasonable cost. If a cluster were split across multiple
> buildings within a city, it certainly would be of interest to those
> that are responsible for business continuity planning. Are there
> limitations?
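The real ldirectord checks would be Perl, but as a language-neutral sketch of what such a protocol-level probe does, here's roughly what a SIP health check could look like in Python. The addresses, ports, and header values are illustrative assumptions, not anything from a real install:

```python
import socket
import uuid

def build_options_request(server_ip, server_port=5060, local_ip="192.0.2.10"):
    """Build a minimal SIP OPTIONS request (all values illustrative)."""
    call_id = uuid.uuid4().hex
    return (
        f"OPTIONS sip:{server_ip}:{server_port} SIP/2.0\r\n"
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch=z9hG4bK{call_id[:8]}\r\n"
        f"From: <sip:probe@{local_ip}>;tag={call_id[:6]}\r\n"
        f"To: <sip:{server_ip}>\r\n"
        f"Call-ID: {call_id}@{local_ip}\r\n"
        "CSeq: 1 OPTIONS\r\n"
        "Max-Forwards: 70\r\n"
        "Content-Length: 0\r\n\r\n"
    )

def parse_status(response: bytes):
    """Return the SIP status code from a response, or None if unparseable."""
    try:
        line = response.decode(errors="replace").splitlines()[0]
        proto, code, _ = line.split(" ", 2)
        return int(code) if proto == "SIP/2.0" else None
    except (IndexError, ValueError):
        return None

def probe(server_ip, server_port=5060, timeout=2.0):
    """Send an OPTIONS probe; report the server healthy only on 200 OK."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_options_request(server_ip, server_port).encode(),
                 (server_ip, server_port))
        try:
            data, _ = s.recvfrom(4096)
        except socket.timeout:
            return False
        return parse_status(data) == 200
```

The point of a check like this is exactly the "up but not processing" case from the quote: it exercises the server's SIP stack end to end instead of just pinging the box.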
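Tools like `arping -U` or the heartbeat package's `send_arp` broadcast these packets on failover; just to illustrate what's in one, a gratuitous ARP is an ARP reply whose sender and target IP are both the service address, sent to the broadcast MAC. A sketch of the frame layout (the MAC and IP values are made up):

```python
import struct

def gratuitous_arp_frame(mac: bytes, ip: bytes) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP reply.

    mac: 6-byte hardware address of the NIC now owning the service IP
    ip:  4-byte service IP address
    """
    broadcast = b"\xff" * 6
    # Ethernet header: dst = broadcast, src = new owner, EtherType 0x0806 (ARP)
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH",
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6, 4,    # hardware / protocol address lengths
        2,       # opcode: reply
    ) + mac + ip + broadcast + ip  # sender MAC/IP; target IP = sender IP

    return eth_header + arp
```

Every host on the segment that already has the IP cached updates its ARP entry to the new MAC immediately, which is why failover doesn't have to wait out the cache timeout.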
I'm wary of split-cluster configurations because the need for multiple, independent communication paths between cluster nodes often gets overlooked or ignored in these configurations, greatly increasing the risk of "split-brain" situations, i.e. several nodes in the cluster each thinking they're the only online server and trying to take over services. This easily/usually leads to a real mess (data corruption) that can be costly to clean up. When keeping your nodes in physical proximity it's much easier to have, say, two network links plus one serial link between cluster nodes, thus providing a very resilient fabric for inter-cluster communications.

> Someone mentioned the only data needed to be shared between clustered
> systems was phone Registration info (and then quickly jumped
> to engineering a solution for that). Is that the only data needed or
> might someone need a ton of other stuff? (Is cdr, iax, dialplans, agi,
> vm, and/or other dynamic data an issue that needs to be considered in
> a reasonable high-availability design?)

Depends on what you want/need to fail over in case your Asterisk box goes down. In stages, that'd be:

1. (cluster) IP address for SIP/H.323 etc. services
2. voice mail, recordings, activity logs
3. registrations for connected VoIP clients
4. active calls (VoIP + PSTN)

For the moment, item 4 definitely isn't feasible; even if we get some hardware to switch over E1/T1/PRI (whatever) interfaces, card or interface initialisation will kill active calls. Item 2 would be plain on-disk file data; for an active/standby cluster, replicating these should be pretty straightforward using either shared storage or an appropriate filesystem/block-device replication system. I've personally had good experience with drbd (block device replication over the network; it only supports 2 nodes in active/standby configuration but works quite well for that).
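As a toy illustration of why the independent paths matter (the link names and decision logic here are invented for the example, not from any real cluster manager): a node should only conclude its peer is dead, and take over services, when every heartbeat path has gone quiet at once.

```python
def peer_presumed_dead(link_status: dict) -> bool:
    """Decide whether the peer node should be treated as down.

    link_status maps each independent heartbeat path (e.g. two NICs
    plus a serial line) to True if heartbeats still arrive on it.
    Declaring the peer dead while ANY path still sees it risks
    split-brain: both nodes running the service at once.
    """
    return not any(link_status.values())
```

With a single link, one flaky cable or switch already triggers a takeover; with two NICs plus a serial line, all three independent paths have to fail simultaneously before a node acts, which is exactly what's hard to guarantee across buildings.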
Item 3 should also be feasible; this information is already persistent over Asterisk restarts and seems to be just a Berkeley DB file in a default install. The same method as for item 2 should work.

> I'd have to guess there are probably hundreds on this list that can
> engineer raid drives, ups's for ethernet closet switches, protected
> cat 5 cabling, and switch boxes that can move physical
> interfaces between servers. But, I'd also guess there are far fewer
> that can identify many of the sip, rtp, iax, nat, cdr, etc, etc,
> issues. What are some of those issues? (Maybe there aren't any?)

Since I'm still very much an Asterisk beginner I'll have to pass on this one. However, I'm definitely going to do some experiments with Asterisk on my test cluster systems, just to see what breaks when failing over Asterisk services. Also, things get MUCH more interesting when you start to move from plain active/standby to active/active configurations: here, for failover, you'll end up with the registration and file data from the failed server and need to integrate that into an already-running server, merging the separate sets of information - preferably without trashing the running server :-)

Bye, Martin
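For that active/active merge problem, one plausible rule - assuming each registration record carries its own registration timestamp (a made-up record layout, not Asterisk's actual astdb format) - is to keep the most recent entry per user, so the failed server's stale state can't clobber a client that has already re-registered with the survivor:

```python
def merge_registrations(running: dict, failed_over: dict) -> dict:
    """Merge SIP registrations from a failed peer into a running server.

    Both dicts map a user/AOR to (contact_uri, registered_at_unixtime).
    On conflict, keep the newer registration; never overwrite a fresher
    entry on the surviving node with the failed node's older one.
    """
    merged = dict(running)
    for user, (contact, ts) in failed_over.items():
        if user not in merged or ts > merged[user][1]:
            merged[user] = (contact, ts)
    return merged

# Hypothetical example: alice re-registered with the survivor after the
# crash, bob only ever existed on the failed node.
running = {"alice": ("sip:alice@10.0.0.21", 1073300000)}
failed  = {"alice": ("sip:alice@10.0.0.20", 1073290000),
           "bob":   ("sip:bob@10.0.0.30",   1073295000)}
merged = merge_registrations(running, failed)
# alice keeps her newer registration on the survivor; bob is imported
```

This is only the easy half of the problem, of course - merging voicemail directories and CDR files from two live servers is messier than a per-key newest-wins rule.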