I have written before about the load-balanced FusionPBX cluster I offer to the public. Now I will write about the high availability one. This article tries to answer the questions you may have. I hope that after reading it you will have a clear picture of what a high availability FusionPBX cluster is and what it is not.
First, I need to clarify that a high availability cluster is not a load-balanced one. The two approaches are not mutually exclusive (you can combine them), but their pros and cons differ, and so does the way they work. I will write a comparison between them later; for now, just remember that they are not the same.
A Cluster Overview
A high availability cluster deployment needs at least 5 servers. The following picture shows a basic deployment.
The elements are:
- two PBX servers, which are responsible for handling the SIP and RTP flows. These servers have FusionPBX, FreeSWITCH, Memcached, the supporting Lua scripts and other software installed. One of these servers plays the active role; the other is passive.
- two database servers, which store all the information that FreeSWITCH and FusionPBX use.
- one arbitrator server, whose only role is to avoid the split-brain issue when the communication between the two database nodes breaks.
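To illustrate why the arbitrator matters, here is a minimal sketch of a majority (quorum) vote in Python. It is not the mechanism any particular database product uses; the node names `db1`, `db2` and `arbitrator` and both function names are hypothetical, and the only assumption is that a side of a network split may accept writes when it can see a strict majority of the voters.

```python
def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """A side of a split may accept writes only with a strict majority."""
    return reachable_voters > total_voters // 2


def partition_can_write(partition: set, voters: set) -> bool:
    # Count only the voters that ended up on this side of the split.
    return has_quorum(len(partition & voters), len(voters))


# Hypothetical node names: two database nodes plus the arbitrator.
voters = {"db1", "db2", "arbitrator"}

# The link between db1 and db2 breaks; the arbitrator still reaches db1.
print(partition_can_write({"db1", "arbitrator"}, voters))  # True
print(partition_can_write({"db2"}, voters))                # False

# Without an arbitrator, neither side has a majority after the same split.
two_voters = {"db1", "db2"}
print(partition_can_write({"db1"}, two_voters))  # False
print(partition_can_write({"db2"}, two_voters))  # False
```

With three voters, exactly one side of a split can keep writing, so the two database nodes never diverge. With only two voters, a broken link leaves no side with a majority.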
Pros, Cons and Side Effects of a High Availability FusionPBX Cluster
Any cluster owner needs to remember that a cluster is far from being a stand-alone server. There are internal differences: the information flows are different and, as a consequence, the way it operates is not the same. First, let's list the pros:
- Fault tolerance: when a node goes down, the other node takes the load.
- Continuity: when a node goes down, the end user will notice no more than a few seconds of dead air.
- 100% compatible: all phones work with this approach since they do not see the cluster. From the end user's point of view, there is only one server.
Now the cons:
- The passive server may be considered a waste of resources. It sits idle until the active one fails.
- All the servers must be in the same data centre. Because the floating IP technique relies on ARP, the PBX servers must sit on the same layer 2 network segment (broadcast domain), so in practice they must share the same network. You may work around this if you have an MPLS network between two data centres, but this is very expensive and most data centres do not provide MPLS or BGP solutions. Therefore, if the data centre goes down, you may go down as well. The cheapest fix for this is to combine the high availability and load balancing approaches.
Some side effects are neither good nor bad, but they are different from a stand-alone server:
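As a quick sanity check for the same-network requirement, the following Python sketch tests whether a floating IP falls inside the subnet configured on each PBX node. The helper name `shares_subnet` and the addresses (taken from the documentation ranges) are made up for illustration. Sharing a subnet is a necessary hint rather than proof that the nodes sit on the same layer 2 segment.

```python
import ipaddress


def shares_subnet(floating_ip: str, node_interfaces: list) -> bool:
    """Return True if the floating IP belongs to every node's subnet.

    node_interfaces holds addresses in CIDR form, e.g. "192.0.2.11/24".
    """
    vip = ipaddress.ip_address(floating_ip)
    return all(
        vip in ipaddress.ip_interface(iface).network
        for iface in node_interfaces
    )


# Both nodes are in 192.0.2.0/24, the same subnet as the floating IP.
print(shares_subnet("192.0.2.10", ["192.0.2.11/24", "192.0.2.12/24"]))     # True

# The second node lives in another network (for example, another data centre).
print(shares_subnet("192.0.2.10", ["192.0.2.11/24", "198.51.100.12/24"]))  # False
```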
- File synchronization: as with Memcached, you do not know which node a call is hitting. Although the PBX cluster has a synchronization mechanism (regardless of which one you selected), the important thing is to set a synchronization policy. Synchronizing every five seconds is very CPU expensive; you will waste valuable CPU resources and that will impact the quality of your service. You can opt for a five-minute synchronization policy, or a midnight policy. Whatever you choose, do not forget about it: a very common complaint is an IVR not playing the proper recording, and the cause is that the file has not been synchronized yet.
- Rebooting the database servers: whatever the reason for rebooting a database node, never reboot them all at the same time. Reboot one, wait until it recovers, then reboot the next, and so on.
- ARP caching: your router must have any kind of ARP caching parameter turned off. Some routers cache to speed things up, but in this case that can do real harm, as the floating IP technique relies on updating the ARP tables.
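The "reboot one, wait until it recovers, then reboot the next" rule for the database nodes can be sketched as a small Python loop. Everything here is hypothetical: `reboot` and `is_healthy` stand in for whatever commands or checks your environment provides, and the timeout values are arbitrary defaults.

```python
import time
from typing import Callable, Iterable, List


def rolling_reboot(
    nodes: Iterable[str],
    reboot: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> List[str]:
    """Reboot nodes one at a time, waiting for each to recover first."""
    done: List[str] = []
    for node in nodes:
        reboot(node)
        deadline = time.monotonic() + timeout_s
        # Do not touch the next node until this one is healthy again.
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{node} did not recover; rebooted so far: {done}"
                )
            time.sleep(poll_s)
        done.append(node)
    return done


# Usage with stand-in callbacks (a real deployment would call its own tools).
log = []
order = rolling_reboot(
    ["db1", "db2"],
    reboot=lambda n: log.append(f"reboot {n}"),
    is_healthy=lambda n: True,
    poll_s=0,
)
print(order)  # ['db1', 'db2']
```

If a node fails to come back within the timeout, the loop stops instead of rebooting the remaining nodes, so at least one database node stays up.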
A cluster's information flow is a little different from that of a stand-alone FusionPBX server. In a stand-alone deployment, almost all flows are local; the only external flows are the ones related to the endpoints or to bridging a call to the PSTN. A cluster has some extra flows that cross between the servers.
The endpoints make no difference in this case because, from their point of view, they are just connecting to a single IP. Each database node replicates its information to the other node.
In the specific case of a high availability cluster, information flows are almost the same as in a stand-alone deployment. The big difference is the database, which is external. You may wonder why not leave the database on the PBX servers. Besides the performance issues that would appear later, when the active node goes down, its database goes down with it. Depending on the kind of crash, the database might not recover easily.
Fault Tolerance in a High Availability FusionPBX Cluster
When a failover event happens, the endpoints keep pointing to the same IP. The floating IP technique used in the high availability PBX cluster detects when the active node stops answering, and at that point the passive node takes over the floating IP. Once the move happens, the new active server notifies its neighbours that it now holds the floating IP and runs a FreeSWITCH recovery process. The end user will only notice a few seconds of dead air, after which conversations continue normally.
The following image shows the fault tolerance event.
When the failed server recovers, it takes on a new role as the passive node. The roles will switch again when the current active node fails.
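The failover sequence described above can be sketched as a small state machine in Python. This is only an illustration of the logic, not the actual tooling used in the cluster: the class name, the four callbacks and the threshold of three missed checks are all assumptions, and a real deployment would plug in its own health probe, IP takeover, neighbour notification and FreeSWITCH recovery commands.

```python
from typing import Callable


class PassiveNode:
    """Watches the active node and takes over after repeated missed checks."""

    def __init__(
        self,
        probe_active: Callable[[], bool],      # hypothetical health probe
        take_floating_ip: Callable[[], None],  # hypothetical IP takeover
        announce_ip: Callable[[], None],       # hypothetical neighbour notice
        recover_calls: Callable[[], None],     # hypothetical FreeSWITCH recovery
        max_missed: int = 3,                   # assumed threshold
    ) -> None:
        self.probe_active = probe_active
        self.take_floating_ip = take_floating_ip
        self.announce_ip = announce_ip
        self.recover_calls = recover_calls
        self.max_missed = max_missed
        self.missed = 0
        self.role = "passive"

    def tick(self) -> str:
        """Run one health check and return the role after it."""
        if self.role == "active":
            return self.role
        if self.probe_active():
            self.missed = 0  # the active node answered; reset the counter
            return self.role
        self.missed += 1
        if self.missed >= self.max_missed:
            # Same order as in the text: take the IP, notify, then recover.
            self.take_floating_ip()
            self.announce_ip()
            self.recover_calls()
            self.role = "active"
        return self.role


# Usage: the active node answers once, then goes silent.
answers = iter([True, False, False, False])
steps = []
node = PassiveNode(
    probe_active=lambda: next(answers),
    take_floating_ip=lambda: steps.append("take floating IP"),
    announce_ip=lambda: steps.append("notify neighbours"),
    recover_calls=lambda: steps.append("recover calls"),
)
print([node.tick() for _ in range(4)])  # ['passive', 'passive', 'passive', 'active']
print(steps)  # ['take floating IP', 'notify neighbours', 'recover calls']
```

Requiring several consecutive missed checks before taking over is a design choice that avoids a failover on a single lost probe. The price is a few extra seconds of dead air.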
I hope this gives you a very clear picture of the way a high availability FusionPBX cluster works.
How do I get a Cluster like this?
Easy, just send me a message on any of my social media accounts. I am sure we will reach an agreement and you will enjoy a brand new PBX cluster.