One of the things I consider myself very good at is doing a lot with little, technologically speaking. When I was asked to deploy a new VoIP cluster, I saw the perfect opportunity to create an architecture that avoids the waste of resources of the high-availability scheme, yet also avoids the problem of recovering every endpoint that the load-balancing scheme has. So I designed an architecture that doesn't leave 50% of the resources doing nothing, while the recovery of the service is neater.

This is what I call a hybrid high-availability and load-balancing approach.

A Cluster Overview

This deployment requires at least 6 servers. The following diagram shows a basic deployment.

The identified elements are:

  • three PBX servers that are responsible for handling the SIP and RTP flows. These servers run FusionPBX/CoolPBX, FreeSWITCH, Memcached, some supporting Lua scripts and other supporting software. One of these servers acts as a passive server (waiting for one of the others to fail); and
  • three database servers whose mission is to hold all the information as common storage. These servers can replicate synchronously or semi-(a)synchronously; the choice is a trade-off between speed and data availability.

Pros, Cons and Side Effects of a Hybrid HA+LB FusionPBX/CoolPBX Cluster

Anyone running a cluster needs to remember that a cluster is far from being a stand-alone server. There are internal differences; information flows are different and, consequently, it does not operate the same way. First, let's list the pros:

  • Only 33% of the PBX resources (one server out of three) are used as a backup, compared with 50% in a classic active/passive pair.
  • The recovery of the service is as fast as in the high-availability approach.
  • No need for extra DNS software; a simple DNS with SRV and A records will do the job. If you still want to use a DNS balancer, you can, but this is optional.
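
To illustrate the SRV-only option, here is a minimal sketch of how an endpoint might order the PBX targets it receives from the DNS. The host names, priorities and weights are hypothetical placeholders; the ordering rule (lowest priority value first, weight only matters among equal priorities) follows the usual SRV convention.

```python
from collections import namedtuple

# Hypothetical SRV answer for _sip._udp.example.com; the names and values
# are placeholders, not taken from a real deployment.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")

RECORDS = [
    SrvRecord(20, 0, 5060, "pbx2.example.com"),
    SrvRecord(10, 0, 5060, "pbx1.example.com"),
]


def failover_order(records):
    """Return (target, port) pairs in the order an endpoint should try them.

    Lower priority values are tried first. Within one priority this sketch
    simply puts heavier weights first; a real resolver picks randomly in
    proportion to the weight.
    """
    ranked = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [(r.target, r.port) for r in ranked]


if __name__ == "__main__":
    for target, port in failover_order(RECORDS):
        print(f"{target}:{port}")
```

An endpoint that honours this order registers on the first target and only moves to the next one when the first stops answering.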

Now the cons:

  • All servers must be in the same DC.
  • The DC's router must be able to update its ARP table on request (this is a hard requirement).

There are also some side effects:

  • The cache must be flushed on each server individually if you need a change to take effect right away. A workaround could be a cron job that flushes all cached information periodically. The Memcached engine is highly recommended, not just because it has an expiration mechanism, but because it minimizes the I/O.
  • System file synchronization must be reliable.
  • Rebooting the database servers: whatever the reason for rebooting a database node, never reboot them all simultaneously. Reboot one, wait for it to recover, reboot the next and so on.
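
The one-at-a-time reboot rule can be sketched as a small loop. This is only an illustration: `reboot` and `is_healthy` are hypothetical callables you would supply yourself (for example an SSH command and a database health query), and the polling interval and timeout are assumed values.

```python
import time


def rolling_reboot(nodes, reboot, is_healthy,
                   poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Reboot nodes one at a time, waiting for each to recover first.

    `reboot(node)` and `is_healthy(node)` are supplied by the caller.
    Raises TimeoutError and stops the rollout if a node does not come
    back, so the remaining nodes are never touched.
    """
    for node in nodes:
        reboot(node)
        waited = 0.0
        while not is_healthy(node):
            if waited >= timeout_seconds:
                raise TimeoutError(f"{node} did not recover; stopping the rollout")
            sleep(poll_seconds)
            waited += poll_seconds
    return list(nodes)
```

Stopping on the first node that fails to recover is deliberate: it keeps the other database nodes up and holding the data.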

Information Flow

A few points about how the information flows will help you make the most of all servers. HAProxy should be installed on all PBXes to split the database traffic and load. If you select the semi-(a)synchronous approach, you will benefit from this. In short, a semi-(a)synchronous database cluster will only wait for one node to report synchronization before continuing with the next tasks; the remaining nodes will catch up later.
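
The "wait for one node only" idea can be shown with a toy simulation. This is not how any particular database engine implements it; the node names and delays are made up, and `replicate` is a stand-in for shipping a transaction to a replica.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def replicate(node, delay):
    """Stand-in for sending a transaction to a replica; `delay` simulates lag."""
    time.sleep(delay)
    return node


def semi_sync_commit(replica_delays):
    """Return (first_acknowledging_node, number_of_nodes_still_catching_up)."""
    pool = ThreadPoolExecutor(max_workers=len(replica_delays))
    futures = [pool.submit(replicate, node, delay)
               for node, delay in replica_delays.items()]
    # Continue as soon as ONE replica has acknowledged the write.
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower replicas
    first = next(iter(done)).result()
    return first, len(pending)


if __name__ == "__main__":
    # Hypothetical replicas: db2 is fast, db3 lags behind.
    print(semi_sync_commit({"db2": 0.05, "db3": 0.5}))
```

The commit returns as soon as the fast replica answers, while the slow one finishes in the background.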

You can detect the passive PBX node to assign tasks such as CDR processing.
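
One possible way to detect the passive node, assuming that each active PBX holds a floating IP and the passive one holds none, is to compare the addresses bound locally with the list of floating IPs. The addresses below are documentation placeholders, and how you collect the local addresses is up to you.

```python
def is_passive(local_addresses, floating_ips):
    """Return True when this node holds none of the cluster's floating IPs.

    Assumption: active nodes each hold one floating IP, the passive node
    holds none until it takes over from a failed peer.
    """
    return not set(local_addresses) & set(floating_ips)


if __name__ == "__main__":
    FLOATING_IPS = ["192.0.2.10", "192.0.2.11"]  # hypothetical values
    local = ["192.0.2.23", "127.0.0.1"]          # e.g. parsed from the OS
    if is_passive(local, FLOATING_IPS):
        print("passive node: safe to run CDR processing here")
```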

In the odd scenario where a member of a tenant is registered on a different server, this scheme will work the same way as the load-balancing approach.

[Figure: generic load-balanced FusionPBX cluster, no RTP shortcut]

Fault Tolerance in a Hybrid High-Availability & Load-balanced FusionPBX/CoolPBX Cluster

There are two layers of fault-tolerance mechanism in this configuration:

  • Layer 3 works with the floating IP. The three servers constantly monitor each other; when one stops answering, the passive one will take over its IP, execute the fail-over tasks and continue the service.
  • Layer 5 works with the DNS. With the SRV-only approach, the endpoints will have to honour the DNS policy and do the fail-over themselves. With the Smart DNS, the A record will be updated as well; the endpoints will have to switch as soon as they detect the new value in the DNS answer. This will only happen if the Layer 3 fail-over mechanism fails for any reason.
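
The Layer 3 decision taken by the passive node can be sketched as follows. This is an assumption-laden illustration, not the actual fail-over scripts: the peer names, addresses and interface are hypothetical, and the command list assumes a Linux host with `ip` and the iputils `arping` (its `-U` mode sends unsolicited ARP so the router refreshes its table).

```python
def pick_takeover(peer_alive, floating_ip_of):
    """Return the floating IP of the first failed active peer, or None."""
    for peer, alive in peer_alive.items():
        if not alive:
            return floating_ip_of[peer]
    return None


def takeover_commands(ip, prefix_len, interface):
    """Build (but do not run) the commands that would claim the floating IP."""
    return [
        # Bind the failed peer's address to the local interface.
        ["ip", "addr", "add", f"{ip}/{prefix_len}", "dev", interface],
        # Announce the new owner so the router updates its ARP table.
        ["arping", "-U", "-c", "3", "-I", interface, ip],
    ]


if __name__ == "__main__":
    # Hypothetical heartbeat results and address plan.
    alive = {"pbx1": True, "pbx2": False}
    ips = {"pbx1": "192.0.2.10", "pbx2": "192.0.2.11"}
    target = pick_takeover(alive, ips)
    if target is not None:
        for cmd in takeover_commands(target, 24, "eth0"):
            print(" ".join(cmd))
```

The sketch only prints the commands; running them, and the remaining fail-over tasks, is left to whatever cluster manager you use.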

Optimizing your VoIP Cluster

Yes, it is possible. I suggest you read my RTP/SDP optimization article to get an idea of what can be done.

How do I get a Cluster like this?

Easy, just send me a message on any of my social media accounts. I am sure we will come to an agreement and you will enjoy a brand new PBX cluster.

Good luck!
