This is a question I am asked frequently, so I think it is better to write the explanation here once and point everyone to it.
It is very common to see new VoIP startups. As with any startup, the entrepreneurs look for the cheapest option with the best performance and availability. First of all, I must say there is no 100% fault-free architecture, but we can achieve 99.99999%, or as many nines as you want. It is all about the money you want to invest.
I will explain an architecture that can get close to five nines, 99.999%. Whether you implement it fully or only in part will depend on your budget. Please remember that I am telling you what, not how.
General Architecture Overview of a PBX Cluster
In this example, there are two data centers. Each data center is isolated from the other, with its own Internet addressing (different network IPs), and ideally they are geographically distant (let's say one in Paris, France and the other in Los Angeles, US).
As you see, the servers that are in each node are almost the same. I will talk about each one and its role.
- SIP Server (active): it is the server that currently holds the floating IP. All the load that arrives in the current data center will be handled by this server. The active server will therefore have two IPs: the floating one, which must be public, and a static one that allows system admins to reach it (whatever the selected method). The active and passive servers will need to handle synchronization.
- SIP Server (passive): it is the server that stays on standby until something prevents the active server from operating. The passive server needs to be in sync with the active one, not only by pointing to a common database infrastructure but also by having the same production files, as you never know when the active server will go down.
- DB nodes: in this illustration, there is a cross-data center cluster. I recommend having at least two servers in each data center to avoid speed issues or problems with the communication between data centers. Thanks to the new capabilities in MariaDB 10, it is now possible to have an active-active cluster.
- DNS Server: please note that you must have at least two DNS servers; it is up to you whether to deploy one in a third data center. The DNS server will serve a TLD domain (or subdomain, depending on your configuration) using the smart DNS plugin. This will send endpoints to the fastest (not necessarily the closest) data center. Smart DNS is intelligent enough to detect when a data center is unreachable, take it out of the IP pool, and put it back into the pool when it is online again.
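To make the smart DNS behaviour more concrete, here is a minimal Python sketch of the selection rule described above: drop unreachable data centers from the pool and answer with the fastest one left. It is not the plugin's code. The `DataCenter` class, the `pick_data_center` function, the data center names and the IPs (taken from the documentation ranges) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCenter:
    name: str          # hypothetical label
    ip: str            # floating IP that would be published in DNS
    reachable: bool    # result of the latest health probe
    latency_ms: float  # latest measured round-trip time


def pick_data_center(pool: list[DataCenter]) -> Optional[DataCenter]:
    """Return the fastest reachable data center, or None if all are down."""
    candidates = [dc for dc in pool if dc.reachable]
    if not candidates:
        return None
    return min(candidates, key=lambda dc: dc.latency_ms)


pool = [
    DataCenter("paris", "198.51.100.10", reachable=True, latency_ms=42.0),
    DataCenter("los-angeles", "203.0.113.10", reachable=True, latency_ms=95.0),
]
print(pick_data_center(pool).name)  # paris

pool[0].reachable = False  # Paris drops out of the pool
print(pick_data_center(pool).name)  # los-angeles
```

When Paris is marked reachable again, it simply becomes a candidate on the next query, which is the "push it back into the pool" step.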
If you decide to go for this approach, you will find that you really must have very bad luck to be offline. Both data centers must go down!
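To put a rough number on "both data centers must go down", here is a small Python sketch. It assumes that the two data centers fail independently and that either one alone can carry the service. The 99.9% figure for a single data center is only an example value, not a measurement, and the function names are mine.

```python
def combined_availability(per_site: float, sites: int) -> float:
    """Availability of `sites` independent sites when one site is enough."""
    return 1.0 - (1.0 - per_site) ** sites


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime, in minutes, for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


single = 0.999  # assumed availability of one data center (three nines)
both = combined_availability(single, 2)
print(f"one data center : {downtime_minutes_per_year(single):.1f} minutes down per year")
print(f"two data centers: {downtime_minutes_per_year(both):.2f} minutes down per year")
```

Under these assumptions, two data centers at three nines each give roughly six nines on paper. Real numbers will be lower, because failover is not instant and failures are never fully independent.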
Another advantage of this approach is that it can be deployed using VPSes. Please note that not all VPS companies offer the floating IP capability; alternatively, you can deploy it on bare-metal servers. If those two options are out of your reach, using one server will work for you (no floating IP), but do not forget it will be more vulnerable to service disruptions.
Data Center VoIP Node Architecture
Now that you have the overview at the data center level, we will talk about the internal architecture of a single node.
Diagram: https://static.inside-out.xyz/images/pbx-node-architecture.png
The SIP server has the following elements:
- Apache: it will handle HTTP (80/tcp) and HTTPS (443/tcp) traffic.
- FusionPBX: one part of the PBX core, it will provide all the Class 4/5 functionalities
- Billing for FusionPBX: it adds billing and LCR support to the FusionPBX project
- Memcached: it speeds up the most common data accesses by keeping the data in memory
- DB Balancer: it will decide what node of the database cluster to use. Depending on your environment, it can have priorities or round-robin.
- FreeSWITCH: the main element that will control the signalling protocol SIP and the voice flow RTP
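The two DB balancer policies mentioned in the list above (round-robin and priorities) can be sketched in a few lines of Python. This is only an illustration of the idea, not the configuration of any particular balancer. The host names and the `healthy` set are hypothetical.

```python
from itertools import cycle
from typing import Iterator


def round_robin(nodes: list[str]) -> Iterator[str]:
    """Hand out database nodes in turn, forever."""
    return cycle(nodes)


def by_priority(nodes: list[tuple[int, str]], healthy: set[str]) -> str:
    """Return the healthy node with the lowest priority number."""
    for _, host in sorted(nodes):
        if host in healthy:
            return host
    raise RuntimeError("no healthy database node available")


# Round-robin between the two local nodes (hypothetical host names).
rr = round_robin(["db1.dc1.example", "db2.dc1.example"])
print(next(rr), next(rr), next(rr))

# Priorities: prefer local nodes, fall back to the remote data center.
prioritized = [(1, "db1.dc1.example"), (2, "db2.dc1.example"), (3, "db1.dc2.example")]
print(by_priority(prioritized, healthy={"db2.dc1.example", "db1.dc2.example"}))
```

With priorities, the local nodes are tried first and the remote data center is only used when both local nodes are unhealthy, which keeps cross-data center latency out of the normal path.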
The DNS server has the following elements:
- DNS Balancer: a DNS add-on for PowerDNS that allows resolving the correct IP based on network conditions. It is not geo-localization or round-robin.
- Memcached: it saves some DB accesses
Data Flows between the PBX Elements
FreeSWITCH will use FusionPBX's XML handler to fetch all the information needed to set up the configuration for the modules, dial plans, directory information (SIP extension authentication) and language settings, among others. FusionPBX's XML handler first looks for the information in Memcached; if it is not there, it gets it directly from the database by connecting through the database balancer. The XML handler then stores the information in Memcached for future use. FreeSWITCH will also make direct database connections through the database balancer to save operational information such as registrations, call state, queues and others.
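The lookup order described above (cache first, database on a miss, then store the result) is the classic cache-aside pattern. Here is a minimal Python sketch of it. `DictCache` is an in-memory stand-in for Memcached, and `fetch_config`, the key format and the fake loader are assumptions for illustration only; this is not FusionPBX code.

```python
from typing import Callable, Optional


class DictCache:
    """In-memory stand-in for Memcached, exposing only get and set."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def fetch_config(key: str, cache: DictCache, load_from_db: Callable[[str], str]) -> str:
    """Return the XML for `key`, reading the database only on a cache miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)  # in production this goes through the DB balancer
    cache.set(key, value)      # keep it for the next request
    return value


db_calls: list[str] = []  # records every key that reached the fake database


def fake_db_loader(key: str) -> str:
    db_calls.append(key)
    return f"<document key='{key}'/>"


cache = DictCache()
fetch_config("directory:1001@domain1", cache, fake_db_loader)
fetch_config("directory:1001@domain1", cache, fake_db_loader)
print(len(db_calls))  # 1, the second request was served from the cache
```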
FusionPBX (the web interface) writes to the database through the database balancer. It then flushes the Memcached sub-system to make sure changes take effect right away. The billing software is an add-on that resides inside FusionPBX. Having a native interface takes extra servers out of the picture. Billing has LCR capabilities and pseudo-ENUM support, which means it will save you as much money as it can on your selected carriers.
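For readers who have not met LCR (least cost routing) before, here is a generic Python sketch of the idea: for each carrier, take the rate of the longest prefix that matches the dialed number, then try the carriers from cheapest to most expensive. This is not the add-on's actual algorithm. The rate table, carrier names and prices are invented for the example.

```python
# Hypothetical rate table: (carrier, dialed-number prefix, USD per minute).
RATES = [
    ("carrier_a", "1", 0.0100),
    ("carrier_a", "1213", 0.0080),
    ("carrier_b", "1", 0.0090),
    ("carrier_b", "33", 0.0200),
    ("carrier_a", "33", 0.0150),
]


def best_rate_per_carrier(number: str) -> dict[str, float]:
    """For each carrier, keep the rate of its longest prefix matching `number`."""
    best: dict[str, tuple[int, float]] = {}
    for carrier, prefix, rate in RATES:
        if number.startswith(prefix):
            if carrier not in best or len(prefix) > best[carrier][0]:
                best[carrier] = (len(prefix), rate)
    return {carrier: rate for carrier, (_, rate) in best.items()}


def least_cost_route(number: str) -> list[str]:
    """Return the carriers able to reach `number`, cheapest first."""
    rates = best_rate_per_carrier(number)
    return sorted(rates, key=lambda carrier: rates[carrier])


print(least_cost_route("12135550100"))  # ['carrier_a', 'carrier_b']
print(least_cost_route("14165550100"))  # ['carrier_b', 'carrier_a']
```

The ordered list doubles as a fallback plan: if the cheapest carrier rejects the call, the next one in the list is tried.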
In the specific case of inter-data center calls (userA@domain1 registered in data center 1 calls userB@domain1 registered in data center 2), because all FreeSWITCH servers share the same database, it is easy to know where the endpoint is registered. Both FreeSWITCH servers, the caller's and the callee's, will establish a connection.
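The decision behind an inter-data center call can be pictured with the following Python sketch. It only illustrates the lookup against a shared registration table. It is not how FreeSWITCH implements it internally, and the table contents, host names and function names are hypothetical.

```python
from typing import Optional

# Hypothetical snapshot of the shared registration table:
# user -> SIP server that currently holds the registration.
REGISTRATIONS: dict[str, str] = {
    "userA@domain1": "sip.dc1.example",
    "userB@domain1": "sip.dc2.example",
}


def locate(user: str) -> Optional[str]:
    """Return the server where `user` is registered, if any."""
    return REGISTRATIONS.get(user)


def plan_call(callee: str, local_server: str) -> str:
    """Decide how the server that received the call should reach the callee."""
    target = locate(callee)
    if target is None:
        return "callee not registered"
    if target == local_server:
        return "deliver locally"
    return f"bridge to {target}"


# userA's call arrives at data center 1, userB is registered in data center 2.
print(plan_call("userB@domain1", local_server="sip.dc1.example"))
```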
The servers will run some kind of software (there are many options) to synchronize specific filesystem paths, not only within each active-passive pair but among all the servers.
While operating, VoIP servers will collect passive information about network conditions. The DNS add-on will use that information to resolve the correct A and SRV records. The DNS answer will be the optimized one to let an endpoint have connectivity with the data center that has the best network conditions.
Failure Tolerant
This architecture is failure-tolerant in the following ways:
- The database cluster stays up and running: you can turn off one node without disturbing the data flow. When the node comes back, it will sync with the other nodes before it starts accepting connections.
- The passive SIP server gives FreeSWITCH high availability if the active one crashes.
- Smart DNS will redirect endpoints if the data center link gets slow or goes down. In the worst-case scenario, users will need to restart their IP phone (depending on the brand).
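As an illustration of the active-passive point in the list above, here is a Python sketch of a takeover rule: the passive server claims the floating IP only after several consecutive failed health checks, so one lost probe does not trigger a failover. The class name and the threshold of three are assumptions. The actual move of the floating IP depends on your provider or clustering tool and is not shown.

```python
class FailoverMonitor:
    """Decide when the passive server should claim the floating IP."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed  # consecutive failures tolerated
        self.missed = 0
        self.promoted = False

    def record(self, active_alive: bool) -> bool:
        """Feed one health-check result; return True once takeover is due."""
        if self.promoted:
            return True
        if active_alive:
            self.missed = 0  # a good check resets the counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.promoted = True
        return self.promoted


monitor = FailoverMonitor(max_missed=3)
checks = [True, False, False, True, False, False, False]
print([monitor.record(ok) for ok in checks])
# [False, False, False, False, False, False, True]
```

In this sketch the promotion is sticky: once the passive server has taken over, it stays in charge until an administrator decides to fail back.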
Budgeting your Cluster PBX: It is affordable!
This scheme shouldn't be too expensive if you are getting serious in the VoIP business. Some quick numbers (taking the cheapest numbers, not necessarily the best quality):
- 2 GB RAM VPS for the VoIP servers using KVM virtualization costs approx. 20 USD monthly each; you will need four of them
- 4 GB RAM VPS for the DB nodes with OpenVZ virtualization costs approx. 10 USD monthly each; you will need four of them
- Billing license, obtained by funding the open-source campaign: 200 CAD (approx. 150 USD) one-time payment; you will need two licenses
- Smart DNS license for one domain (unlimited subdomains) is 150 USD
- TLD Domain, from 0.88 USD per year (cheapest option)
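So, if my maths are correct, this infrastructure will cost you a kick-start of about 451 USD (two billing licenses at approx. 150 USD each, the Smart DNS license and the first year of the domain; the exact figure depends on the CAD to USD exchange rate), plus monthly fees of 120 USD.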
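The arithmetic can be checked with a few lines of Python. This sketch takes the list prices above at face value, in particular 150 USD per billing license (an approximation of 200 CAD) and only the first year of the domain. The variable and function names are mine.

```python
# (item, unit price in USD, quantity), taken from the list above.
MONTHLY = [
    ("VoIP server VPS (2 GB, KVM)", 20.00, 4),
    ("DB node VPS (4 GB, OpenVZ)", 10.00, 4),
]
ONE_TIME = [
    ("Billing license (approx. value of 200 CAD)", 150.00, 2),
    ("Smart DNS license", 150.00, 1),
    ("TLD domain, first year", 0.88, 1),
]


def total(items: list[tuple[str, float, int]]) -> float:
    """Sum unit price times quantity over all items."""
    return sum(price * qty for _, price, qty in items)


print(f"monthly : {total(MONTHLY):.2f} USD")
print(f"up front: {total(ONE_TIME):.2f} USD")
```

With these inputs the monthly figure is 120 USD. The up-front figure moves with the CAD to USD rate you apply to the two billing licenses.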
Good Luck!