High availability is one of the sexiest subjects these days, and there are many ways to achieve it. One of them is the DNS approach. DNS is an application-layer protocol (layer 7 of the OSI model). It is responsible for storing directory information such as FQDN-to-IP translations, public keys for DKIM (an anti-spam and integrity technique), SPF policies (another anti-spam technique) and geolocation data, among others. In this specific case, I will talk about SRV records and their relationship with high availability, and more specifically with VoIP.
An SRV DNS record has the following format (defined in RFC 2782):
_service._protocol.name. TTL class SRV priority weight port target.
For example, the following two SRV records publish a primary and a backup SIP server:
_sip._udp.inside-out.xyz. 86400 IN SRV 0 5 5060 sip.inside-out.xyz.
_sip._udp.inside-out.xyz. 86400 IN SRV 1 5 5060 sip2.inside-out.xyz.
In this example, sip.inside-out.xyz has the lower priority value (0), so clients try it first. If it goes down (stops answering), sip2.inside-out.xyz (priority 1) takes its place. In other words, SRV records tell clients where to look for the desired service and in which order.
The first thing you need to know is that the success of this approach depends on the endpoints. It is the clients' responsibility to perform the failover; the servers only need to publish the information.
Because the failover decision is made by the client, there are no defined criteria for when it happens. Usually, failover takes place when there is no answer on the TCP/UDP connection. Specifically for VoIP, most SIP endpoints handle failover properly: when the first server stops answering SIP requests, the endpoints move on to the next one.
However, high availability based purely on SRV records is not the right solution in every case.
These DNS records are static (dumb, by definition). The priorities never change, so the server with the lowest priority value is always tried before the others. Weights can spread requests among records that share the same priority, but only in fixed proportions that do not react to actual load or to where the client is. This is enough in many cases, but not in all of them. The first situation that comes to mind is server overload: the first server in the queue always takes the load. The second situation is large countries such as the U.S.A., Russia, Canada, Brazil or even Mexico.
Suppose, for example, that you have servers in Montreal, QC and in Vancouver, BC. With static SRV records, all users go to the same server, say Montreal. Users in Vancouver then do not use the closest server and, as a consequence, experience slower service.
Fortunately, I have developed a piece of software that addresses this situation: a PowerDNS plugin for high availability focused on achieving the lowest possible latency. It is under heavy development and, at the time of writing, it supports two algorithms:
Less Average Network Latency passively collects latency metrics and, when a request comes from a known network, answers with the IP address that has the lowest calculated latency.
Less VoIP Load monitors activity in the FreeSWITCH daemons and directs clients to the least loaded server according to its calculations.
If you wonder how it was written: I first wrote it in Perl, but Perl is heavy and slow. Each instance used no less than 16 MB of RAM (and PowerDNS needs at least 5 instances to start). After struggling with memory and speed issues, I rewrote it entirely in ANSI C. It now needs 16 kB of memory and the speed is awesome. During the rewrite I also added optional Memcached support to speed up database requests as much as possible without changing the algorithms or the design of the software.
This way you will be able to deploy highly available FreeSWITCH clusters easily.
Hope this helps.
Good luck!