High availability is one of the sexiest subjects these days, and there are many ways to achieve it. One of them is the DNS approach. DNS is an application-layer protocol (layer 7 of the OSI Network Model). It is responsible for storing directory information such as the FQDN to IP translation, public keys for DKIM (an anti-spam and integrity technique), SPF information (another anti-spam technique), and geolocation data, among others. In this very specific case, though, I will talk about SRV records and their relationship with High Availability, and more specifically with VoIP.

An SRV DNS record must meet the following format:

_service._protocol.name. TTL class SRV priority weight port target.

Where:

  • service: the symbolic name of the desired service. Usually, this value matches the content of the /etc/services file.
  • protocol: the transport protocol of the desired service; this is usually either TCP or UDP.
  • name: the domain name for which this record is valid, ending in a dot.
  • TTL: standard DNS time to live field.
  • class: standard DNS class field (this is always IN).
  • priority: the priority of the target host; a lower value means more preferred.
  • weight: a relative weight for records with the same priority; a higher value means a proportionally higher chance of being selected.
  • port: the TCP or UDP port on which the service is to be found.
  • target: the canonical hostname of the machine providing the service, ending in a dot.

An example of an SRV record is:

_sip._udp.inside-out.xyz. 86400 IN SRV 0 5 5060 sip.inside-out.xyz.
_sip._udp.inside-out.xyz. 86400 IN SRV 1 5 5060 sip2.inside-out.xyz.
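
To make the field layout concrete, here is a minimal Python sketch that splits a record written in the format above into its named parts. The `SrvRecord` type and the `parse_srv` function are names made up for this illustration; they are not part of any DNS library, and a real client would get these fields from its resolver instead of parsing text.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    service: str
    protocol: str
    name: str
    ttl: int
    rclass: str
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Split one zone-file style SRV line into its fields (illustrative only)."""
    owner, ttl, rclass, rtype, priority, weight, port, target = line.split()
    if rtype != "SRV":
        raise ValueError(f"not an SRV record: {rtype}")
    # The owner name is _service._protocol.name.
    service, protocol, name = owner.split(".", 2)
    return SrvRecord(
        service.lstrip("_"), protocol.lstrip("_"), name,
        int(ttl), rclass, int(priority), int(weight), int(port), target,
    )


record = parse_srv("_sip._udp.inside-out.xyz. 86400 IN SRV 0 5 5060 sip.inside-out.xyz.")
print(record)
```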

This example says that the sip server is the first one to try; if the sip server goes down (stops answering), the sip2 server will take its place. In short, an SRV record tells the client where to look for the desired service.
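
The order in which a client walks through the records can be sketched as follows: lowest priority first, and within one priority a weighted random pick. This is a simplified Python illustration, not the exact selection procedure from the SRV specification (for example, records with weight 0 are handled more crudely here). The `Srv` type and `order_for_client` are hypothetical names.

```python
import random
from typing import List, NamedTuple, Optional


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_for_client(records: List[Srv], rng: Optional[random.Random] = None) -> List[Srv]:
    """Return the records in the order a client would try them."""
    rng = rng or random.Random()
    ordered: List[Srv] = []
    # Lower priority values are tried first.
    for priority in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == priority]
        # Within one priority, draw without replacement, biased by weight.
        while pool:
            weights = [r.weight for r in pool]
            if sum(weights) == 0:
                pick = rng.choice(pool)
            else:
                pick = rng.choices(pool, weights=weights, k=1)[0]
            ordered.append(pick)
            pool.remove(pick)
    return ordered


records = [
    Srv(1, 5, 5060, "sip2.inside-out.xyz."),
    Srv(0, 5, 5060, "sip.inside-out.xyz."),
]
for r in order_for_client(records):
    print(r.priority, r.target)
```

With the two records from the example, the priorities differ, so the order is always sip first and sip2 second; the weights only matter when several records share a priority.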

How do High Availability and SRV DNS Records work together?

The first thing you need to know is that the success of this approach depends on the endpoints. It is the responsibility of the clients to do the proper failover; servers only need to take care of publishing the information.

Because the decision to fail over is made at the client's end, there are no defined criteria for when to do it. Usually, the failover takes place when there is no answer from the TCP/UDP connection. Speaking specifically about VoIP, most SIP endpoints do the failover properly: when the first server stops responding to SIP requests, the endpoints jump to the next one.
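
The client-side logic boils down to "try the targets in order and keep the first one that answers". Here is a minimal Python sketch of that loop. The `probe` callable is an assumption of this example: it stands for whatever check the endpoint uses (a TCP connect, a SIP request with a timeout, and so on), which this article does not specify.

```python
from typing import Callable, Iterable, Optional, Tuple

Target = Tuple[str, int]  # (hostname, port)


def first_responsive(
    targets: Iterable[Target],
    probe: Callable[[str, int], bool],
) -> Optional[Target]:
    """Return the first target for which probe() succeeds, or None."""
    for host, port in targets:
        try:
            if probe(host, port):
                return (host, port)
        except OSError:
            # A network error counts as "no answer": move on to the next target.
            continue
    return None


# Demo with a fake probe that pretends the first server is down.
def fake_probe(host: str, port: int) -> bool:
    return host != "sip.inside-out.xyz."


targets = [("sip.inside-out.xyz.", 5060), ("sip2.inside-out.xyz.", 5060)]
print(first_responsive(targets, fake_probe))
```

Passing the probe in as a function keeps the failover loop independent of the transport, which matches the point above: when to give up on a server is the client's decision.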

High Availability handled by pure SRV DNS Records is not the solution in some cases.

Where are the SRV DNS Records from a Static DNS server not enough?

These DNS records are static, and dumb by definition. This means the priorities will never change, and the first server in the queue will always be hit before the others are tried. This is enough for many cases, but not for all of them. The first situation that comes to my mind is server overload: the first server in the queue will always get the load. The second situation is geography: in big countries such as the U.S.A., Russia, Canada, Brazil, and even Mexico, users can be very far from the preferred server.

For example:

[Image: north-america-servers-and-users.png]

Suppose you have servers in Montreal, QC and Vancouver, BC. With static SRV records, all users will go to one server, for example Montreal. Users in Vancouver will then not use the closest server and, as a consequence, will experience a slower service.

PowerDNS Plugin for High Availability

Fortunately for everyone, I have developed a piece of software that can fix this situation. It is a PowerDNS Plugin for High Availability, and it is focused on getting the lowest latency possible. It is under heavy development and, at the moment of writing this article, it supports two algorithms:

  • Less Average Network Latency
  • Less VoIP Load

The Less Average Network Latency algorithm gathers passive metrics and monitoring data. When a request comes from a known network, it answers with the IP that has the lowest latency according to its calculations.
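
To give an idea of the principle, here is a small Python sketch of a latency-based answer. It is not the plugin's code (the plugin is written in C and its internals are not shown in this article); the sample table, the networks, the addresses (taken from the documentation ranges), and the `pick_server` function are all made up for illustration.

```python
import ipaddress
from statistics import mean

# Hypothetical measurements: client network -> {server IP: latency samples in ms}
LATENCY_SAMPLES = {
    ipaddress.ip_network("198.51.100.0/24"): {
        "192.0.2.10": [12.0, 15.0, 11.0],
        "192.0.2.20": [70.0, 68.0, 75.0],
    },
    ipaddress.ip_network("203.0.113.0/24"): {
        "192.0.2.10": [72.0, 69.0],
        "192.0.2.20": [9.0, 13.0],
    },
}

# Answer given to clients from networks we have no measurements for (assumption).
DEFAULT_SERVER = "192.0.2.10"


def pick_server(client_ip: str) -> str:
    """Return the server IP with the lowest average latency for this client."""
    addr = ipaddress.ip_address(client_ip)
    for network, samples in LATENCY_SAMPLES.items():
        if addr in network:
            return min(samples, key=lambda server: mean(samples[server]))
    return DEFAULT_SERVER


print(pick_server("198.51.100.7"))
print(pick_server("203.0.113.9"))
```

In this made-up data set, clients from the first network are sent to 192.0.2.10 and clients from the second one to 192.0.2.20, which is the Montreal/Vancouver situation described earlier.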

The Less VoIP Load algorithm monitors activity in the FreeSWITCH daemons and sends the client to the least loaded server according to its calculations.
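
The same idea can be sketched for load. In this Python illustration the load of a server is assumed to be the ratio of active calls to a configured maximum; how the plugin actually measures FreeSWITCH activity is not described here, so the `ServerLoad` type, the `least_loaded` function, and the numbers are hypothetical.

```python
from typing import Dict, NamedTuple


class ServerLoad(NamedTuple):
    active_calls: int
    max_calls: int


def least_loaded(loads: Dict[str, ServerLoad]) -> str:
    """Return the hostname with the lowest active/max call ratio."""
    return min(loads, key=lambda host: loads[host].active_calls / loads[host].max_calls)


# Made-up numbers for the two servers of the earlier example.
loads = {
    "sip.inside-out.xyz.": ServerLoad(active_calls=180, max_calls=200),
    "sip2.inside-out.xyz.": ServerLoad(active_calls=40, max_calls=200),
}
print(least_loaded(loads))
```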

If you wonder how this was written, I first wrote it in Perl. But Perl is heavy and slow: each instance was using no less than 16 MB of RAM (and PowerDNS needs at least 5 instances to start). After struggling with memory issues and speed, I decided to rewrite it all in ANSI C. The memory requirement is now 16 kB and the speed is awesome. Not to mention that during the rewrite I added support for Memcached (if available) to speed up database requests as much as possible without disturbing the algorithms and the logic of the software.

This way you will be able to deploy FreeSWITCH/FusionPBX Clusters in Load Balanced mode easily.

Hope this helps. 

Good Luck!
