High Availability (also known as HA) is the capability of a system to remain online regardless of the adverse events that might happen. Availability, in turn, is the property of a service being reachable when it is needed. As you might guess, availability can be measured as a percentage (from 0% to 100%); of course, the closer you get to 100%, the more expensive the system is to deploy. In common parlance, when you say a service has 3 nines of availability, you mean 99.9%.
As a CISSP-certified security consultant, I have not found any hard definition that states how many 9s you need to claim high availability. In my experience, people start using the term when discussing 3 nines or better. But this is only a feeling.
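To make the nines more tangible, here is a minimal Python sketch that converts a number of nines into the downtime it allows. The function name `allowed_downtime` and the 365-day year are my own assumptions for illustration, not part of any standard.

```python
from datetime import timedelta


def allowed_downtime(nines: int, period: timedelta = timedelta(days=365)) -> timedelta:
    """Return the maximum downtime within `period` for a given number of nines.

    3 nines means 99.9% availability, so 0.1% (10 ** -3) of the period
    may be spent offline.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailable_fraction = 10 ** (-nines)
    return period * unavailable_fraction


if __name__ == "__main__":
    # Print the yearly downtime budget for 2 to 5 nines.
    for n in range(2, 6):
        print(f"{n} nines -> {allowed_downtime(n)} of downtime per year")
```

With 3 nines, the budget works out to roughly 8 hours and 46 minutes per year; each extra nine divides that budget by ten, which is why the cost grows so quickly.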
When someone says "in the cloud", it is a very gray term that means someone else's computer. When we speak about servers, we think about VPSs in the cloud. Names such as Digital Ocean or Vultr jump right away into my mind. If you are looking to build an inexpensive, reliable, highly available system with some load balancing, this article will help you understand how this works.
Different Techniques for High Availability
High Availability can be reached by applying countermeasures at different layers of the network model. In my experience, these are the usual ones:
- Dynamic routing: by using protocols such as BGP or RIP, the traffic for an autonomous system can be rerouted to a secondary data centre. As you see, this applies to whole network blocks.
- Floating IP: two servers monitor each other at short intervals. Both servers share a single IP, which is called the floating IP. When one server fails, the other takes over the IP and starts getting all the traffic. Both servers should share all information to avoid breaking sessions, for example, HTTP sessions or SIP sessions. Both servers must be plugged into the same broadcast domain (aka the same VLAN). As you see, this technique applies to the whole server.
- Smart DNS: This technique is the one I will talk about in this article. In short, the DNS server decides how to answer each request. Depending on some network factors, it can resolve to IP-1 for one endpoint but to IP-2 for another. As you can see, this technique applies to the domain names, not the servers. I will talk more about this later.
- Native Support for the Protocol: Some protocols such as SMTP, SIP and XMPP are built with fault tolerance in mind. This kind of protocol doesn't need more than a good implementation of its specification.
Native Support for the Protocol
Protocols such as SMTP, SIP and XMPP are perfect examples of this. SMTP is a special case, as it uses MX records instead of SRV records. However, all three have a fallback mechanism built in. SRV records are an ideal option if the protocol you are using supports them.
Always keep in mind that SRV records won't give you High Availability automatically. The servers within the farm should be able to take over each other's work, so your servers should share information among themselves.
In the world of VoIP, SRV records are heavily used to accomplish High Availability. However, the decision to honour them rests purely with the endpoint. I have found that some IP phones honour SRV records perfectly and others simply don't. SRV records can provide fallback capabilities; however, their nature is static, and the most preferred server (the one with the lowest priority value) will get all the load until it fails.
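The fallback behaviour of SRV records can be sketched in a few lines of Python. This is only an illustration of what a well-behaved endpoint does, not the code of any real phone or library: the `SrvRecord` class, the `pick_target` function and the `example.com` hostnames are hypothetical, and the sketch deliberately ignores the weight field, which SRV uses to share load between records of equal priority.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value = more preferred
    weight: int    # ignored in this simplified sketch
    port: int
    target: str


def pick_target(
    records: Iterable[SrvRecord],
    is_reachable: Callable[[SrvRecord], bool],
) -> Optional[SrvRecord]:
    """Return the most preferred record whose server answers, or None."""
    for record in sorted(records, key=lambda r: r.priority):
        if is_reachable(record):
            return record
    return None


# Hypothetical zone content: sip1 is the primary, sip2 is the fallback.
RECORDS = [
    SrvRecord(priority=20, weight=0, port=5060, target="sip2.example.com"),
    SrvRecord(priority=10, weight=0, port=5060, target="sip1.example.com"),
]

if __name__ == "__main__":
    # Simulate sip1 being down; a real endpoint would attempt a connection.
    down = {"sip1.example.com"}
    chosen = pick_target(RECORDS, lambda r: r.target not in down)
    print(chosen.target if chosen else "no server available")
```

Note how nothing in the records changes when a server fails: the endpoint has to do the work, which is exactly why a phone that ignores SRV records gets no fallback at all.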
Obtaining High Availability through Smart DNS Technique
DNS is the protocol responsible for translating hostnames into numeric IP addresses. Nowadays, it is very rare to find a server that doesn't use DNS. If you are looking for fault tolerance, using DNS instead of hard-coding IP addresses is the correct approach. Not to mention that it is easier to remember www.inside-out.xyz than 170.75.152.145 for IP version 4 or fe80::f816:3eff:fe33:ea02 for IP version 6.
A Smart DNS server should be capable of making a real-time decision before delivering any record. Helped by other scripts, it could resolve to the IP where latency is lowest or server load is minimal. If a node goes down, the Smart DNS server should notice and take that faulty server's IP out of the pool. When the server is back, the IP should be restored to the pool.
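The pool behaviour described above can be sketched as follows. This is a toy model under my own assumptions, not how F5 or any PowerDNS plugin is implemented: the `SmartPool` class and its methods are hypothetical, the health checks are assumed to be run by an external script that calls `report`, and the addresses come from the ranges reserved for documentation.

```python
from typing import Dict, Iterable, Optional


class SmartPool:
    """Keep track of healthy nodes and answer with the lowest-latency one."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self.addresses = list(addresses)
        # address -> last measured latency in ms; a missing key means "down"
        self.latency_ms: Dict[str, float] = {}

    def report(self, address: str, latency_ms: Optional[float]) -> None:
        """Store a health-check result; latency_ms=None marks the node as down."""
        if address not in self.addresses:
            raise ValueError(f"{address} is not part of this pool")
        if latency_ms is None:
            self.latency_ms.pop(address, None)  # take the faulty IP out
        else:
            self.latency_ms[address] = latency_ms  # add or restore the IP

    def resolve(self) -> Optional[str]:
        """Return the healthy address with the lowest latency, or None."""
        if not self.latency_ms:
            return None
        return min(self.latency_ms, key=self.latency_ms.get)


if __name__ == "__main__":
    pool = SmartPool(["192.0.2.10", "198.51.100.20"])
    pool.report("192.0.2.10", 40.0)
    pool.report("198.51.100.20", 15.0)
    print(pool.resolve())                 # the faster node is answered
    pool.report("198.51.100.20", None)    # health check failed
    print(pool.resolve())                 # traffic moves to the remaining node
```

A real implementation would also have to decide what to answer when every node is down; returning nothing, as this sketch does, is only one of the options.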
The Smart DNS technique should apply to A (for IP version 4), AAAA (for IP version 6) and SRV records at least. The A and AAAA records should have a smaller TTL than the SRV records. Because of the fallback built into SRV records, they can have a longer TTL with minimal service disruption.
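To see why a small TTL matters for A and AAAA records, consider this back-of-the-envelope model. It is a simplification of my own, assuming that resolvers honour the TTL and that a node is removed from the pool after a fixed number of failed health checks; the function name and the numbers are illustrative only.

```python
def worst_case_stale_seconds(ttl: int, check_interval: int, failures_to_trip: int = 1) -> int:
    """Upper bound on how long clients may keep using a dead IP.

    Detection takes up to `check_interval * failures_to_trip` seconds, and a
    resolver that cached the record just before it was removed keeps serving
    it for up to `ttl` more seconds.
    """
    return check_interval * failures_to_trip + ttl


if __name__ == "__main__":
    # Same health checks, two different TTLs for the A record.
    for ttl in (30, 3600):
        window = worst_case_stale_seconds(ttl, check_interval=10, failures_to_trip=3)
        print(f"TTL {ttl}s -> up to {window}s of traffic sent to a dead node")
```

With an SRV record the endpoint can fall back to the next target by itself, so a stale answer is far less harmful and the longer TTL saves queries.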
As always, and just as with Native Support for the Protocol, your cluster nodes should have a way to share information.
This technique is one of the best options if you are using servers in the cloud, and in my opinion, it is the best if those servers (VPSs) are hosted by different companies. Because you don't have control of the Autonomous System or the physical server, manipulating the DNS is where you need to focus. All you need to do is create a VPN between the nodes (if you don't want to pass clear-text data through the Internet), get a valid domain name and host that zone with any of the existing implementations.
Implementations of a Smart DNS
In my career, I have only come across two implementations of this Smart DNS approach. I don't doubt that there are more out there, but my Google-Fu is low.
- F5 Big-IP: Not much to say; this is a proprietary appliance you get from F5. It is more than a DNS server, but I am focusing only on that part.
- Low Latency PowerDNS Add-On: This is a piece of software I have developed myself. It is not an appliance, but a plugin that works with PowerDNS to add smart DNS capabilities. To be fair, at the time of writing my plugin only has two algorithms, while F5 has at least five. But F5, as it is an appliance, requires you to have access to a data centre, which automatically rules out anyone who wants to use VPSs in the cloud from different cities and different companies.
Interested in having yours?
Don't waste time and contact me. There is always a way to figure out an affordable solution for you.
Good Luck!