
In this post, I am going to talk about how to configure FreeSWITCH in a high availability active-passive schema. The active-passive approach shares a floating IP between your VoIP switches; when the active one goes offline, the passive one takes over the IP and starts handling the load. For more information about how this works, I strongly suggest you read my article about the High Availability Cluster Overview, so you understand what you are about to do.

This article will assume the following:

  • You have two servers, named freeswitch-1 and freeswitch-2 in the examples below
  • Both run a CentOS/Rocky-like distribution with the pcs tooling available
  • FreeSWITCH is installed and working on both of them

Configuring Corosync and Pacemaker

The easiest way is by using the pcsd service. Before starting the service, please make sure you meet the following prerequisites:

  • SSH keys exchanged for passwordless login between the servers
  • Servers must have an FQDN hostname
  • Hostnames must be resolvable against each other
  • The hacluster user needs to have the same password on both servers

Start the pcsd service on each node and then type:

  • Pacemaker 1.x:  pcs cluster auth freeswitch-1 freeswitch-2, and then pcs cluster setup --name cluster_name freeswitch-1 freeswitch-2
  • Pacemaker 2.x:  pcs host auth freeswitch-1 freeswitch-2, and then pcs cluster setup cluster_name freeswitch-1 freeswitch-2

You will be asked for a username. It is a good idea to use the hacluster username, as the CentOS/Rocky RPMs do all the privilege setup for you. You will then be asked for the password. If everything goes well, you will see no errors, and typing pcs status or crm status will show your nodes. You might need to restart the corosync and pacemaker services.
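
As an example, on a Pacemaker 2.x box the whole sequence could look like this (gssg is the cluster name used in the corosync.conf below, and the -u flag just pre-fills the username):

pcs host auth freeswitch-1 freeswitch-2 -u hacluster
pcs cluster setup gssg freeswitch-1 freeswitch-2
pcs status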

To be sure, check your /etc/corosync/corosync.conf configuration file; it should be identical on both nodes. Here is my config file; change it to fit your needs.

totem {
    version: 2
    secauth: off
    cluster_name: gssg
    transport: udpu
}

nodelist {
    node {
        ring0_addr: freeswitch-1
        nodeid: 1
    }

    node {
        ring0_addr: freeswitch-2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

Please make sure the hostnames you use are resolvable. You will also need to add an IPTables rule to allow traffic from and to port 5405/udp.
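
For example, with plain iptables (on a firewalld system, the bundled high-availability service opens the same ports):

iptables -A INPUT -p udp --dport 5405 -j ACCEPT

or

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload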

You can make the hostnames resolvable with your DNS or by editing the /etc/hosts file on each node. It is up to you.
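
If you go the /etc/hosts way, the entries could look like this on both nodes (the addresses are placeholders; use your servers' real IPs):

192.0.2.11    freeswitch-1
192.0.2.12    freeswitch-2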

You can now start the corosync and pacemaker daemons on each node. Use the service or systemctl command, for example service corosync start and service pacemaker start. Type crm status to verify all is good; you should see both node names.
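
Alternatively, pcs can start the whole stack on both nodes from a single one, and enable it at boot:

pcs cluster start --all
pcs cluster enable --all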

Type these commands; they disable fencing and allow Pacemaker to keep running a two-node cluster even when quorum is lost:

crm configure property stonith-enabled=false

crm configure property no-quorum-policy=ignore

or

pcs property set stonith-enabled=false

pcs property set no-quorum-policy=ignore

You can read the current configuration by typing crm configure show.

Configure your FreeSWITCH for High Availability

At this point you have your cluster ready, but there is nothing for it to manage yet. Before we continue, there are some configurations you need to apply to your FreeSWITCH servers. Remember, at this point your FreeSWITCH should have reached the following milestones:

  1. Up and running
  2. Sharing an external database
  3. Synced filesystems

When you are done, you need to enable call tracking in your FreeSWITCH profile configurations. Add the tag <param name="track-calls" value="true"/> to them. The way you do this depends on your deployment; for example, if you are using my FusionPBX RPMs, it is done through the web interface, as FreeSWITCH reads the profile configuration from MySQL/MariaDB. You need to add the <param name="odbc-dsn" value="odbc://dsn:username:password"/> as well. You may want to do this in every configuration file that supports a DSN, to share as much data as possible.
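
On a plain-XML deployment, the relevant part of each SIP profile could look like this minimal sketch (the DSN string is a placeholder; use your real database credentials):

<profile name="internal">
  <settings>
    <!-- track calls so the passive node can pick them up with 'sofia recover' -->
    <param name="track-calls" value="true"/>
    <!-- keep the shared state in the external database -->
    <param name="odbc-dsn" value="odbc://dsn:username:password"/>
    <!-- the rest of your profile settings go here -->
  </settings>
</profile>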

In order to let one switch take over the load of the other, both need to have the same name. But don't worry, this is not the hostname; it is the switchname variable, which is completely different. You can edit the switch.conf.xml file (usually in /etc/freeswitch/autoload_configs) and add or modify the tag <param name="switchname" value="freeswitch_name"/>.
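
In context, the file could look like this (freeswitch_name is whatever common name you choose for both nodes):

<configuration name="switch.conf" description="Core Configuration">
  <settings>
    <!-- must be identical on both nodes so either can recover the other's calls -->
    <param name="switchname" value="freeswitch_name"/>
  </settings>
</configuration>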

We are almost done. It is very important to let the passive Linux system bind IP addresses it does not own yet. To do this, type sysctl -w net.ipv4.ip_nonlocal_bind=1 to apply the setting immediately, and add it to /etc/sysctl.conf so sysctl -p /etc/sysctl.conf keeps it across reboots. Some newer Linux distributions, such as CentOS 7 or Mageia 5+, will want a drop-in file instead, something like echo net.ipv4.ip_nonlocal_bind=1 > /etc/sysctl.d/98-nonlocal_bind.conf
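
Summarized as a runnable snippet (the drop-in file name is just a convention; any name under /etc/sysctl.d/ works):

# apply immediately, without a reboot
sysctl -w net.ipv4.ip_nonlocal_bind=1
# persist across reboots on systemd-era distributions
echo 'net.ipv4.ip_nonlocal_bind = 1' > /etc/sysctl.d/98-nonlocal_bind.conf
sysctl -p /etc/sysctl.d/98-nonlocal_bind.conf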

The last step is to tell your FreeSWITCH SIP profiles to bind the floating address. Add this configuration:

<param name="rtp-ip" value="FLOATING_IP"/>
<param name="sip-ip" value="FLOATING_IP"/>
<param name="presence-hosts" value="FLOATING_IP"/>
<param name="ext-rtp-ip" value="FLOATING_IP"/>
<param name="ext-sip-ip" value="FLOATING_IP"/>

Adding the Floating IP Resource

The next step is to create the Pacemaker resource that manages the floating IP. You can do this by typing:

crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=FLOATING_IP cidr_netmask=FLOATING_IP_CIDR_MASK nic=eth0 iflabel=ipha op monitor interval=10s

or

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=FLOATING_IP cidr_netmask=FLOATING_IP_CIDR_MASK op monitor interval=10s

When you are done, type crm status to see if everything went well. Remember to change FLOATING_IP to the correct numeric IP, and FLOATING_IP_CIDR_MASK to the proper prefix length (usually this will be 32, but that depends on your network environment). If you make a mistake, you can delete the resource by typing pcs resource delete ClusterIP or crm configure delete ClusterIP.

If you are using DigitalOcean, you may want to read my newest article about how to configure the floating IP for an HA cluster. Because DigitalOcean implements floating IPs through NAT, the setup is different and there is a side-effect you must be aware of.

And you are done. Test it!

Adding the FailOver Resource

The next step is letting FreeSWITCH know what to do when it becomes the new active node. For that, I have published a modified version of a FailOver resource agent for Pacemaker. You can download it at https://bitbucket.org/okay-network/fail-over-resource-agents-for-pacemaker or, if you are using my OKay RPM repository, just install it via the yum command; the name of the package is resource-agents-failover-script.

Create a file in the /usr/libexec/failoverscript/ directory. The name is not important, but remember that Linux sorts the files, so a name beginning with 00- guarantees it will be executed first. Put something like /usr/bin/fs_cli -x 'sofia recover' in it and make it executable. If you have a buggy router, you may also need to add /usr/sbin/arping -q -c3 -A -I interface FLOATING_IP.
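
A minimal sketch of such a script, assuming the iputils arping and an eth0 interface (adjust both, and the floating IP, to your environment):

#!/bin/sh
# pick up the calls the failed node was tracking
/usr/bin/fs_cli -x 'sofia recover'
# optional, for buggy routers: announce the floating IP to refresh ARP caches
/usr/sbin/arping -q -c3 -A -I eth0 FLOATING_IP

Save it, for example, as /usr/libexec/failoverscript/00-sofia-recover and make it executable with chmod +x.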

Add the resource and the proper constraints with these commands:

pcs resource create FailOverScript ocf:heartbeat:FailOverScript op monitor interval=10s
pcs constraint colocation add FailOverScript with ClusterIP INFINITY
pcs constraint order ClusterIP then FailOverScript

or, with the crm shell:

crm configure primitive FailOverScript ocf:heartbeat:FailOverScript op monitor interval=10s
crm configure colocation sofia_recover-with-ip INFINITY: FailOverScript ClusterIP
crm configure order sofia_recover-after-ip mandatory: ClusterIP FailOverScript

Pros, Cons and Side-Effects

Running a high availability cluster is different from running a stand-alone server. So, I will start with the pros:

  1. When the active node becomes unavailable, the passive one will take over. End-users who are in a conversation at that moment will notice 2-3 seconds of dead air; after that little silence, the call should continue without issues.
  2. You may use the passive node for any other VoIP task, such as CDR importing; with some extra configuration, you may even be able to swap tasks between the nodes.

Some cons I can think about are:

  1. The failover will only happen when the active server stops answering. A slow server is still an alive server, so degraded performance will not trigger a switch.

There is one side effect you need to know:

  1. Endpoints that register using TCP may need to re-register. TCP uses sequence numbers to prevent sessions from being hijacked, and this protection also prevents the new server from continuing the existing connection. So, the best-case scenario is using SIP over UDP.

What Is Next? Active-Active Load Balancing

The approach I am explaining in this post is an Active-Passive scheme with a floating IP. This means that the first server will take all the load until it fails. Sometimes you may need to spread the load a little. If you are looking for an active-active, load-balanced cluster configuration, I will post how to do that later.

As a preview, there are two ways to do this:

  1. Active-Active with floating IP, without Kamailio: all nodes must be on the same local network. The balancer will be responsible for load balancing, and the FreeSWITCH boxes will be responsible for the inter-switch connections (if a user registered at box A calls a user at box B, they should connect).
  2. Active-Active without floating IP, without Kamailio: nodes must be spread around the globe. This balancing approach uses the smart DNS add-on I coded myself, which is responsible for the balancing; this way you can guarantee your endpoints connect to the fastest and closest server available. The FreeSWITCH servers will still need configuration for the inter-switch connections as well. You may want to read the Balancing, Clustering and High Availability for FusionPBX article to learn a little more about this approach.

Do You Need Help?

You can always contact me for support. Remember this is an advanced topic.

Enjoy!
