
Today, while I was watching TV, I got a support request from one of my customers (his name doesn't matter). The issue was that conferences were not working: the first user was able to join, but the second couldn't. So here is my explanation of the bug; I hope it is useful to someone.

I should briefly describe his deployment. He has a 4-server High Availability cluster: 2 servers in active-passive mode for the PBX, sharing a floating IP, and 2 database servers in active-active mode. All PBX data is stored in the database, not only FusionPBX's but FreeSWITCH's as well.

After reading the Lua code, I found the error. It was a bug I introduced years ago, when I was working on load balancing support for FusionPBX; I think it was around release 3.6. The bug is in the file resources/install/scripts/app/conference_center/index.lua.

For the cluster to work in HA, the switches need different hostnames but the same switchname. The hostname is defined by the operating system (Linux), and the switchname is defined in the file switch.conf.xml (/etc/freeswitch/autoload_configs/switch.conf.xml if you are using my RPMs).
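The naming rule above can be expressed as a small check. This is a minimal Lua sketch with made-up node names (not the customer's); the function valid_ha_naming is hypothetical and only validates a table of names, it does not read anything from FreeSWITCH.

```lua
-- Hypothetical node names for an active-passive PBX pair.
local pbx_nodes = {
  { hostname = "pbx-a.example.com", switchname = "pbx.example.com" },
  { hostname = "pbx-b.example.com", switchname = "pbx.example.com" },
}

-- Returns true when each node has its own hostname and all nodes share
-- a single switchname (the HA naming rule described above).
local function valid_ha_naming(nodes)
  if #nodes == 0 then
    return false -- nothing to validate
  end
  local seen_hostnames = {}
  local shared_switchname = nodes[1].switchname
  for _, node in ipairs(nodes) do
    if seen_hostnames[node.hostname] then
      return false -- two nodes use the same hostname
    end
    if node.switchname ~= shared_switchname then
      return false -- nodes disagree on the switchname
    end
    seen_hostnames[node.hostname] = true
  end
  return true
end

print(valid_ha_naming(pbx_nodes)) -- true
```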

Check the following code (taken from Git on November 18th, 2015):

--check if someone has already joined the conference
local_hostname = trim(api:execute("switchname", "")); -- this is where I suggest using hostname instead
freeswitch.consoleLog("notice", "[conference center] local_hostname is " .. local_hostname .. "\n");
sql = "SELECT hostname FROM channels WHERE application = 'conference' AND dest = '" .. destination_number .. "' AND cid_num <> '".. caller_id_number .."' LIMIT 1";

if (debug["sql"]) then
    freeswitch.consoleLog("notice", "[conference center] SQL: " .. sql .. "\n");
end

status = dbh_switch:query(sql, function(rows)
    conference_hostname = rows["hostname"];
end);

--if the conference hostname exists, bridge the call there
if (conference_hostname ~= nil) then
    freeswitch.consoleLog("notice", "[conference center] conference_hostname is " .. conference_hostname .. "\n");
    if (conference_hostname ~= local_hostname) then
        session:execute("bridge","sofia/internal/" .. destination_number .. "@" .. domain_name .. ";fs_path=sip:" .. conference_hostname);
    end
end

--call not bridged, so we answer
session:answer();

The first thing this code does is fill local_hostname. This variable is used later, but I think it should hold the hostname instead of the switchname. The switchname is a label used for HA, and the switchname value is usually not resolvable in DNS. This is the reason for the bug.
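A minimal sketch of the suggested change, reading both names side by side. It assumes the `hostname` API command (from mod_commands) is available alongside `switchname`; outside FreeSWITCH a hypothetical stub with made-up names replaces freeswitch.API() so the sketch can run standalone. FusionPBX ships its own trim(); a local one is defined here only to keep the block self-contained.

```lua
-- Local stand-in for FusionPBX's trim(): strips leading/trailing whitespace.
local function trim(s)
  return (s:match("^%s*(.-)%s*$"))
end

-- Inside FreeSWITCH, mod_lua provides freeswitch.API(). Outside it, a
-- hypothetical stub returns made-up names so the sketch runs standalone.
local api
if freeswitch then
  api = freeswitch.API()
else
  local fake_output = {
    hostname = "pbx-a.example.com\n", -- assumed OS hostname of the active node
    switchname = "pbx.example.com\n", -- assumed shared HA switchname
  }
  api = {
    execute = function(self, command, args)
      return fake_output[command] or ""
    end,
  }
end

-- What the original line reads: the shared HA label.
local switch_label = trim(api:execute("switchname", ""))

-- Suggested change: read the operating system hostname instead.
local local_hostname = trim(api:execute("hostname", ""))

print("switchname: " .. switch_label)
print("hostname:   " .. local_hostname)
```

On an HA node the two printed values differ, which is exactly the situation the comparison later in the script does not expect.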

Next, the script runs a SELECT on the channels table to find out whether someone has already joined the conference. If you are the first caller, conference_hostname will be nil; if you are the second, it will hold the hostname of the server hosting the conference. Note that the field name is hostname.
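To make the lookup concrete, here is a sketch that applies the same filter as the SELECT to an in-memory list of rows. The rows, extensions and hostnames are made up, and find_conference_hostname is a hypothetical helper; it returns the first match, standing in for LIMIT 1 (the real query has no ORDER BY, so the database may return any matching row).

```lua
-- Hypothetical rows standing in for the channels table.
local channels = {
  { application = "conference", dest = "3001", cid_num = "1001", hostname = "pbx-a.example.com" },
  { application = "bridge",     dest = "1002", cid_num = "1003", hostname = "pbx-b.example.com" },
}

-- Same conditions as the WHERE clause: a conference channel for this
-- destination placed by a different caller. Returns nil if none exists.
local function find_conference_hostname(rows, destination_number, caller_id_number)
  for _, row in ipairs(rows) do
    if row.application == "conference"
        and row.dest == destination_number
        and row.cid_num ~= caller_id_number then
      return row.hostname
    end
  end
  return nil
end

-- First caller into conference 3002: nobody is there yet.
print(find_conference_hostname(channels, "3002", "1004")) -- nil
-- Second caller into conference 3001: caller 1001 is already there.
print(find_conference_hostname(channels, "3001", "1002")) -- pbx-a.example.com
```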

If conference_hostname is not nil, the script checks whether conference_hostname and local_hostname are the same. If they are, you are already on the right server; otherwise it forwards you to the correct one (using fs_path). This is where the error comes in: the comparison fails because conference_hostname holds a hostname while local_hostname holds a switchname. As I explained before, in HA (active-passive) environments the switchname is what lets HA work, so switchnames and hostnames are different values in this scenario (when you are not using HA, they are usually the same).
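The branch can be isolated into a small function to show why the choice of name matters. This sketch assumes, as described above, that the SELECT returns the server's OS hostname; all names and numbers are made up, and route is a hypothetical helper that returns the action instead of executing it.

```lua
-- Mirrors the branch in index.lua: bridge via fs_path when the conference
-- is reported under a different name, otherwise answer locally.
local function route(conference_hostname, local_name, destination_number, domain_name)
  if conference_hostname ~= nil and conference_hostname ~= local_name then
    return "bridge", "sofia/internal/" .. destination_number .. "@" .. domain_name
      .. ";fs_path=sip:" .. conference_hostname
  end
  return "answer"
end

-- Second caller lands on the active node, which already hosts the conference.
local conference_hostname = "pbx-a.example.com" -- returned by the SELECT
local switchname = "pbx.example.com"            -- what the original code compares against
local hostname = "pbx-a.example.com"            -- what the post suggests comparing against

local action_switchname = route(conference_hostname, switchname, "3001", "example.com")
local action_hostname = route(conference_hostname, hostname, "3001", "example.com")

print(action_switchname) -- bridge (even though the conference is on this node)
print(action_hostname)   -- answer
```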

I reported this bug as soon as I found it, in pull request #1242. Sadly, it was rejected with the following comment:
switchname is more versatile and moving to hostname would reduce functionality.

Whatever you understand by versatile.

FusionPBX pull request 1242

I'm starting to get the feeling that they are systematically rejecting my suggestions, as none of the pull requests I have sent since they moved to Git has been accepted.

I will include this patch, along with others I have, in my RPMs.

Enjoy!
