This is not a new capability. FusionPBX has offered it since 4.2, maybe earlier. When a caller leaves a voicemail message, FusionPBX can also convert it to text. This is handy: not everybody has the patience to listen to a recording, and sometimes reading is faster. The service was made possible by the Bing API from Microsoft.

Since 2018 (I am not sure exactly when), Microsoft has deprecated the Bing API in favour of Azure. Legacy users may still use it for a while (we don't know yet when it will be dropped completely), but new users who want the transcription service won't be able to sign up for it.

I will explain here how I figured out how to make it work with Azure.

Sign up for Azure Account

You will need an Azure account. Don't worry, they are free. Azure will give you a $260 credit if you want to try the full potential of the platform. However, there is also a free plan for the speech service. If I recall correctly, the free plan only allows one transcription at a time (no simultaneous requests). So, let's assume you have signed up and understand Azure's billing plans.

The next thing you will need to do is to create a resource.

[Image: creating a resource in Azure]

The important thing here is to remember the location you select. The keys you get are tied to that location. If you use a key attached to an Asia resource against a North America East server, it won't work. I learned this the hard way.
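To make the link between region and key concrete, here is a minimal Python sketch of how the region ends up in the host names. The URL patterns are the ones I know from Azure's Speech REST documentation; the function name `azure_speech_endpoints` is hypothetical, and whether the patch builds its URLs exactly this way is an assumption.

```python
from urllib.parse import urlencode


def azure_speech_endpoints(region: str, language: str = "en-US") -> dict:
    """Build the token and speech-to-text URLs for one Azure region.

    Hypothetical helper for illustration only. The key from the resource
    is valid only against hosts of the same region.
    """
    # Accept "East US" as well as "eastus": drop spaces and lowercase.
    region = region.strip().lower().replace(" ", "")
    if not region.isalnum():
        raise ValueError("region must look like 'eastus'")

    token_url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    stt_url = (
        f"https://{region}.stt.speech.microsoft.com"
        "/speech/recognition/conversation/cognitiveservices/v1?"
        + urlencode({"language": language})
    )
    return {"token": token_url, "stt": stt_url}
```

Calling `azure_speech_endpoints("East US")` returns URLs on the `eastus` hosts, which is why a key created for another region is rejected there.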

Once the resource is created, take note of key number one. It is the only one my patch uses.

The Patch for Azure Voicemail Transcribing on FusionPBX

I have sent pull requests #4067 and #4068. Let's hope they are accepted. I will update the article when I get an answer regarding this.

UPDATE: The pull requests have been accepted.

I will now explain what the patch does:

  • for legacy Bing users, it adds more fault tolerance. Bing always answers with a JSON payload, but the transcription does not always succeed. So my patch doesn't assume success; it inspects the JSON payload to find out whether the transcription worked.
  • for new Azure users, here is the fun:
    Use the following default settings:

    Category  | Subcategory         | Type    | Value               | Description
    ----------|---------------------|---------|---------------------|------------
    Voicemail | azure_key1          | text    | (Azure key)         | The Azure key from the resource.
    Voicemail | azure_server_region | text    | (Azure region host) | The Azure region host. If the region is East US, the value should be "eastus". Check the full list of Azure regions.
    Voicemail | transcribe_enabled  | boolean | true                | Enable it in the system. You will need to enable it in the voicemail settings as well.
    Voicemail | transcribe_language | text    | en-US               | The language you will use. Although FusionPBX supports default settings per domain, the Lua script doesn't check the domain (I will need to read that part of the code further). So for now, assume it is system-wide.
    Voicemail | transcribe_provider | text    | azure               | Set this value to use my patch.

    With these settings, my patch can request the transcription and dig into the JSON payload to extract the text.

    My patch also adds Memcached support. I read somewhere in the Azure documentation that the access token you negotiate with the key is valid for ten minutes. Therefore, we can save time and resources by caching it. I have hardcoded the cache time to seven minutes. If you run FusionPBX as a personal PBX you may not find this useful, but if you are in the VoIP business with a PBX that answers more than 1000 calls a day with voicemail transcription, you will find it handy.
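The payload check described in the list above can be sketched in Python as follows. This is not the code from the patch (which was written in Lua); the function name `extract_transcription` is hypothetical, and the field names `RecognitionStatus` and `DisplayText` are assumed from the simple response format of Azure's speech-to-text REST API.

```python
import json
from typing import Optional


def extract_transcription(raw_body: str) -> Optional[str]:
    """Return the transcribed text, or None if the request did not succeed.

    Illustrative sketch: the field names are assumptions based on Azure's
    simple response format, not taken from the patch itself.
    """
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError):
        # Not JSON at all (for example an HTML error page or an empty body).
        return None

    if not isinstance(payload, dict):
        return None

    # A JSON answer does not mean the transcription worked: check the status.
    if payload.get("RecognitionStatus") != "Success":
        return None

    text = payload.get("DisplayText")
    if not isinstance(text, str) or not text.strip():
        return None
    return text.strip()
```

The caller only attaches a transcription to the voicemail when the function returns a string; on `None`, the voicemail is delivered without text instead of with garbage.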

You are ready to go.

UPDATE: Since FusionPBX 5, the STT engine has been moved from Lua scripts to PHP. The default settings should work as before. There were two problems with the Lua approach. First, the authorization token was stored through the file caching engine (the default since FusionPBX 4.5?). Since the file caching engine has no expiration, a token pushed to the cache never expires; as a consequence, the Lua scripts end up using expired authorization tokens and the STT fails. Second, the Lua scripts used the popen() function, which blocks the thread until the query has completed. This stops SIP processing for a moment, which is undesirable.

In my opinion, the PHP approach was not the right move. There are different ways to work around both problems. Do you remember my caching patch, which gives the file engine expiration capabilities? You can use that, or just keep using Memcached. And there are non-blocking alternatives to popen() for querying external services.
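To illustrate what expiration in a file-based cache means, here is a minimal Python sketch. It is not the caching patch itself; the function names `cache_set` and `cache_get` and the JSON record layout are hypothetical. The seven-minute lifetime mirrors the value hardcoded for the access token, which stays safely below the ten minutes Azure grants.

```python
import json
import os
import tempfile
import time
from typing import Optional

TOKEN_TTL_SECONDS = 7 * 60  # seven minutes, below Azure's ten-minute validity


def cache_set(path: str, value: str, ttl: float = TOKEN_TTL_SECONDS,
              now: Optional[float] = None) -> None:
    """Store a value in a file together with its expiry timestamp."""
    now = time.time() if now is None else now
    record = {"value": value, "expires_at": now + ttl}
    # Write to a temporary file first, then rename it into place, so that
    # a concurrent reader never sees a half-written record.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
    os.replace(tmp_path, path)


def cache_get(path: str, now: Optional[float] = None) -> Optional[str]:
    """Return the cached value, or None if it is missing, broken or expired."""
    now = time.time() if now is None else now
    try:
        with open(path) as handle:
            record = json.load(handle)
    except (OSError, ValueError):
        return None
    if not isinstance(record, dict):
        return None
    expires_at = record.get("expires_at")
    if not isinstance(expires_at, (int, float)) or now >= expires_at:
        return None  # expired: the caller must negotiate a fresh token
    return record.get("value")
```

When `cache_get` returns `None`, the caller requests a new token with the key and stores it again with `cache_set`, so an expired token is never sent to the speech service.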

Good luck!
