Using Django & AssemblyAI for More Accurate Twilio Call Transcriptions

Photo by Andy Vult on Unsplash

Building a Django App to Enhance Twilio Call Transcriptions with AssemblyAI's Advanced Speech-to-Text Capabilities

Introduction

Recording phone calls is a breeze with Twilio's Programmable Voice API, but the speech-to-text accuracy can sometimes fall short, especially in niche domains like healthcare and engineering. AssemblyAI's transcription service, however, provides high accuracy by default and offers optional keyword lists for even better results. In this tutorial, we'll walk you through creating a Django application that records Twilio calls and transcribes them using AssemblyAI.

Grab your favorite beverage, and let's dive in!

Prerequisites

Before we start, make sure you have the following:

  • Python 3.7 or greater

  • Twilio account (sign up here)

  • AssemblyAI account (sign up here)

Setting Up the Project

Step 1: Create a Virtual Environment

First, let's create a virtual environment to keep our dependencies isolated:

virtualenv venv
venv/Scripts/activate

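These commands create a new virtual environment named venv and activate it. The activation path shown is the Windows one; on Linux or macOS, run source venv/bin/activate instead.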

Step 2: Install Dependencies

Next, we'll install Django, requests, and Twilio:

pip install django requests twilio

Step 3: Start a Django Project

Let's create a new Django project named djtranscribe and, inside it, an app named caller:

django-admin startproject djtranscribe
cd djtranscribe
python manage.py startapp caller

Now we have a basic Django project with an app named caller.

Project Configuration

Step 4: Update urls.py

Open djtranscribe/urls.py and update it to include the caller app URLs:

from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('', include('caller.urls')),
    path('admin/', admin.site.urls),
]

Step 5: Update settings.py

Open djtranscribe/settings.py and add the caller app to the INSTALLED_APPS list. Also, read the project settings from environment variables at the bottom of the file:

import os

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'caller',
]

BASE_URL = os.getenv("BASE_URL")
TWIML_INSTRUCTIONS_URL = f"{BASE_URL}/record/"
TWILIO_PHONE_NUMBER = os.getenv("TWILIO_PHONE_NUMBER")
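
One thing to watch: if BASE_URL is unset, the f-string above quietly produces "None/record/", and the problem only surfaces when Twilio tries to fetch the instructions. A minimal sketch of a stricter alternative follows; the helper name build_twiml_url is hypothetical and not part of the original project:

```python
def build_twiml_url(base_url):
    """Return the URL Twilio should fetch for TwiML instructions.

    Hypothetical helper: fails early when BASE_URL is missing and
    tolerates a trailing slash on the Ngrok URL.
    """
    if not base_url:
        raise ValueError("BASE_URL is not set; export it before starting Django")
    return f"{base_url.rstrip('/')}/record/"
```

In settings.py this could be used as TWIML_INSTRUCTIONS_URL = build_twiml_url(os.getenv("BASE_URL")), at the cost of Django refusing to start until the variable is set.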

Step 6: Create caller/urls.py

In the caller directory, create a file named urls.py and add the following routes:

# django.conf.urls.url() was removed in Django 4.0; re_path() is its replacement
from django.urls import re_path
from . import views

urlpatterns = [
    re_path(r'^dial/(\d+)/$', views.dial, name="dial"),
    re_path(r'^record/$', views.record_twiml, name="record-twiml"),
    re_path(r'^get-recording-url/([A-Za-z0-9]+)/$', views.get_recording_url, name='recording-url'),
]
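
To see what the two capturing routes accept, their regular expressions can be exercised directly with Python's re module. This is only an illustrative sketch: the patterns are shown anchored with ^, the sample paths are made up, and the helper name captured is hypothetical. Django matches against the request path without its leading slash.

```python
import re

# The two capturing patterns from caller/urls.py
DIAL = re.compile(r'^dial/(\d+)/$')
RECORDING = re.compile(r'^get-recording-url/([A-Za-z0-9]+)/$')

def captured(pattern, path):
    """Return the captured group that Django would pass to the view, or None."""
    match = pattern.match(path)
    return match.group(1) if match else None
```

Note that the dial pattern only accepts digits followed by a trailing slash, so a path such as dial/+14155550123/ would not match.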

Step 7: Update views.py

Now, let's add the view functions to handle dialing, recording, and retrieving the recording URL. Open caller/views.py and replace its content with:

from django.conf import settings
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

from twilio.rest import Client
from twilio.twiml.voice_response import VoiceResponse

def dial(request, phone_number):
    twilio_client = Client()
    call = twilio_client.calls.create(
        to=f'+{phone_number}',
        from_=settings.TWILIO_PHONE_NUMBER,
        url=settings.TWIML_INSTRUCTIONS_URL,
    )
    return HttpResponse(f"dialing +{phone_number}. call SID is: {call.sid}")

@csrf_exempt
def record_twiml(request):
    response = VoiceResponse()
    response.say('Ahoy! Call recording starts now.')
    response.record()
    response.hangup()
    return HttpResponse(str(response), content_type='application/xml')

def get_recording_url(request, call_sid):
    twilio_client = Client()
    recording_urls = ""
    call = twilio_client.calls.get(call_sid)
    for r in call.recordings.list():
        recording_urls = "\n".join([recording_urls, f"https://api.twilio.com{r.uri}"])
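    # The second positional argument of HttpResponse is content_type, so pass status by keyword
    return HttpResponse(recording_urls, content_type="text/plain", status=200)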

Setting Environment Variables

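To keep sensitive information out of the source code, we'll use environment variables. Set the following variables in your terminal. Three of them are not known yet: BASE_URL comes from Ngrok in Step 8, RECORDING_URL from the recording in Step 12, and TRANSCRIPTION_ID from the response in Step 13, so set those once you have the values.

On Linux or macOS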

export TWILIO_ACCOUNT_SID=your_twilio_account_sid
export TWILIO_AUTH_TOKEN=your_twilio_auth_token
export TWILIO_PHONE_NUMBER=your_twilio_phone_number
export BASE_URL=your_ngrok_url
export ASSEMBLYAI_KEY=your_assemblyai_key
export RECORDING_URL=your_recording_url
export TRANSCRIPTION_ID=your_transcription_id

On Windows (Command Prompt)

set TWILIO_ACCOUNT_SID=your_twilio_account_sid
set TWILIO_AUTH_TOKEN=your_twilio_auth_token
set TWILIO_PHONE_NUMBER=your_twilio_phone_number
set BASE_URL=your_ngrok_url
set ASSEMBLYAI_KEY=your_assemblyai_key
set RECORDING_URL=your_recording_url
set TRANSCRIPTION_ID=your_transcription_id
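
Because a missing variable tends to show up later as a confusing error, it can help to check for them up front. Here is a minimal sketch, assuming the variable names used in this tutorial; the function name missing_variables is hypothetical:

```python
import os

# Variables needed before the Django server can place a call.
# RECORDING_URL and TRANSCRIPTION_ID are left out because they are set later.
REQUIRED = [
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "BASE_URL",
    "ASSEMBLYAI_KEY",
]

def missing_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]
```

Calling print(missing_variables()) in the same terminal lists anything that still needs to be set.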

Step 8: Start Ngrok

Ngrok creates a secure tunnel to your localhost, making it accessible over the web. Run Ngrok in a separate terminal window:

./ngrok http 8000

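Copy the HTTPS URL provided by Ngrok and set it as your BASE_URL environment variable, without a trailing slash, since settings.py appends /record/ to it. Set it in the terminal where you will run Django, before starting the server.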

Step 9: Update ALLOWED_HOSTS

In djtranscribe/settings.py, update ALLOWED_HOSTS to include your Ngrok hostname (the URL without the https:// prefix):

ALLOWED_HOSTS = ['your_ngrok_subdomain.ngrok.io', '127.0.0.1', 'localhost']
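
Since the Ngrok hostname may change between sessions, one option is to derive it from BASE_URL instead of editing the list by hand. A minimal sketch using only the standard library; the function name allowed_hosts_for is hypothetical:

```python
from urllib.parse import urlparse

def allowed_hosts_for(base_url):
    """Build an ALLOWED_HOSTS list from the Ngrok URL plus the local defaults."""
    hosts = ['127.0.0.1', 'localhost']
    # urlparse(...).hostname drops the scheme, port, and path
    hostname = urlparse(base_url or "").hostname
    if hostname:
        hosts.insert(0, hostname)
    return hosts
```

In settings.py this could read ALLOWED_HOSTS = allowed_hosts_for(os.getenv("BASE_URL")).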

Running the Project

Step 10: Run Django Development Server

Ensure Ngrok is running and your virtual environment is active. Then, start the Django server:

python manage.py runserver

Step 11: Dial a Number

Open your browser and navigate to:

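http://localhost:8000/dial/<phone_number>/

Replace <phone_number> with the number you want to call, written as digits only with the country code and no leading +, because the view adds the + itself. Twilio will call this number, play a message, and start recording. The page shows the call SID, which you'll need in the next step.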
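
Phone numbers are often written with spaces, dashes, or a leading +, none of which the dial route accepts. A small sketch of turning such a number into the path to visit; the helper name dial_path and the sample number are hypothetical:

```python
import re

def dial_path(phone_number):
    """Strip everything except digits and return the path for the dial view."""
    digits = re.sub(r'\D', '', phone_number)
    if not digits:
        raise ValueError("phone number contains no digits")
    return f"/dial/{digits}/"
```

For example, dial_path("+1 (415) 555-0123") gives the path to append to http://localhost:8000.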

Step 12: Get Recording URL

After the call, navigate to:

http://localhost:8000/get-recording-url/<call_sid>/

Replace <call_sid> with the SID from the call you just made. The page displays the URL of each recording made during that call.
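
The view builds each URL from the recording's uri attribute, which in Twilio's REST API typically ends in .json and points at the recording's metadata. The sketch below assumes that dropping the .json extension yields the audio itself, which is the value to use for RECORDING_URL; treat that as an assumption to verify against your own output. The helper name audio_urls is hypothetical:

```python
def audio_urls(view_output):
    """Turn the text returned by get_recording_url into a list of audio URLs."""
    urls = []
    for line in view_output.splitlines():
        line = line.strip()
        if not line:
            continue  # the view's output starts with an empty line
        if line.endswith(".json"):
            # Assumption: the same URL without .json serves the audio file
            line = line[:-len(".json")]
        urls.append(line)
    return urls
```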

Transcribing Recordings

Step 13: Create transcribe.py

Create a file named transcribe.py to send the recording to AssemblyAI for transcription:

import os
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"

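# Request body: the URL of the audio to transcribe
payload = {
    "audio_url": os.getenv("RECORDING_URL")
}

headers = {
    "authorization": os.getenv("ASSEMBLYAI_KEY"),
    "content-type": "application/json"
}

response = requests.post(endpoint, json=payload, headers=headers)
# The "id" field of the printed response is the value for TRANSCRIPTION_ID
print(response.json())

Run the script to start the transcription process, then copy the id from the printed response into your TRANSCRIPTION_ID environment variable: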

python transcribe.py
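
The introduction mentioned optional keyword lists for domain-specific terms. Here is a hedged sketch of how the request body could carry one, assuming AssemblyAI's word_boost parameter (a list of terms) on the v2 transcript endpoint; check the current API reference before relying on it. The helper name build_payload and the sample terms are hypothetical:

```python
def build_payload(audio_url, keywords=None):
    """Build the JSON body for the transcript request.

    keywords is an optional iterable of domain terms; word_boost is
    only included when at least one term is given.
    """
    payload = {"audio_url": audio_url}
    if keywords:
        payload["word_boost"] = list(keywords)
    return payload
```

In transcribe.py, the body could then be built with build_payload(os.getenv("RECORDING_URL"), ["stent", "angioplasty"]) and passed to requests.post through the json argument.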

Step 14: Create print_transcription.py

Create another file named print_transcription.py to retrieve and display the transcription:

import os
import requests

endpoint = f"https://api.assemblyai.com/v2/transcript/{os.getenv('TRANSCRIPTION_ID')}"

headers = {
    "authorization": os.getenv("ASSEMBLYAI_KEY"),
}

response = requests.get(endpoint, headers=headers)
print(response.json())
print("\n\n")
print(response.json()['text'])

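Run the script to get the transcription. If the printed status is still queued or processing, the text will be empty, so wait a moment and run it again: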

python print_transcription.py
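
Transcription is asynchronous, so instead of re-running the script by hand you could poll until the job finishes. A minimal sketch, assuming the status values queued, processing, completed, and error returned by the transcript endpoint; the function name wait_for_transcript and its parameters are hypothetical. The fetch argument is any callable returning the decoded JSON response, which keeps the loop independent of the HTTP library:

```python
import time

def wait_for_transcript(fetch, interval=3.0, max_attempts=20, sleep=time.sleep):
    """Call fetch() until the transcript is completed, then return its text."""
    for _ in range(max_attempts):
        result = fetch()
        status = result.get("status")
        if status == "completed":
            return result["text"]
        if status == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        sleep(interval)  # still queued or processing; wait before retrying
    raise TimeoutError("transcript was not ready in time")
```

With the endpoint and headers from print_transcription.py, this could be called as wait_for_transcript(lambda: requests.get(endpoint, headers=headers).json()).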

Conclusion

Congratulations! You've built a Django application that can dial phone numbers, record calls using Twilio, and transcribe the recordings with AssemblyAI. You can now extend this project to fit your needs, whether it's for personal use, business, or an exciting new application idea. Happy coding! 🎉