How to transcribe audio files for free

If you want to transcribe audio files for free, I’ll share the secret below. Repurposing content into different formats can increase the number of people who consume your information. While video is becoming more and more popular online, there are still people who prefer reading to watching video.

The information below is lifted from my book Shoestring Hustle and will walk you through all the steps you need to take to convert an audio file to a text transcription. If you need to convert a video to audio before you start, you can download a copy of Audacity for free. To open a video in Audacity, you’ll also need to install FFmpeg – it’s free too.

Assuming you’ve got an audio file to transcribe, follow the steps below.

Incidentally, this is quite an involved process, though once the initial setup is complete, each transcription is more straightforward. There’s a form at the end of the article. If you’d be interested in a tool that automates this process for you, enter your email in the form and send it to me. If there’s enough interest, I’ll see what I can do.


There’s a surprising number of audio transcription services available that let you upload a file that is then automatically transcribed.

The cost currently seems to vary from about 10 cents to a dollar a minute, though you can pay more for human intervention in the transcription process.

The price could soon mount up using any of those services, so let’s show you how you can get 60 minutes of audio transcription a month for free and then pay just $0.004 (0.4 cents) for every 15 seconds or, to put it another way, 96 cents for an hour of transcription.
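To make the arithmetic concrete, here is a rough sketch in Python. It assumes the rate is $0.004 per 15-second block (the figure that gives 96 cents an hour), that the first 60 minutes each month are free, and that usage is rounded up to whole 15-second blocks. The function name and the rounding rule are my own simplifications, so check Google’s current pricing before relying on the numbers.

```python
from decimal import Decimal

# Assumed figures, taken from the prices quoted in this article.
RATE_PER_BLOCK = Decimal("0.004")   # dollars per 15-second block
BLOCK_SECONDS = 15
FREE_SECONDS = 60 * 60              # 60 free minutes a month


def monthly_cost(total_seconds: int) -> Decimal:
    """Estimate one month's bill in dollars for the given seconds of audio."""
    billable = max(0, total_seconds - FREE_SECONDS)
    # Round up to a whole number of 15-second blocks (ceiling division).
    blocks = -(-billable // BLOCK_SECONDS)
    return RATE_PER_BLOCK * blocks


# Two hours of audio in one month: the first hour is free, the second is paid.
print(monthly_cost(2 * 60 * 60))
```

Two hours in a month works out as one free hour plus 240 paid blocks, which is where the 96 cents comes from.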

The quality of the transcriptions will vary depending on the quality of the original audio file. You should expect to need to do some manual editing of the transcribed text.

We’re going to use Google Cloud’s Speech-to-Text API for this. The prices and the free tier I’ve quoted are correct at the time of writing. You will need a credit or debit card to make use of this service.

Part of the reason for the low cost may be that, unlike most of the competitors, this doesn’t come with an attractive user interface. To make this work, we’re going to make you look like a hacker in a Hollywood movie.

Well, kind of.

First up, if you don’t already have a Google account, you’ll need to register for free at https://myaccount.google.com/intro.

Configure your Google Cloud settings

Next go to https://cloud.google.com/ and click on one of the Get started buttons.

Then complete the sign up process. If you already have a credit card registered with Google, that will be used. As long as nothing has changed since I wrote this, you’re signing up for a free trial and your card won’t be charged until the trial ends, and only then if you turn on automatic billing. At the time of writing, that free trial gives you $300 of credit that lasts for 12 months, so you could get a lot of transcriptions in that year without paying anything.

After you complete the sign up, you should find yourself in the Cloud dashboard. A new project named My First Project should also have been created for you and as that is already active, we’ll carry on using that.

Scroll down the page to the Other popular services section and in the APIs and ML column, click on Cloud Speech-to-Text. If you can’t see that link, click the View All button and you should find the link for Cloud Speech-to-Text that way.

On the next page, click the Enable button.

Next, click the Credentials entry in the left hand navigation column. On the next screen, click Create Credentials and then Service Account.

Complete the short form (the Service Account ID is automatically generated from the name you enter) and click the Create button.

Click on the Select a role field and in the menu that opens, mouseover Project and click on Owner. Then click the Continue button.

On the next screen, click the Create Key button and when the panel slides open, ensure Key type is set to JSON and click the Create button. A download modal window will open and you should save the JSON file to your computer.

Make sure you remember where you save it. If you ever lose this file, you will have to create a service account following the same steps we just ran through.

Finally, click the Done button and that’s us finished with setting things up on Google Cloud.

Install the Cloud SDK

For the next step, you need to know the path to your JSON key file. You’ll then use this to install the Cloud SDK. How you do this varies depending on your operating system. I’ll explain how to set up Windows first and then MacOS after.

Windows

In Windows Explorer, make sure you’re in the folder that you saved the key file into. Click the small down arrow at the right of the address bar, then highlight the text in the address bar, right-click and Copy the text.

Paste that into a new text file. Now go back to Windows Explorer, right-click the key file and select Rename. That should make the file name selectable. Make sure all the text is selected, including the .json extension, then right-click and click Copy.

In your text file, add a backslash (\) immediately after the text you pasted before and then paste the file name you just copied. You should now have the full path to your file. Here’s mine: C:\Users\e_4_i\Documents\Shoestring-Hustle\transcription-key\nodal-thunder-264510-cfaef22cfedf.json

In your text file on a new line, enter the following text, replacing [PATH] with the path you saved in your text file.

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]

My command looks like the following: set GOOGLE_APPLICATION_CREDENTIALS=C:\Users\e_4_i\Documents\Shoestring-Hustle\transcription-key\nodal-thunder-264510-cfaef22cfedf.json

Delete the first line that was just the path so your text file just contains the command for setting the credentials and save this file. You will need to run this command every time you open Command Prompt before you transcribe any audio.
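If you find it easier to see the path-joining spelled out, the steps above amount to the following Python sketch. It builds the same command for either operating system from a folder and a file name. The function name and the example folder are hypothetical, and you don’t need Python to follow this article; it’s only here to show the shape of the command.

```python
from pathlib import PurePosixPath, PureWindowsPath

ENV_NAME = "GOOGLE_APPLICATION_CREDENTIALS"


def credentials_command(folder: str, file_name: str, windows: bool) -> str:
    """Return the command that points Google's tools at the key file."""
    if windows:
        # Windows joins the folder and file name with a backslash.
        full_path = PureWindowsPath(folder) / file_name
        return f"set {ENV_NAME}={full_path}"
    # MacOS and Linux join them with a forward slash and quote the path.
    full_path = PurePosixPath(folder) / file_name
    return f'export {ENV_NAME}="{full_path}"'


# Hypothetical folder and key file name, for illustration only.
print(credentials_command(r"C:\Users\me\keys", "my-key.json", windows=True))
print(credentials_command("/Users/me/keys", "my-key.json", windows=False))
```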

In the search field in the bottom bar (on older versions of Windows, click the Start menu to show the search field), type cmd and in the results, click Command Prompt to open the app.

When it opens, copy your command from the text file, paste it into Command Prompt by right-clicking on the app and press the return key on your keyboard.

Next go to https://cloud.google.com/sdk/docs/#windows and click the link to download the Cloud SDK Installer.

When the file has downloaded, run the installer and complete all the install steps. It may take a few minutes for the installer to set everything up.

When complete, ensure the Start Google Cloud SDK Shell and Run ‘gcloud init’ to configure the Cloud SDK are both checked. I also checked the Create Start Menu shortcut. Then click the Finish button.

This will open a new Command Prompt window. You’ll see several lines of text written to the window and then you’ll see a line asking if you want to log in. Type y and press the return key.

Your web browser will then open and you will need to select your Google account and allow the Google Cloud SDK to access your account. When you’re done, return to Command Prompt.

You should see you’re now logged in and your projects are listed. Your list should only contain one project, the one you created earlier. If you have created projects previously, you can find the project name by looking at the name of the key file you downloaded.

Type the number of the project and press the return key. My project was nodal-thunder-264510, so I typed 8.

You should now see several more lines of messages displayed, similar to the image below. You’ve now configured the tools for creating transcriptions.

MacOS

These steps are largely the same for Linux too.

In Finder, navigate to the key file you downloaded. Right-click the JSON key file and click Get Info. In the General section of the info window that opens, triple click the text next to the Where label to select all the text and then right-click and select Copy. Note that you must copy all of the text or the following steps won’t work.

Paste the path you just copied into a new text file so you still have access to it in case the pasteboard is overwritten. Now append a forward slash (/) and the name of the file.

Mine looks like: /Users/ianpullen/Documents/nodal-thunder-264510-cfaef22cfedf.json

On a new line, enter export GOOGLE_APPLICATION_CREDENTIALS="[PATH]", but substitute your path for [PATH].

So mine looks like: export GOOGLE_APPLICATION_CREDENTIALS="/Users/ianpullen/Documents/nodal-thunder-264510-cfaef22cfedf.json"

Delete the first line that was just the path so your text file just contains the command for setting the credentials and save this file. You will need to run this command every time you open Terminal before you transcribe any audio.
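A mistyped path is the most likely thing to go wrong here, on either operating system. If you happen to have Python installed, a minimal sketch like the one below can confirm the variable points at a real service account key. The function name is my own, and the check assumes the key file contains the "type", "client_email" and "private_key" fields that Google’s service account keys normally carry.

```python
import json
import os
from pathlib import Path


def key_file_problems(path_text: str) -> list[str]:
    """Return a list of problems with the key file; empty means it looks fine."""
    if not path_text:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set in this window"]
    path = Path(path_text)
    if not path.is_file():
        return [f"No file found at {path}"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as error:
        return [f"Could not read the file as JSON: {error}"]
    if not isinstance(data, dict):
        return ["The file is JSON but not a key file"]
    problems = []
    if data.get("type") != "service_account":
        problems.append('The "type" field is not "service_account"')
    for field in ("client_email", "private_key"):
        if field not in data:
            problems.append(f'The "{field}" field is missing')
    return problems


# Run this from the same window in which you set the variable.
print(key_file_problems(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")))
```

An empty list means nothing obvious is wrong. It doesn’t prove the key is still valid on Google’s side.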

Now go to the Applications directory, open the Utilities directory and launch Terminal. Copy the command you just created in the text file, return to Terminal and press the Cmd and V keys simultaneously to paste the command, then press the return key.

Now go to https://cloud.google.com/sdk/docs/#mac and download the SDK package. For convenience, download it to the same location as your key file.

You will almost certainly want to download the 64-bit version. That is definitely the case if your Mac has OS X 10.7 or later installed or any version of MacOS. If your Mac is running an older OS X, go to the Apple menu and select About this Mac. If you have an Intel Core Solo or Intel Core Duo processor, your Mac is 32-bit. Any other processor, including Intel Core 2 Duo, means your Mac is 64-bit.

After downloading the file, double-click it to expand it. You should now have a directory called google-cloud-sdk. Right-click the directory and click Get Info. In the General section of the info window that opens, triple click the text next to the Where label to select all the text and then right-click and select Copy.

Paste that text into a text file on a new line and append /bin/gcloud init to create the command you will need to run next.

Mine looks like: /Users/ianpullen/Documents/google-cloud-sdk/bin/gcloud init

Copy the command you just created, go to Terminal and press the Cmd and V keys simultaneously to paste the command, then press the return key.

After a few moments, you’ll see the message You must log in to continue. Type y and press the return key.

You will now have to sign into your Google account in your web browser and allow the Google Cloud SDK to access your account. When you are signed in, come back to Terminal and select your project. You may only have one to choose from. In my case I selected project eight by entering 8 and pressing the return key.

After a few moments more, you should see that the setup process is complete.

To make it easier to use on an ongoing basis, there are a few more steps. Inside the google-cloud-sdk folder is a file called install.sh. As before, right-click and Get Info and copy the path to the file. Now make a copy of the line below, replacing [PATH] with the path you copied.

[PATH]/install.sh

Mine looks like: /Users/ianpullen/Documents/google-cloud-sdk/install.sh

Paste that into Terminal and press the return key. After a few moments, you should see a message asking if you want to help improve the Google Cloud SDK. Type y or n to allow or disallow and press return. Typing y to agree will allow Google to receive usage information from the app.

After a few more moments, you should see a message asking if you want to update your $PATH and enable shell command completion. Type y and press return.

The next prompt will ask you to enter a path to an .rc file. Just press return to let it use the default.

To start using Google Cloud, close this Terminal window and open a new one by pressing the Command and N keys simultaneously.

Upload audio to Google Cloud

We’ll upload the audio file to Google’s Cloud storage first and then send it for transcription from there. You should be able to use a file stored on your computer, but I repeatedly had problems trying to transcribe an audio file stored on Windows.

At the time of writing there is 5GB of free storage provided to all users, plus data transfer, so unless you’re transcribing a lot of audio, there should be no costs attached to using this storage. Even if you have to pay for the storage, the rates are very low and you can delete the audio file as soon as the transcription is complete, so it’s only stored for a short time.

Before you upload, your audio file should be in FLAC format. You will trigger an error if you try to transcribe an MP3. If you need to convert your file, open it in Audacity and then click on the File menu, mouseover Export and click Export Audio. In the save dialog that opens, set the Save as type control to FLAC Files and click Save.
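If you’re not sure whether a file really is FLAC (renaming an MP3 to .flac doesn’t convert it), a quick check is possible. The sketch below relies on the fact that a FLAC stream begins with the four bytes fLaC. The function name is hypothetical, and a file that has had extra tags placed in front of the audio could fail this check even though it is valid, so treat it as a hint rather than a verdict.

```python
from pathlib import Path

FLAC_MARKER = b"fLaC"  # the four bytes at the start of a FLAC stream


def looks_like_flac(path: str) -> bool:
    """Return True if the file starts with the FLAC marker."""
    with Path(path).open("rb") as audio:
        return audio.read(4) == FLAC_MARKER
```

You would call it with the path to your exported file, for example looks_like_flac("interview.flac"), where interview.flac stands in for your own file name.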

Go to https://console.cloud.google.com/storage/browser. It should automatically select your project, but confirm that in the top bar. Click the Create Bucket link.

Give a unique name to your bucket using just lowercase letters, numbers, underscores, dashes and dots. The name must begin and end with a letter or number. I suggest using your business name in this to reduce the chance of you selecting something already in use by someone else. Click the Continue button when your name is entered.
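The naming rules are easier to see as a pattern. This Python sketch checks the rules described above, plus two details I’m adding from Google’s naming guidelines as I understand them: letters must be lowercase and an ordinary name is between 3 and 63 characters long. It deliberately ignores the finer points (such as longer names containing dots, or reserved words), and it can’t tell you whether someone else has already taken the name.

```python
import re

# First and last characters: a lowercase letter or digit.
# Middle characters: lowercase letters, digits, dots, underscores, dashes.
# Overall length: 3 to 63 characters (the common case).
BUCKET_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if the name passes the basic bucket naming rules."""
    return BUCKET_PATTERN.fullmatch(name) is not None


print(is_plausible_bucket_name("shoestring-hustle-audio"))
```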

Set the Location type to Region and set the Location dropdown to us-east1, us-central1 or us-west1. These settings should give the lowest costs. Click the Continue button.

Set the Storage Class to Standard and click the Continue button.

Set Access control to Fine-grained and click the Continue button.

Leave the Advanced settings unchanged and click the Create button.

On the new screen, click the Upload files button and upload the audio file you want to transcribe. You can also drop the file onto this screen to start uploading it.

When the file is uploaded, click the file name to open the Object details screen.

Click the little copy icon at the right hand side of the URI field to copy the location of the file to your pasteboard. You’ll need this in the next step.

Create an audio transcription

We’re now ready to use the transcription service.

You’re going to need to construct one of two text commands that you will run in either Command Prompt or Terminal. The command points Google Cloud at the audio file you uploaded and instructs it to transcribe it.

The length of the audio will influence the command, with short audio files of a minute or less needing a different command to files longer than that.

The command you write will be different on Windows and Mac, so we’ll look at both operating systems separately again.

Select the code snippet that matches your operating system and audio file length.

In each case, replace [PATH] with the path to your key file location. That command is the same command that you used before you installed the Cloud SDK. If you’ve already entered this line in your current session, it isn’t necessary, but it does no harm running this again.

[YOUR_AUDIO_FILE_URI] should be replaced with the URI you copied from Google Cloud.

You should also change the languageCode to match your accent. For example, in my command, I change en-US to en-GB. You can see all the available codes at https://cloud.google.com/speech-to-text/docs/languages.

Paste your edited code snippet into Command Prompt or Terminal. In Command Prompt, you’ll need to wait for a few moments while the first line of code runs. When all the code is visible, press the return key to set it running. In Terminal, paste the code and you can press the return key straight away.
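Before you edit the snippets, it may help to see what the request is made of. The Python sketch below builds the same address and the same settings as the commands in this section, choosing between the two addresses on the one-minute rule described above. The function name is hypothetical and the code only builds text; it doesn’t send anything to Google. Note that it produces JSON with double quotes, whereas the snippets use single quotes so they survive being pasted into a command line, so don’t paste its output into them as-is.

```python
import json

BASE_URL = "https://speech.googleapis.com/v1/speech"


def build_request(audio_uri: str, language_code: str, seconds: float) -> tuple[str, str]:
    """Return the address to call and the JSON body to send."""
    if not audio_uri.startswith("gs://"):
        raise ValueError("Expected a Cloud Storage URI starting with gs://")
    # Audio of a minute or less uses the short command, anything longer the long one.
    endpoint = "recognize" if seconds <= 60 else "longrunningrecognize"
    body = {
        "config": {
            "encoding": "FLAC",
            "languageCode": language_code,
            "enableAutomaticPunctuation": True,
        },
        "audio": {"uri": audio_uri},
    }
    return f"{BASE_URL}:{endpoint}", json.dumps(body)


# Hypothetical bucket and file name, for illustration only.
url, body = build_request("gs://my-bucket/interview.flac", "en-GB", 45)
print(url)
print(body)
```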

Short audio files

Windows

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
gcloud auth print-access-token > temp.txt
set /p shtoken=<temp.txt
curl -s -H "Content-Type: application/json" ^
    -H "Authorization: Bearer "%shtoken% ^
    https://speech.googleapis.com/v1/speech:recognize ^
--data "{ 'config': { 'encoding':'FLAC', 'languageCode': 'en-US', 'enableAutomaticPunctuation': true }, 'audio': { 'uri':'[YOUR_AUDIO_FILE_URI]' } }"

MacOS

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:recognize \
--data "{ 'config': { 'encoding':'FLAC', 'languageCode': 'en-US', 'enableAutomaticPunctuation': true }, 'audio': { 'uri':'[YOUR_AUDIO_FILE_URI]' } }"

The snippet for short audio files works in real time. By that I mean, the code sends the request and then waits for the response to be sent back. That response is then displayed in Command Prompt or Terminal and the code ends.

You can copy the transcribed text from the transcript value of the results.

Long audio files

Windows

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
gcloud auth print-access-token > temp.txt
set /p shtoken=<temp.txt
curl -s -H "Content-Type: application/json" ^
 -H "Authorization: Bearer "%shtoken% ^
 https://speech.googleapis.com/v1/speech:longrunningrecognize ^
--data "{ 'config': { 'encoding':'FLAC', 'languageCode': 'en-US', 'enableAutomaticPunctuation': true }, 'audio': { 'uri':'[YOUR_AUDIO_FILE_URI]' } }"

MacOS

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
--data "{ 'config': { 'encoding':'FLAC', 'languageCode': 'en-US', 'enableAutomaticPunctuation': true }, 'audio': { 'uri':'[YOUR_AUDIO_FILE_URI]' } }"

The code snippet for long audio files works differently. Rather than the results being returned automatically, a name for the transcription job is returned instead.

You can then use this name to manually fetch the transcribed text. Depending on the length of your audio file, you may need to wait for some time for it to finish.

The following are the commands you need to use to fetch your transcription. In each case, replace [YOUR_JOB_NAME] with the name value returned, not including the quotes. So the name value from the screenshot is 5523136167127052080.

Note that if you keep the same Command Prompt or Terminal window open, the first three lines of the Windows code and the first line of the MacOS code aren’t necessary, but it does no harm to include them.

Windows

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
gcloud auth print-access-token > temp.txt
set /p shtoken=<temp.txt
curl -H "Authorization: Bearer "%shtoken% ^
     -H "Content-Type: application/json; charset=utf-8" ^
     "https://speech.googleapis.com/v1/operations/[YOUR_JOB_NAME]"

MacOS

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     "https://speech.googleapis.com/v1/operations/[YOUR_JOB_NAME]"

If the transcription is complete, all the transcribed text will be displayed. This may take a lot of space if you uploaded a long file. Note that the text is broken down into alternatives. You can start with the first one and copy your transcribed text from each and paste it into a text file to create your full transcript. However, next I’ll show you how to use a free tool I’ve created on Shoestring Hustle that will automate this process for you.

In the event the transcription isn’t complete, look in the metadata values for progressPercent. This will give you an approximate value for how much of the job has been processed. For example if your job was running for 10 minutes and the progressPercent value was 51, you’d want to wait for another 10 minutes or so before running your command again. By that time, assuming it maintains the same speed, the job should be complete.
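That estimate is simple proportion: if 51 percent took 10 minutes, the remaining 49 percent should take roughly 10 × 49 / 51 minutes. As a sketch in Python, assuming the job keeps the same pace and that the reply has the done and metadata.progressPercent values described here (the function name is my own):

```python
def minutes_remaining(operation: dict, minutes_elapsed: float):
    """Return a rough number of minutes left, 0.0 if done, None if unknown."""
    if operation.get("done"):
        return 0.0
    percent = operation.get("metadata", {}).get("progressPercent", 0)
    if percent <= 0:
        # No progress reported yet, so there is nothing to extrapolate from.
        return None
    return minutes_elapsed * (100 - percent) / percent


# The example from the text: 51 percent done after 10 minutes.
print(round(minutes_remaining({"metadata": {"progressPercent": 51}}, 10), 1))
```

For the example in the text, that comes out at a little under 10 minutes, which is why waiting another 10 minutes or so is a sensible rule of thumb.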

Turn the results into usable text

If your audio file was anything but a very short file, your transcription will have been returned broken down into alternatives. For a long file, copying and pasting each one individually into a new text file will quickly become a pain.

To save you from that hassle, I’ve made a tool you can use that will turn the results into a text file.

First, you need to copy all of the text that was returned. That starts with the first opening curly bracket ({) all the way down to the last closing curly bracket (}). The easiest way to do this is to click the first bracket and it will be highlighted. Then scroll down to the last bracket, hold the Shift key down and click just after the last bracket.

You should see all of the text is highlighted. On Windows, press the Ctrl and C keys simultaneously to copy the text. On Mac, press the Cmd and C keys to copy.

Now go to https://shoestringhustle.com/tbp/google-transcription-tool/ and paste the text you copied into the text box. On Windows, press the Ctrl and V keys to paste, and on Mac, press the Cmd and V keys.

Click the Get text file button and after a few moments, a download dialog will open and you can save your text file to your computer.
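If you’d rather not paste your results into a web page, the same job can be done on your own computer. The Python sketch below is not the code behind my tool; it’s a minimal illustration that assumes the text you copied is the complete JSON reply, with a list of results that each hold a list of alternatives containing a transcript. It takes the first alternative from each result and joins them, one per line.

```python
import json


def transcript_from_json(raw_text: str) -> str:
    """Join the first transcript of every result into one block of text."""
    data = json.loads(raw_text)
    # Long jobs wrap the results in "response"; short jobs return them directly.
    results = data.get("response", data).get("results", [])
    pieces = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            pieces.append(alternatives[0].get("transcript", "").strip())
    return "\n".join(piece for piece in pieces if piece)
```

To use it, you would save the copied text to a file, read that file into a string and pass the string to transcript_from_json, then write the returned text to a new file.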

Enable data logging

The Speech-to-Text API comes with two levels of pricing. Of course, you needn’t worry about this if you’re using it for less than an hour of transcription a month or have credit towards cloud services.

It probably still makes sense to set your account to use the lowest pricing now rather than risk forgetting later. Note that to get the lower pricing, you need to allow Google to log your data use to help them improve their service. I have no problem with that, but be aware that some Google employees will have access to your audio files after uploading. If that’s an issue for you, do not complete this step. The pricing that applies will still be very competitive.

Go to https://console.cloud.google.com/apis/api/speech.googleapis.com and click Data Logging in the side column.

On the new screen, click the Enable Data Logging button to turn the feature on. Note that this is set on a project by project basis, so if you have other projects, this setting change won’t affect them.


I know that’s quite a long post and there’s a bit to do, but it does offer significant savings over some other services. For example, Rev costs $75 to transcribe an hour of audio and this method will cost you nothing for the same amount of audio. Of course, there may be other considerations, such as accuracy, but if you’re on a very limited budget, this should look very attractive.

If you’d be interested in a tool that automates this process for you, enter your email below and send the form. If there’s some interest, I’ll take a look into it and let you know if I create something.
