ACF Library

ProduceSoundFromText

Description: Use the OpenAI Text-to-Speech API to produce a sound file from text

FileMaker Prototype:

Set Variable [$res; ACF_Run("ProduceSoundFromText")]

Category: OPENAI

NOTE: This function serves as a reference template. To use it effectively, you may need to adjust certain parts of the code, such as field names, database structures, and specific data lists to fit the requirements of your system. This customization allows the function to work seamlessly within your environment.

The ProduceSoundFromText Function

The ProduceSoundFromText function uses OpenAI's Text-to-Speech API to generate a sound file from a given manuscript. It retrieves the manuscript and other settings from a FileMaker table called TranscriptParts. The output directory is configured in the related table TranscriptProject.

Example Usage

Set Variable [$res; ACF_Run("ProduceSoundFromText")]

Each record in the TranscriptParts table includes a field called Sorting, which represents the part number in a series. The voice used for the audio output, such as "echo", is read from the SpeachVoice field on the corresponding TranscriptParts record.

The generated audio file is saved in the output directory and named using the pattern Sound_part_<Sorting>_<Voice>.aac. For example:

Sound_part_23_echo.aac

Function Source:

function ProduceSoundFromText ( )

   string Text2Convert = TranscriptParts::PartResultText;
   string voice = TranscriptParts::SpeachVoice;
   JSON quest;

   // Build the request body for the OpenAI Text-to-Speech endpoint
   quest = JSON (
      "model", "tts-1-hd",
      "input", Text2Convert,
      "voice", voice,
      "response_format", "aac");

   // Get the API key from the Preferences table
   string APIkey = ExecuteSQL ("SELECT AI_APIKey FROM Preferences");

   // Set headers and send the POST request
   string headers = "Content-Type: application/json\nAuthorization: Bearer " + APIkey;
   string response = HTTP_POST ( "https://api.openai.com/v1/audio/speech", string (quest), headers);

   // Handle response - STATUS CODES come in the next plugin version.
   // if ( HTTP_STATUS_CODE != 200 ) then
   //    throw "API request failed. HTTP-Status=" + HTTP_STATUS_CODE;
   // end if
   if (response == "") then
      throw "API request failed. Empty result";
   end if

   // Save the returned audio in the project's output directory
   string AudioFile = format("%s/Sound_part_%d_%s.aac", TranscriptProject::ProjectDir, int(TranscriptParts::Sorting), voice);
   int handle = open(AudioFile, "w");
   write(handle, response);
   close(handle);

   // Record the file name on the TranscriptParts record
   string PrimKey = TranscriptParts::PrimaryKey;
   string res = ExecuteSQL ( "UPDATE TranscriptParts SET AudioFileName = :AudioFile WHERE PrimaryKey = :PrimKey");
   return "OK";

end
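
If you want to test the same request outside FileMaker, the following Python sketch makes the identical HTTP call. The endpoint and JSON body mirror the function above; the API key, input text, and output file name are placeholders, not values from the database.

import requests

API_KEY = "YourAPIKey"  # placeholder - substitute your own OpenAI API key

# Same endpoint and JSON body as the ACF function above
resp = requests.post(
    "https://api.openai.com/v1/audio/speech",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "model": "tts-1-hd",
        "input": "Hello from the transcript part.",  # placeholder text
        "voice": "echo",
        "response_format": "aac",
    },
)
resp.raise_for_status()

# Write the binary audio to disk, following the Sound_part_<Sorting>_<Voice>.aac pattern
with open("Sound_part_23_echo.aac", "wb") as f:
    f.write(resp.content)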

A little Fun-Fact:

It seems ChatGPT has learned the ACF language syntax: it produced most of this function for me, and only minor edits were needed.

Note on AAC Format

The AAC format works well in most applications. However, when imported into ScreenFlow, the audio file appeared slightly shorter than its actual length, and Finder showed the same shortened duration. Opening the file in other sound editors displayed the correct duration, confirming the file was complete. The likely cause is that a raw .aac (ADTS) stream carries no container-level duration, so some applications estimate it from the bitrate.

To resolve this, I used ffmpeg to convert the file to M4A format, which worked correctly in ScreenFlow and showed the accurate length. Below is the script used for the conversion:

Script to Convert AAC to M4A

#!/bin/bash

# Define input and output directories
input_dir="AudioFiles"
output_dir="$input_dir/m4a"

# Create the output directory if it doesn't exist
mkdir -p "$output_dir"

# Loop through all AAC files in the input directory
for file in "$input_dir"/Sound_part_*_echo.aac; do
    # Extract the base filename without extension
    base_name=$(basename "$file" .aac)

    # Define the output file path
    output_file="$output_dir/${base_name}.m4a"

    # Convert the AAC file to M4A
    ffmpeg -i "$file" -ar 48000 -ac 2 "$output_file"

    # Check if conversion was successful
    if [ $? -eq 0 ]; then
        echo "Successfully converted: $file -> $output_file"
    else
        echo "Failed to convert: $file"
    fi
done
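
Note that the -ar 48000 -ac 2 options re-encode the audio to 48 kHz stereo. If you only need the container change, ffmpeg can usually remux the existing AAC stream without re-encoding (ffmpeg -i input.aac -c:a copy output.m4a), which is faster and lossless.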

Using Whisper-1 to Convert Audio to Text

To convert an audio track into a transcript, I used the following Python script:

from openai import OpenAI
import os

# Initialize the OpenAI client
client = OpenAI(api_key='YourAPIKey')

# Path to your audio file
audio_file_path = '/Users/ole/Documents/video-audio/video-1-4.mp3'

# Open the audio file
with open(audio_file_path, 'rb') as audio_file:
    # Request transcription with timestamps
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",
        response_format="verbose_json"  # Returns detailed segments with timestamps
    )

# Extract the transcript text
transcript = response.text

# Create a new filename with the .txt extension
output_file_path = os.path.splitext(audio_file_path)[0] + '.txt'

# Save the transcript to the text file
with open(output_file_path, 'w') as output_file:
    output_file.write(transcript)

print(f"Transcript saved to {output_file_path}")