Text2Speech with the Golang SDK provided by IBM Watson
Today we discuss how to achieve Text2Speech functionality in Go, using the Golang SDK and the Text2Speech service provided by IBM Watson.

The prerequisite is an IBM Watson Cloud login, which is free. Once you have an IBM Watson user ID and password, you can create a Text2Speech service instance under the Lite (free) plan.
How we achieve this functionality is explained in 4 simple steps.
1) IAM authentication (authenticating against the Watson Text2Speech service)
2) Calling the Synthesize method provided by the Text2Speech service, which synthesizes text to audio
3) Handling errors by raising a panic when one is encountered
4) Reading the returned audio stream and saving the audio with a file name and extension
So what do we get in the Watson Text2Speech service?
The IBM Watson Text to Speech service provides APIs that use IBM’s speech-synthesis capabilities to synthesize text into natural-sounding speech in a variety of languages, dialects, and voices. The service supports at least one male or female voice, sometimes both, for each language. The audio is streamed back to the client with minimal delay. Watson provides a Golang SDK, which we use to call the service from our code.
Starting with the first step [Authentication]:
IAM Authentication:
Enterprise customers brought up the need to assign access control for individual instances to different teams. Taking this user feedback into account, the idea of Identity and Access Management (IAM) for IBM Watson services was developed. In the SDK, IamAuthenticator takes the ApiKey used for authentication, and the URL is set on the service client. Whenever we activate a Text2Speech service in Watson, the service instance gets its own API key and URL.
You authenticate to the API by using IBM Cloud Identity and Access Management (IAM). You can pass either a bearer token in an authorization header or an API key. Tokens support authenticated requests without embedding service credentials in every call. API keys use basic authentication.
For IBM Cloud instances, the SDK provides initialization methods for each form of authentication.
1) Use the API key to have the SDK manage the life-cycle of the access token. The SDK requests an access token, ensures that the access token is valid, and refreshes it if necessary.
2) Use the access token to manage the life-cycle yourself. You must periodically refresh the token.
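To make the two forms of authentication concrete, here is a minimal sketch that uses only the Go standard library rather than the Watson SDK. It assumes that basic authentication uses the literal user name "apikey" with the API key as the password, and that a token is sent as "Bearer <token>". The function names and the example.com URL are hypothetical placeholders, and no request is actually sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAPIKeyRequest builds a request that uses HTTP basic authentication.
// Assumption: the user name is the literal string "apikey" and the
// password is the API key of the service instance.
func newAPIKeyRequest(url, apiKey string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth("apikey", apiKey)
	return req, nil
}

// newBearerRequest builds a request that carries an access token which
// the caller obtained earlier and has to refresh on its own.
func newBearerRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	const url = "https://example.com/v1/voices" // placeholder, not a real endpoint

	basic, err := newAPIKeyRequest(url, "my-api-key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	bearer, err := newBearerRequest(url, "my-access-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the Authorization header that each variant would send.
	fmt.Println(basic.Header.Get("Authorization"))
	fmt.Println(bearer.Header.Get("Authorization"))
}
```

When we use the SDK with an API key, this header handling is done for us, which is why the rest of this post only deals with the authenticator.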
The two lines below pass the API key and URL to the program:
1) authenticator := &core.IamAuthenticator{ApiKey: "@@@@@"}
2) textToSpeech.SetServiceURL("https://api.eu-gb.text-to-speech.watson.cloud.ibm.com/#########")
The end-to-end code is shown below:
package main
import (
	"bytes"
	"fmt"
	"os"
	"github.com/IBM/go-sdk-core/core"
	"github.com/watson-developer-cloud/go-sdk/texttospeechv1"
)

func main() {
	// Step 1: authenticate with the API key of the service instance.
	authenticator := &core.IamAuthenticator{ApiKey: "tF#######################ZD@@"}
	options := &texttospeechv1.TextToSpeechV1Options{Authenticator: authenticator}
	textToSpeech, textToSpeechErr := texttospeechv1.NewTextToSpeechV1(options)
	if textToSpeechErr != nil {
		panic(textToSpeechErr)
	}
	textToSpeech.SetServiceURL("https://api.eu-gb.text-to-speech.watson.cloud.ibm.com##############")
	// Step 2: synthesize the text as WAV audio with the chosen voice.
	result, response, responseErr := textToSpeech.Synthesize(&texttospeechv1.SynthesizeOptions{
		Text:   core.StringPtr("Welcome to the world of Go"),
		Accept: core.StringPtr("audio/wav"),
		Voice:  core.StringPtr(texttospeechv1.SynthesizeOptionsVoiceEnGbJamesv3voiceConst),
	})
	// Step 3: stop with a panic if the call failed.
	if responseErr != nil {
		panic(responseErr)
	}
	fmt.Println(response)
	// Step 4: read the audio stream and save it to disk.
	if result != nil {
		buff := new(bytes.Buffer)
		if _, err := buff.ReadFrom(result); err != nil {
			panic(err)
		}
		file, err := os.Create("hello_world.wav")
		if err != nil {
			panic(err)
		}
		defer file.Close()
		if _, err := file.Write(buff.Bytes()); err != nil {
			panic(err)
		}
	}
}
Off to the 2nd step [synthesizing the written content to voice]:
The Synthesize method synthesizes text to audio that is spoken in the specified voice. The service bases its understanding of the language for the input text on the specified voice. Use a voice that matches the language of the input text.
The method accepts a maximum of 5 KB of input text in the body of the request, and 8 KB for the URL and headers. The 5 KB limit includes any SSML tags that you specify. The service returns the synthesized audio stream as an array of bytes.
Watson allows various audio formats. Some of them are listed below:
1) audio/flac
2) audio/mp3
3) audio/mpeg
4) audio/ogg
5) audio/wav
6) audio/webm and many more.
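Since the audio format is chosen through the Accept value, the file extension used when saving should match it. Here is a small sketch of one way to keep the two in sync. The mapping table and the fileNameFor helper are our own assumptions, not something the SDK provides, and the table covers only the formats listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// extensions maps the audio formats listed above to a file extension.
// This table is an illustrative assumption and is not exhaustive.
var extensions = map[string]string{
	"audio/flac": ".flac",
	"audio/mp3":  ".mp3",
	"audio/mpeg": ".mp3",
	"audio/ogg":  ".ogg",
	"audio/wav":  ".wav",
	"audio/webm": ".webm",
}

// fileNameFor appends the extension that matches the Accept value to base.
// Parameters after a semicolon, such as ";rate=22050", are ignored.
func fileNameFor(base, accept string) (string, error) {
	mediaType := strings.TrimSpace(strings.SplitN(accept, ";", 2)[0])
	ext, ok := extensions[strings.ToLower(mediaType)]
	if !ok {
		return "", fmt.Errorf("no extension known for %q", accept)
	}
	return base + ext, nil
}

func main() {
	for _, accept := range []string{"audio/wav", "audio/ogg;codecs=opus", "audio/unknown"} {
		name, err := fileNameFor("hello_world", accept)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(accept, "->", name)
	}
}
```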
// Instantiate the Watson Text to Speech service (v1).
texttospeechv1.NewTextToSpeechV1(options)
// Call Synthesize on the instantiated service.
textToSpeech.Synthesize
In our program, Synthesize receives 3 parameters: the Text as an input string, the audio format (audio/wav) and lastly a voice picked from the available list. In this post we use the British English (en-GB) voice named James.
- Text: "Welcome to the world of Go"
- Accept: "audio/wav"
- Voice: texttospeechv1.SynthesizeOptionsVoiceEnGbJamesv3voiceConst
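To see what these three parameters amount to on the wire, here is a sketch that builds (but does not send) a comparable HTTP request with the standard library only. It rests on our understanding of the public REST API: a POST to /v1/synthesize, the voice as a query parameter, the format in the Accept header and the text in a JSON body. The voice name "en-GB_JamesV3Voice" is assumed to be the value behind the SDK constant, the service URL is a placeholder, and authentication is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newSynthesizeRequest builds a request that carries the three parameters:
// text in the JSON body, accept in the Accept header, voice in the query.
// The endpoint path and parameter names are assumptions about the REST API.
func newSynthesizeRequest(serviceURL, text, accept, voice string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"text": text})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimRight(serviceURL, "/") + "/v1/synthesize?voice=" + url.QueryEscape(voice)
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", accept)
	return req, nil
}

func main() {
	req, err := newSynthesizeRequest(
		"https://example.com/instances/placeholder", // placeholder service URL
		"Welcome to the world of Go",
		"audio/wav",
		"en-GB_JamesV3Voice", // assumed voice name
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Accept:", req.Header.Get("Accept"))
}
```

The SDK hides these details behind SynthesizeOptions, so in practice we only fill in Text, Accept and Voice.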
In the third step, we handle errors. The constructor returns an error value alongside the service instance:
textToSpeech, textToSpeechErr := texttospeechv1.NewTextToSpeechV1(options)
We check whether an error was returned and, if so, raise a panic.
if textToSpeechErr != nil {
	panic(textToSpeechErr)
}
In the 4th step, we save the result to disk with a file name and an audio format extension.
buff.ReadFrom(result)
file, err := os.Create("hello_world.wav")
Several packages are used in this program. They are listed below.
bytes: provides the byte buffer into which the audio stream is read before it is written to the file.
fmt: package for formatting and printing output.
os: used here to create the output file.
The two packages below hold the Watson-specific functionality: core supplies the authenticator, and texttospeechv1 supplies the Text2Speech client.
github.com/IBM/go-sdk-core/core
github.com/watson-developer-cloud/go-sdk/texttospeechv1
In this post we wrote Golang code that uses the Text2Speech service provided by IBM Watson. We authenticated against the Watson service and then called the Synthesize method, which converted the text to audio. After that we saw how to handle errors and, once everything is fine, how to save the audio to disk with a file name and format.
I sincerely hope you have gained from this post on Golang Text2Speech.