Web Speech API, Text to Speech using your browser

  • Sajan Thomas
  • April 19, 2020

The Web Speech API is an experimental technology that lets you incorporate voice into your web apps. It has two parts: speech synthesis (text to speech) and speech recognition (speech to text). The API is defined in a draft specification.
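Since the API is experimental, it can help to check which of the two parts a browser actually exposes before relying on them. Below is a minimal sketch of such a check. The function name detectSpeechSupport and its scope parameter are my own illustration, not part of the API; the prefixed webkitSpeechRecognition name is included because some browsers only ship recognition under that name.

```javascript
// Hypothetical helper: report which halves of the Web Speech API are exposed.
// `scope` defaults to globalThis (window in a browser) and is a parameter
// only so the check can also be run against a stand-in object.
function detectSpeechSupport(scope = globalThis) {
  return {
    // Synthesis needs both the speechSynthesis object and the utterance constructor.
    synthesis: "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope,
    // Recognition may only be available under the webkit prefix.
    recognition: "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope,
  };
}

const support = detectSpeechSupport();
console.log(support);
```

If synthesis comes back false, the rest of this post will not work in that browser, so the app could hide the speak button or show a message instead.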

In this post we will look at speech synthesis. Isn’t it cool when your app can speak to your customer? :-) So let’s start.

We will create a simple form to input some text. On button click, we pass the input to the speech synthesis API, which reads it out loud.

The UI will look like the image below. The Web Speech API also lets us pick a voice from a predefined set of voices.

[Image: web speech API]

Later in this post we will also add an option to choose the voice. For now, the code for our index.html is as follows:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>WebSpeech API</title>
     <!--Import Google Icon Font-->
      <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
      <!--Import materialize.css-->
      <link type="text/css" rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css"  media="screen,projection"/>

      <!--Let browser know website is optimized for mobile-->
      <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    
  </head>
  <body>
    <h3 class="center-align">Text to Speech using WebSpeech API</h3>
    <div class="row">
        <div class="container">
            <div class="row">
                <div class="input-field col s12">
                    <input placeholder="Input text" id="tts_input" type="text" >
                    <label for="tts_input">Text Input</label>
                </div>
            </div>

            <div class="row center-align">
                <button class="btn waves-effect waves-light" id="tts_button" type="button" name="action">Do the Magic
                    <i class="material-icons right">send</i>
                </button>
            </div>
        </div>
    </div>

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
    <script src="/speech.js"></script>
  </body>
</html>

Now it's time for the magic code, speech.js:

$(function() {
  // on click of button we read the text from input field
  $(document).on("click", "#tts_button", function() {
    let text = $("#tts_input")
      .val()
      .trim();
    // the next two lines are where the magic happens
    const speak = new SpeechSynthesisUtterance(text);
    speechSynthesis.speak(speak);
    $("#tts_input").val("");
  });
});
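The handler above passes whatever is in the field to the API, even an empty string. Here is a small sketch of a wrapper that skips empty input and also does nothing when speech synthesis is missing. The name speakText and its extra parameters are hypothetical; they default to the real browser globals and exist only so the function can be tried with stand-ins.

```javascript
// Hypothetical helper: speak `text` unless it is empty after trimming.
// `synth` and `Utterance` default to the browser globals when they exist.
function speakText(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const trimmed = String(text).trim();
  if (!trimmed || !synth || !Utterance) {
    return false; // nothing to say, or speech synthesis is unavailable
  }
  // Same two calls as in the click handler above.
  synth.speak(new Utterance(trimmed));
  return true;
}
```

Inside the click handler this could be called as speakText($("#tts_input").val()), and the return value tells you whether anything was queued for speaking.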

That’s it. The browser will now speak what you typed. Let’s add some character to it by choosing a voice. We will modify index.html and speech.js a bit.

First we add a select element to index.html for choosing the voice profile.

Our final index.html is as follows:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>WebSpeech API</title>
     <!--Import Google Icon Font-->
      <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
      <!--Import materialize.css-->
      <link type="text/css" rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css"  media="screen,projection"/>

      <!--Let browser know website is optimized for mobile-->
      <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    
  </head>
  <body>
    <h3 class="center-align">Text to Speech using WebSpeech API</h3>
    <div class="row">
        <div class="container">
            <div class="row">
                <div class="input-field col s12">
                    <input placeholder="Input text" id="tts_input" type="text" >
                    <label for="tts_input">Text Input</label>
                </div>
            </div>

            <div class="row">
                <div class="input-field col s12">
                    <select id="tts_voice">
                  
                    </select>
                    <label>Choose voice</label>
                </div>
            </div>

            <div class="row center-align">
                <button class="btn waves-effect waves-light" id="tts_button" type="button" name="action">Do the Magic
                    <i class="material-icons right">send</i>
                </button>
            </div>
        </div>
    </div>

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
    <script src="/speech.js"></script>
  </body>
</html>

Next we make some changes to speech.js, adding a new function called populateVoiceList. speech.js becomes:

$(function() {
  // initialize select for materialize
  $("select").formSelect();

  function populateVoiceList() {
    let voices = window.speechSynthesis.getVoices();
    let voiceList = "";
    // declare the loop counter with let so it does not leak into the global scope
    for (let i = 0; i < voices.length; i++) {
      console.log(voices[i].name);
      voiceList += `<option value="${i}">${voices[i].name} (${voices[i].lang})</option>`;
    }
    $("#tts_voice").html("");
    $("#tts_voice").append(voiceList);
    // need to reinitialize select after dynamic changes in materialize :(
    $("select").formSelect();
  }

  // on click of button we read the text from input field
  $(document).on("click", "#tts_button", function() {
    let text = $("#tts_input")
      .val()
      .trim();
    // create the utterance from the typed text
    const speak = new SpeechSynthesisUtterance(text);
    // set the voice chosen in the select (its value is an index into getVoices())
    let voiceSelected = $("#tts_voice").val();
    speak.voice = window.speechSynthesis.getVoices()[voiceSelected];
    speechSynthesis.speak(speak);
    $("#tts_input").val("");
  });

  // second part: fill the voice list
  populateVoiceList();

  // Why the check below? From the spec:
  /* 
  voiceschanged: Fired when the contents of the SpeechSynthesisVoiceList, that the getVoices method will return, have changed. Examples include: server-side synthesis where the list is determined asynchronously, or when client-side voices are installed/uninstalled.
  */
  // In short, the voice list is loaded asynchronously, and a voiceschanged event fires once it is loaded.
  if (
    typeof speechSynthesis !== "undefined" &&
    speechSynthesis.onvoiceschanged !== undefined
  ) {
    speechSynthesis.onvoiceschanged = populateVoiceList;
  }
});
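A side note on populateVoiceList: it builds the option markup by string concatenation, so a voice name containing a character like & or < would be inserted as raw HTML. Below is a minimal sketch of a pure function that builds the same markup with escaping. The names escapeHtml and buildVoiceOptions are my own, and the sketch assumes each voice object has the name and lang properties used in the code above.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical pure version of the option-building loop: takes the array
// returned by getVoices() and returns the <option> markup as one string.
function buildVoiceOptions(voices) {
  return voices
    .map(
      (voice, index) =>
        `<option value="${index}">${escapeHtml(voice.name)} (${escapeHtml(voice.lang)})</option>`
    )
    .join("");
}
```

With this in place, populateVoiceList could set the select content with $("#tts_voice").html(buildVoiceOptions(voices)) and then reinitialize the Materialize select as before.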

You may notice that at the end of speech.js, after calling populateVoiceList(), there is one more check. In some browsers the voice list is loaded asynchronously, so the first getVoices() call can return an empty list. Once the list is ready, the browser fires a voiceschanged event.

When that event fires we call populateVoiceList again, because the first call may not have had any voices to add to the select.
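The same "try now, then wait for the event" idea can be wrapped in a small helper. This is only a sketch: whenVoicesReady is a hypothetical name, and it assumes the browser supports addEventListener on speechSynthesis for the voiceschanged event. Unlike the code in speech.js, it calls back only once and then stops listening, so it will not pick up voices installed later.

```javascript
// Hypothetical helper: call `onReady(voices)` once the voice list is non-empty.
// `synth` is expected to be window.speechSynthesis (or a compatible stand-in).
function whenVoicesReady(synth, onReady) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    // The list was already loaded, so there is nothing to wait for.
    onReady(voices);
    return;
  }
  // List not loaded yet: wait for voiceschanged, then read the list again.
  const handler = () => {
    const loaded = synth.getVoices();
    if (loaded.length > 0) {
      synth.removeEventListener("voiceschanged", handler);
      onReady(loaded);
    }
  };
  synth.addEventListener("voiceschanged", handler);
}
```

In the browser it could be used as whenVoicesReady(window.speechSynthesis, function (voices) { console.log(voices.length); }), with the callback filling the select instead of logging.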

The final view of our app looks like this:

[Image: web speech API Demo]

So that’s it guys, now your browser will start speaking.

For more info please visit https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisVoice

This is the link to the gist https://gist.github.com/sajanthomas01

Thanks for reading :-)
