Monday, 23 July 2018

Watson Asynchronous Speech to Text

Speech to Text (STT) transcription can take a long time. For this reason the Watson Speech to Text service offers an Asynchronous API where the caller doesn't need to wait around while transcription is happening. Instead the person requesting the transcription provides details of a callback server which should be notified when the transcription is complete.

This programming style is not too difficult once you get used to it, but there are a number of concepts to learn, and one thing that slows people down is that you need a fully functional callback server before you can see any of the other components in action.

To help people get started, I decided to write a very basic callback server which interacts with the Watson STT service. It can help you understand the interaction between the various components and can also serve as a starting point for a fully functional callback server.

My callback server is implemented in Node.js and, because it is small, all of the code is in a single file called app.js. Like all Node.js programs, it starts with a list of the dependencies which we will use:

const express = require('express');
const crypto = require('crypto');
const bp = require("body-parser");
var jsonParser = bp.json()
const app = express();
const port = process.env.PORT || 3000;
Next we define a variable called secret_key. For security reasons, you probably don't want any random person to be able to send notifications to your callback server. Therefore the Watson STT asynchronous API allows you to specify a secret key that should be used to sign all requests from the Watson STT service to your callback server. A secret which is published in a blog post is not really a secret so you should edit this variable to some secret value which is unique to your deployment. If you don't want to use this security feature, just set the variable value to null.

// var secret_key = null
var secret_key = 'my_secret_key';
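
Rather than editing the source, one option is to read the secret from the environment. The following is only a minimal sketch: the variable name STT_CALLBACK_SECRET and the helper load_secret_key() are hypothetical and are not part of the original app.js.

```javascript
// Hypothetical helper: read the callback secret from an environment-style object.
// STT_CALLBACK_SECRET is an assumed name, not something the Watson service defines.
function load_secret_key(env) {
  const value = env.STT_CALLBACK_SECRET;
  // Treat a missing or empty value as "signature checking disabled"
  return value ? value : null;
}

var secret_key = load_secret_key(process.env);
console.log(secret_key ? 'Signature checking enabled' : 'Signature checking disabled');
```

With this approach the secret never needs to appear in the code you publish, and leaving the variable unset behaves like setting secret_key to null.
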
This callback server doesn't do much other than write messages in the log to help you understand the flow of messages to and from your callback server. Therefore the log_request() function does that key task for each request.


// record details of the request in the log (for debugging purposes)
function log_request (request) {
  console.log('verb='+request.method);
  console.log('url='+ request.originalUrl);
  console.log("Query: "+JSON.stringify(request.query));
  console.log("Body: "+JSON.stringify(request.body));
  console.log("Headers: "+JSON.stringify(request.headers));
}
The only thing about this server which is moderately complex is the way it handles signatures. The following function checks whether or not the request contains a valid signature. If the secret_key variable is set to null then no checking is done. When the signature is not valid, it writes a message to the log and responds with an error telling you what the signature should have contained. This behaviour is intended to be helpful for developers debugging interactions, but you would probably want to turn it off for production systems because it would be equally helpful for hackers.

// check if the signature is valid
function check_signature(request, in_text) {

  // check the request has a signature if we are configured to expect one
  if (secret_key) {
    var this_signature = request.get('x-callback-signature');
    if (!this_signature) {
      console.log("No signature provided despite the fact that this server expects one");
      throw new Error("No signature provided despite the fact that this server expects one");
    } else {
      console.log("Signature: "+this_signature);
    }

    // Calculate what we think the signature should be to make sure it matches
    var hmac = crypto.createHmac('sha1', secret_key);
    hmac.update(in_text);
    hmac.end();
    var hout = hmac.read();
    var expected_signature = hout.toString('base64');
    console.log("Expected signature: "+expected_signature);

    if (this_signature != expected_signature) {
      var err_str = "Actual signature \""+this_signature+"\" does not match what we expected \""+expected_signature+"\"";
      console.log(err_str);
      throw new Error(err_str);
    }
  } 
}
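
The same calculation can be written more compactly with digest(), and the comparison can be made in constant time so that the check leaks less information to an attacker. This is a sketch of an alternative, not what app.js does; the helper names compute_signature() and signatures_match() are my own for illustration.

```javascript
const crypto = require('crypto');

// Base64-encoded HMAC-SHA1 of the text, keyed with the shared secret
function compute_signature(secret, text) {
  return crypto.createHmac('sha1', secret).update(text).digest('base64');
}

// Compare two signatures without stopping at the first differing character.
// timingSafeEqual throws if the lengths differ, so check the lengths first.
function signatures_match(actual, expected) {
  const a = Buffer.from(String(actual));
  const b = Buffer.from(String(expected));
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

const expected = compute_signature('my_secret_key', 'some challenge text');
console.log('Good signature accepted: ' + signatures_match(expected, expected));
console.log('Bad signature accepted: ' + signatures_match('not-a-signature', expected));
```

For a production server you would probably also stop logging the expected signature, for the reason given above.
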
The server needs to handle POST requests coming from the Watson STT server when the status of any transcription service changes. All we do is log the request for debugging purposes. If the signature matches the body of the POST, we give a status of 200 and respond with OK. Obviously a production server would be expected to do something more useful.

// Handle POST requests with STT job status notification
app.post('/results', jsonParser, (request, response) => {
  log_request (request);
  if (!request.body) {
    var err_text = 'Invalid POST request with no body';
    console.log(err_text);
    // send the error text with a 400 status and stop processing this request
    response.status(400).type('text/plain').send(err_text);
    return;
  }
  check_signature(request, JSON.stringify(request.body));

  // for now just record the event in the log
  console.log('Event id:'+request.body.id+' event:'+request.body.event+' user_token:'+request.body.user_token);

  // The spec is not clear about what we should respond with, so just say OK
  response.type('text/plain');
  response.send("OK");
})
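
One caveat with the handler above: the signature is checked against JSON.stringify(request.body), which is a re-serialisation of the parsed JSON rather than the bytes that were actually received. If the sender's formatting differs at all (extra whitespace, for example), the two signatures will not agree. The sketch below illustrates the effect using only the crypto module. The secret, the sample payload and the sign() helper are made up for illustration, and I am assuming the signature is computed over the body exactly as sent.

```javascript
const crypto = require('crypto');

// Base64-encoded HMAC-SHA1, the same calculation check_signature() performs
function sign(secret, text) {
  return crypto.createHmac('sha1', secret).update(text).digest('base64');
}

const secret = 'my_secret_key';

// A made-up notification body that happens to contain extra spaces
const raw_body = '{ "id": "job-1", "event": "recognitions.completed" }';
const sender_signature = sign(secret, raw_body);

// What the handler would sign after parsing and re-serialising the body
const reserialised = JSON.stringify(JSON.parse(raw_body));

const raw_matches = sign(secret, raw_body) === sender_signature;
const reserialised_matches = sign(secret, reserialised) === sender_signature;
console.log('raw body matches: ' + raw_matches);
console.log('re-serialised body matches: ' + reserialised_matches);
```

If this turns out to matter in practice, body-parser's json() accepts a verify option whose callback receives the raw buffer, so the raw text could be stored on the request and passed to check_signature() instead.
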
When registering your callback server, Watson STT issues a GET request with a random challenge_string to see if your server is up and running. If the signature on the request matches the content of the challenge_string then we simply echo back the challenge_string to let the Watson server know we are functioning OK. If the signature is wrong we issue an error response and the registration of the callback server will fail.

// Deal with the initial request checking if this is a valid STT callback URL
app.get('/results', (request, response) => {
  log_request (request);

  if (!request.query.challenge_string) {
    console.log("No challenge_string specified in GET request");
    throw new Error("No challenge_string specified in GET request");
  }

  check_signature(request, request.query.challenge_string);

  response.type('text/plain');
  response.send(request.query.challenge_string);
})
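
To try out this handler without involving the Watson service, you can build the same kind of request yourself. The sketch below only constructs the URL and the header; the helper name build_challenge_request(), the localhost address and the sample challenge value are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: build the URL and headers for a fake registration check
function build_challenge_request(base_url, secret, challenge_string) {
  const url = new URL(base_url);
  url.searchParams.set('challenge_string', challenge_string);

  const headers = {};
  if (secret) {
    // Sign the challenge the same way check_signature() expects
    headers['X-Callback-Signature'] = crypto
      .createHmac('sha1', secret)
      .update(challenge_string)
      .digest('base64');
  }
  return { url: url.toString(), headers: headers };
}

const test_request = build_challenge_request('http://localhost:3000/results', 'my_secret_key', 'hello123');
console.log(test_request.url);
console.log(JSON.stringify(test_request.headers));
```

The resulting URL and header can then be passed to curl or any HTTP client. A server configured with the same secret should echo back hello123.
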
Finally the app starts listening for incoming requests:

app.listen(port, (err) => {
  if (err) {
    return console.log('something bad happened', err);
  }
  console.log(`server is listening on ${port}`);
})
I have an instance of this callback processor running at https://stt-async.eu-gb.mybluemix.net/results but it is not really of any use to you since you won't be able to see the console log messages. You can also download the complete sample code from GitHub and host it either in BlueMix or on the hosting platform of your choice.
