The Voice-Controlled, Face Recognizing, Drone Journey – Part 4

This post is the fourth in the series documenting the steps I went through on my journey to build an autonomous, voice-controlled, face-recognizing drone. There are four other posts building up to this one, which you can find at the end of this post.

Focus of this post

In this post I am going to pick up where we left off and cover how you can:

  • Install the packages you need to access Microsoft Cognitive Services from node.js
  • Understand some of the capabilities of Microsoft Cognitive Services
  • Start to see how the Face API can be used from node.js
  • Understand how to use the GraphicsMagick/ImageMagick library (thanks to the tip from Lukas) to annotate the returned images with the detected faces in node.js

Adding Intelligence To The Drone

In this part of the project we start to add some intelligence to our drone. We will do that using Microsoft Cognitive Services. There is a great introduction in this YouTube video which I encourage you to take a look at.

For this demo the primary APIs we will use are the Face API and the Speech APIs, so that is where I will focus over the next few posts.

Installing the Project Oxford node.js package

The node package we are going to use is the Project Oxford library, which you will find here. The Project Oxford name comes from what the services were called before they were renamed to Microsoft Cognitive Services.

To install it, run the command shown below from the c:\Development\Drone directory.

npm install project-oxford

If all goes well you will get the following report back.

Install Project Oxford

Getting an API Key

To use the services you need an API key, which means signing up for them. The best place to do that is this page. For what we are doing, the Try for Free options should offer enough runway; if you do a lot you might need to upgrade to a paid version, but you should not need that for our demo.

You will need to sign in or create an account. If you are new there will be security verification steps to go through, and you will also need to verify your email address.

Once all that is done, simply click on the big + on the page, which you can see below.

Next you need to select the three services we will use in this project. Essentially we will use Speech, Face and Speaker Recognition.

Step 2

Then scroll to the bottom of the page, accept the terms and conditions (after reading them, of course) and hit subscribe.

Step 3

Note that if your account is not verified, nothing will happen at this point. If that is the case, look for how to verify your account. Once verification is complete you will be presented with the following screen.

Step 4

This is the screen where you can get your specific API keys. We will only use Key 1 for our purposes, so quickly copy it somewhere so you have it handy. For the rest of this project, the API key you pass to the object we create determines which services you have access to. Without the correct API key nothing will work.
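As a side note, once we start writing code you may prefer not to paste the key directly into your scripts. One option, purely a suggestion of my own rather than anything the package requires, is to read it from an environment variable in node.js (OXFORD_API_KEY is just a name I made up for this sketch):

var apikey = process.env.OXFORD_API_KEY || "insert_your_key_here"; // fall back to a pasted key if the variable is not set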

Understanding the capabilities of Microsoft Cognitive Services

We could spend a lot of time looking at the various Microsoft Cognitive Services. They are extremely powerful, covering a variety of domains. Many of these services were harnessed in the video I have shared below.

Microsoft Cognitive Services are essentially designed to let you build apps on top of powerful algorithms with very little code. There are many APIs designed to help you develop smarter apps that interact more naturally with your users.

Microsoft Cognitive Services Overview

This project is going to make use of the Face API to detect people's faces in an image and then identify who they are. We are also going to make use of the Bing Speech API to synthesize voice from text input, and lastly we will use the Speech Recognition API to transform spoken words into text that we can use to drive actions.

As you can see, these are just three of the APIs on offer, and even within those three we will only scratch the surface of what is possible. Once you get going, your imagination is the only real hurdle.

At one level, learning how the APIs work is pretty simple: you can review the documentation and you are good to go. On top of that, in most cases you can also use an API testing console to send your input from a web form and see the output.

I found this useful when trying to work out whether my code was wrong or whether my JSON input to the APIs was wrong. I spent a lot of time on these screens, but hopefully you will not need to in order to succeed with this project.

Getting to know the Face API via the node.js package

To be honest, the basics of using the Face API from node.js are really well documented on the relevant GitHub page. In short, you first need to include the package in your code and then create a client object, passing it the right API key.

var oxford = require('project-oxford'); // the Project Oxford client package we installed earlier
var apikey = "insert_your_key_here";    // paste Key 1 from the portal here
var client = new oxford.Client(apikey); // this client is used for all the API calls that follow

Once you have done that, it is a simple matter of calling the detect function to establish the age and gender of the person in a supplied image. If you stopped your DroneWebServer while it had an image of a person in it, that image makes a fine test subject. If not, you will have to edit the path below to point to an image you want to analyze. You can also replace path: with url: and supply the address of an online image. Be careful if you do that though, as in my experience certain images seemed not to work, although that was probably a setup issue.

client.face.detect({
    path: 'public/DroneImage.png',   // local image to analyze
    analyzesAge: true,               // ask the service to estimate the age
    analyzesGender: true             // ask the service to estimate the gender
}).then(function (response) {
    // response is an array with one entry per detected face
    console.log('The age is: ' + response[0].faceAttributes.age);
    console.log('The gender is: ' + response[0].faceAttributes.gender);
});
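If you want to try the url: variant mentioned above, the call looks like this. This is just a sketch of my own, and the web address is a placeholder that you would replace with a real image URL.

client.face.detect({
    url: 'http://example.com/SomeFace.jpg',  // placeholder - point this at a real online image
    analyzesAge: true,
    analyzesGender: true
}).then(function (response) {
    console.log('The age is: ' + response[0].faceAttributes.age);
    console.log('The gender is: ' + response[0].faceAttributes.gender);
});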

The options analyzesAge and analyzesGender being set to true basically tell the detect function what information you want to retrieve, which in this case is the estimated age and gender of the person in the image.

For now you can save all that code in a file called FaceTest.js, which you can invoke with the command “node FaceTest.js”. One thing you will notice is that in the code I have shared you need to reference faceAttributes in the response, not just attributes. There have obviously been some internal changes since the samples were created.

This stumped me for a while until I put a “console.log(response);” inside the then clause, which showed me exactly what I needed to reference. Once that was done and my code was fixed, the output for the DroneImage.png file in my public directory looked as shown below.

Face Detect 1
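For reference, the response you see from that console.log is an array with one entry per detected face. The values below are invented, but the shape was along these lines:

// Illustrative only - the identifier and numbers here are made up
[ { faceId: 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
    faceRectangle: { top: 131, left: 177, width: 162, height: 162 },
    faceAttributes: { age: 35, gender: 'male' } } ]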

The other thing I ran into was that sometimes I would forget to stop the DroneWebServer, so my image would contain no faces. In that case you will get an error, as the code above is not very sophisticated.
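A small defensive tweak, my own addition rather than part of the original code, avoids that crash by checking the response before indexing into it and by catching request errors (assuming the promise returned by the package supports catch):

client.face.detect({
    path: 'public/DroneImage.png',
    analyzesAge: true,
    analyzesGender: true
}).then(function (response) {
    // Guard against images where the service finds no faces
    if (!response || response.length === 0) {
        console.log('No faces were detected in the image.');
        return;
    }
    console.log('The age is: ' + response[0].faceAttributes.age);
    console.log('The gender is: ' + response[0].faceAttributes.gender);
}).catch(function (err) {
    // Surface problems such as a bad API key or an unreadable image
    console.log('Face detection failed: ' + err);
});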

Lastly, if your image has two faces in it they will both be returned, but only the first one detected will have its age and gender printed, since we only look at the first element of the response in the code above. For now I would advise sticking to just one face.

Drawing the box around the first face found

You have all seen it: you look at an image, a face is detected and a box is drawn around it. I wanted to do the same. I looked around at the APIs and, while I could see that face.analyzeEmotion supplied that capability, face.detect did not. I did not want to analyze emotion, so I needed to do something else.

Luckily there was another hint in the blog from Lukas, where he pointed to the ImageMagick utility. It took a bit of searching, but eventually I found the node package that supports it, called gm. The first thing I did was install that package.

npm install gm

I then had to install the latest version of the GraphicsMagick software, which is on SourceForge here. If you look at the top of that page you will see the text “Looking for the latest version?” followed by a link to the installer executable. Click that link, save the file to disk and then double-click the resulting executable to install it, taking all the defaults.

VERY IMPORTANT: At this stage you need to close the command window you are using to launch your node.js scripts and re-open it. The installation updates the system path, and if you run your scripts without reopening the window the updated path is not available there, which will result in an error!
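A quick way to confirm the new window has picked up the updated path is to run the command below; if GraphicsMagick is installed correctly it prints version information rather than an error.

gm version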

Now we can edit our FaceTest.js code. The first step is to include the gm package we just installed, and the filesystem package, as we will be writing out a file.

var gm = require('gm');  // node wrapper around GraphicsMagick/ImageMagick
var fs = require('fs');  // filesystem access

Once they are there, we will use another part of the response (which we previously ignored), called faceRectangle, to obtain the location of the face. We will then use that to annotate our original image using the gm package, and finally we will write out the file. For the sake of simplicity I am including the whole of the face.detect code that does this below. So we do not overwrite the original, I am giving the output image a slightly different name. Be careful, as a few lines wrap in the code.

client.face.detect({
    path: 'public/DroneImage.png',
    analyzesAge: true,
    analyzesGender: true,
    returnFaceId: true
}).then(function (response) {
    // Work out the corners of the rectangle around the first face found
    var topy = response[0].faceRectangle.top;
    var topx = response[0].faceRectangle.left;
    var bottomx = response[0].faceRectangle.left + response[0].faceRectangle.width;
    var bottomy = response[0].faceRectangle.top + response[0].faceRectangle.height;
    // Position the age/gender label just above the rectangle
    var textx = topx;
    var texty = topy - 10;
    var TextOut = response[0].faceAttributes.age + " , " + response[0].faceAttributes.gender;
    // Draw the rectangle and label, then save the result under a new name
    gm('public/DroneImage.png')
        .fill('none')
        .stroke("red", 4)
        .drawRectangle(topx, topy, bottomx, bottomy)
        .fontSize("20px")
        .stroke("red", 2)
        .font('/Windows/Fonts/trebuc.ttf')
        .drawText(textx, texty, TextOut)
        .write('public/DroneImageSQ.png', function (err) {
            if (err) throw err;
            console.log(response);
        });
});

If you have got this far then you should have seen this sort of output in your command window (obviously with different results).

Face Test Working

Taking a quick look in your public directory should reveal a nicely annotated image for you to look at.

Drone Image SQ

Congratulations... you are now successfully using the Microsoft Cognitive Services Face API to detect a face in an image. From there you have annotated it using a third-party package and written out some of the data related to that face. This is the foundation for the next step.
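As a small extension, and purely a sketch of my own rather than something from the walkthrough above, you could loop over every face in the response and annotate each one instead of only the first. It reuses the same gm setup and writes the result to a different file name (DroneImageAll.png is just a name chosen for this example):

client.face.detect({
    path: 'public/DroneImage.png',
    analyzesAge: true,
    analyzesGender: true
}).then(function (response) {
    var image = gm('public/DroneImage.png');
    // Draw a rectangle and label for every face the service returned
    response.forEach(function (face) {
        var r = face.faceRectangle;
        var label = face.faceAttributes.age + " , " + face.faceAttributes.gender;
        image = image
            .fill('none')
            .stroke("red", 4)
            .drawRectangle(r.left, r.top, r.left + r.width, r.top + r.height)
            .fontSize("20px")
            .stroke("red", 2)
            .font('/Windows/Fonts/trebuc.ttf')
            .drawText(r.left, r.top - 10, label);
    });
    image.write('public/DroneImageAll.png', function (err) {
        if (err) throw err;
        console.log('Annotated ' + response.length + ' face(s).');
    });
});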

Identify or Similar – That is the question

Having got face detection and annotation working, the next step was to make it so that we can detect a specific face. I spent a fair bit of time reading the documentation and concluded that I needed to use face.identify. I then spent a fair amount of time trying, unsuccessfully, to make it work. The challenge with face.identify is that it needs a great deal of setup, which has to happen in a specific order.

Having tried with identify for a while, I thought I would switch gears and try face.similar. While face.similar needed some setup, it was a lot less complex, and as a result I was able to get it working very quickly. In the next post I will share a little of my journey in both areas, but while you wait you can of course try it yourself 🙂

Where are we and next steps

So far we have only just started our journey into the world of Microsoft Cognitive Services. We have shown the basics in action and created an approach to detect and annotate a face in an image.

The next step of the project is to make it so we can recognize a specific person using the face.similar function, before looking at other techniques. This will be the focus of the next post.

Previous Posts in the Series

 
