Building Versa – Lessons in AI & Robotics

With the world locked down in 2020, I recently picked up a book to get me through the long Easter weekend. The book, “Rebooting AI – Building Artificial Intelligence We Can Trust” by Gary Marcus and Ernest Davis [ISBN 978-1-5247-4825-8], describes what would make AI useful and where the state of the art stands. It prompted me to write this blog about my own experience with AI, back in 2015.

My friends Sudhee, Meera, Suku and I participated in a global IBM competition on uses of Cognitive and AI. We decided to build a technology demonstrator using the IBM Watson Application Programming Interfaces (APIs) and integrate it with a robotics chassis. We fondly called the robot Versa! We won the challenge, and here’s the back story of how we built Versa and our experiences along the way. The challenge required us to not only create the system but also make several video submissions. A video of Versa navigating the maze is reproduced below. Full disclaimer: the video is sped up, since the on-board processing as well as the communication to and from the cloud-based APIs took “some” time!

Versa was designed as a demonstration of cognitive capabilities drawn from IBM Watson and integrated in software with off-the-shelf computing components like the Raspberry Pi, BeagleBone Black and Arduino. These were combined with a robotic chassis mounted with a camera and microphones, so that Versa could see directions and hear commands. The chassis was also fitted with on-board ultrasonic sensors to avoid collisions.
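For a flavour of how the collision avoidance could work, here is a minimal sketch assuming an HC-SR04-style ultrasonic sensor wired to Raspberry Pi GPIO pins; the pin numbers, sensor model and safety margin are my assumptions for illustration, not a record of Versa’s actual wiring.

```python
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24  # assumed GPIO wiring for the trigger and echo pins

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def distance_cm():
    """Fire a 10-microsecond pulse and time the echo to estimate distance."""
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)

    start = stop = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        stop = time.time()

    # Speed of sound is roughly 34300 cm/s; the pulse travels out and back.
    return (stop - start) * 34300 / 2

if distance_cm() < 15:  # assumed safety margin in centimetres
    print("Obstacle ahead - stop the motors")
```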

Our objective was that Versa should be able to “read” signs and “hear” commands to navigate through a maze. The ability to read signs required us to work with the open-source computer vision library, OpenCV. Hearing required us to work with the ALSA Linux library and PyAudio.
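To give a feel for the “read a sign” step, here is a minimal OpenCV sketch assuming the maze signs were simple printed arrows and that left.png / right.png reference templates exist; the template files, confidence threshold and camera index are assumptions, and Versa’s real pipeline was more involved than this.

```python
import cv2

cap = cv2.VideoCapture(0)  # the on-board camera exposed as a video device
templates = {
    "left": cv2.imread("left.png", cv2.IMREAD_GRAYSCALE),
    "right": cv2.imread("right.png", cv2.IMREAD_GRAYSCALE),
}

ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        # Slide the template over the frame and keep the best match score.
        scores = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, _ = cv2.minMaxLoc(scores)
        if score > best_score:
            best_label, best_score = label, score
    if best_score > 0.7:  # assumed confidence threshold
        print("Sign says:", best_label)
    else:
        print("No sign recognised - reposition and try again")
cap.release()
```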

We worked on the computer vision, and while Sudhee made it seem easier than it was, he did have to work through several challenges. Firstly, orientation!!! As humans, we can see things from many different angles, and the personal super-computer in our heads makes sense of what we are seeing straight away. For a robot, this was incredibly hard. In the test scenario, Versa would end up in different positions and the optical recognition just wouldn’t work. To compensate, we wrote code to physically direct the robot to move a bit forward or a bit backward so that the camera could be properly oriented.
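Roughly, that nudging logic looked like the sketch below. The move_forward(), move_backward() and read_sign() helpers are hypothetical stand-ins for Versa’s motor control and OpenCV code, and the step sizes, threshold and retry count are assumptions.

```python
def orient_and_read(max_attempts=5):
    """Nudge the robot until the sign is read with enough confidence."""
    for attempt in range(max_attempts):
        label, confidence = read_sign()  # hypothetical helper, e.g. ("left", 0.82)
        if confidence > 0.7:
            return label
        # Alternate small nudges forward and backward to reframe the camera.
        if attempt % 2 == 0:
            move_forward(centimetres=5)    # hypothetical motor helper
        else:
            move_backward(centimetres=10)  # hypothetical motor helper
    return None  # give up and wait for a human to reposition the robot
```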

Secondly, lighting!!! Our eyes adapt to varying lighting conditions. You guessed it: doing the same with a machine is incredibly hard!!! Depending on the lighting, image boundaries could not be detected reliably, which would throw the entire system off. We added an LED torch to improve this.
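One common OpenCV trick for uneven lighting is adaptive thresholding, which picks a threshold per neighbourhood instead of one global value; whether Versa used exactly this is my assumption, and in practice the LED torch did most of the heavy lifting.

```python
import cv2

# Assumed test frame captured from the camera.
gray = cv2.imread("sign_frame.png", cv2.IMREAD_GRAYSCALE)

binary = cv2.adaptiveThreshold(
    gray,
    255,                             # value given to pixels above the local threshold
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,  # threshold from a Gaussian-weighted neighbourhood
    cv2.THRESH_BINARY,
    11,                              # neighbourhood size (assumed)
    2,                               # constant subtracted from the local mean (assumed)
)
cv2.imwrite("sign_binary.png", binary)
```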

If you have ever spoken with Alexa, Siri, Cortana or Google Talk, you might understand some of the challenges I describe for hearing. Firstly, microphones pick up a lot of noise. They can pick up the faintest of sounds, which you may not hear but the computer definitely does! Secondly, much like with lighting, unless the microphone is close or you have specialised electronics to amplify the signal many times over and filter out the noise, you basically have to keep the mic next to your mouth. Anyone remember the person at the other end of a telephone conversation saying “I can’t hear you, speak loudly”?

On the cognitive / AI front, the behaviour of the speech-to-text engine depends on intonation, accent, speaking speed and, of course, noise. We worked on this and quickly realised that we needed to buy (not build) better audio electronics. We changed the audio processing from cheap USB sound boards to a Creative SoundBlaster USB sound board. That improved the performance immensely from a noise point of view, but distance from the mic and other qualities of human speech still affected the system. Here are two snippets of my kids speaking to illustrate the difference (a sketch of the audio capture code follows the clips).

Girl Speech

Boy Speech
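For the curious, here is a minimal sketch of the capture side: recording a short voice command with PyAudio and saving it as a WAV file for the speech-to-text service. The sample rate, clip length and use of the default input device are assumptions; on Versa, this is where the quality of the USB sound board made all the difference.

```python
import wave
import pyaudio

RATE, CHUNK, SECONDS = 16000, 1024, 3  # assumed capture settings

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

# Read roughly SECONDS worth of audio in CHUNK-sized blocks.
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]

stream.stop_stream()
stream.close()
sample_width = pa.get_sample_size(pyaudio.paInt16)
pa.terminate()

with wave.open("command.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# command.wav can then be sent to the Watson Speech to Text API.
```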

Switching from the tech story! The fun part of doing this was going through the overall process, although we definitely enjoyed getting our hands dirty building it too. We had to submit presentations and videos of our ideas, and our Macs, with Apple’s iMovie and Keynote, were used extensively; the video above was put together with these tools. I recall shooting one of my videos, which called for formal wear. The jacket, shirt and tie went on over my shorts and the video was taken in a sit-down pose. Talk about tricks of the trade; left to myself, I would have gotten fully dressed!

This project went through several stages of evaluation over several months. We worked around our family obligations, business trips and work projects to get this done. Looking at today, when the world is leveraging technology to keep business going, what we managed on this project five years back was truly pioneering. In the middle of the final evaluation, Sudhee and I had to travel urgently; Versa was left in Gurgaon, and my kids were left to care for their new tech pet. As the scenario played out, Versa had to be evaluated by two judges based in the US, with the team distributed across Shanghai, Singapore and other parts of India. We used technology to our advantage to ensure that everyone could connect to Versa, test it, update the software and demo it. It did feel a bit like managing Curiosity on Mars.

Finally, as the project drew to a close in the summer of 2015, our efforts paid off: while on a vacation in Ireland, I received the news that we had won. Talk about the luck of the Irish! Some of it seemed to have rubbed off! Our intentions were, and still are, to develop this into something more. The force is strong within us all; the call is awaited!
