Today we speak to Sascha Klement, founder and CTO of gestigon, a leading supplier of middleware for gesture control and skeleton tracking and a videantis partner.

Could you tell us a bit about the history of gestigon?

The roots of gestigon go back to 2002, when researchers at the University of Lübeck started working on time-of-flight image sensors, depth sensing, and pattern recognition, supported by significant EU funding. In 2011 we founded gestigon and exclusively licensed these academic developments to take the technology to market. Today, gestigon has 23 employees in two offices, Lübeck in Germany and Sunnyvale in the US, and we have built up our own portfolio of software, algorithms, technologies, and market-ready products.

When did you start working on user interfaces and gesture control?

Depth sensors have a lot of different application areas. We didn’t want to go for industrial applications such as packaging; we really wanted to bring this technology to automotive, consumer products, and embedded systems. For systems like that, it’s important to be very fast and have low latency, so we spent a great deal of time optimizing our algorithms for that. In 2009 we started working on gesture control. We view gesture control as not just being about swiping and pointing; we take a more holistic approach that considers the whole user experience. Gesture control is also about understanding what the person does in front of the device. There may be implicit gestures, based on the context, for instance. In automotive, we want to understand what the driver is looking at, what their pose is, and how they enter the car, for instance.

Can you tell us what is hard about gesture recognition and skeleton tracking?

It starts at the sensor. You need a sufficiently high frame rate and you need to process the data at that same speed. For instance, at QVGA resolution and 60fps, processing every pixel in every frame adds up to roughly 4.6 million pixels per second. Low latency is key for gesture control, so we need to process the data in about 4ms. We try to avoid smoothing the results over multiple frames, which is what other solutions do. Instead, we finish recognizing the gesture within a single frame of data. Some gestures need multiple frames to be recognized, but limiting this recognition time is still extremely important. When latency becomes too high, the system feels laggy, which prompts the user to perform the gesture again. The principle is similar to delay on a phone line: with a short delay there’s no issue, but as soon as the delay exceeds a certain threshold, the whole user experience breaks down.
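To make those numbers concrete, here is a minimal back-of-the-envelope sketch. The QVGA resolution, 60fps rate, and roughly 4ms budget come from the interview; assuming each pixel is touched exactly once per frame is a simplification for illustration:

```python
# Back-of-the-envelope throughput for the latency budget described above.
# QVGA at 60fps and the ~4ms budget come from the interview; touching each
# pixel exactly once per frame is an illustrative assumption.

WIDTH, HEIGHT = 320, 240        # QVGA depth image
FPS = 60                        # sensor frame rate
LATENCY_BUDGET_S = 0.004        # ~4 ms processing budget per frame

pixels_per_frame = WIDTH * HEIGHT            # 76,800 pixels
pixels_per_second = pixels_per_frame * FPS   # ~4.6 million pixels/s

# Time available per pixel if every pixel is visited once per frame:
ns_per_pixel = LATENCY_BUDGET_S / pixels_per_frame * 1e9

print(f"{pixels_per_frame:,} pixels/frame, {pixels_per_second:,} pixels/s")
print(f"~{ns_per_pixel:.0f} ns per pixel within the 4 ms budget")
```

At roughly 52 ns per pixel, there is little room for heavyweight per-pixel work, which is why the algorithms need to be optimized so aggressively.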

What makes your technology unique?

We provide gesture control at all levels, from tracking the fingertips to tracking the full body. The underlying technology is based on self-organizing maps, a type of artificial neural network. This method is very robust and efficient. We run on PCs, tablets, mobile phones, and deeply embedded devices for automotive. With a single parameter, we can trade off the accuracy of the whole solution against run-time performance, which makes the algorithms fully scalable. Compared to Intel’s RealSense, for instance, gestigon’s tracking uses less than half the resources. When more processing power is available, the system automatically increases its performance and accuracy. Those are just a few of the ways we’re different.
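For readers unfamiliar with the technique, here is a minimal, generic self-organizing map sketch. It is not gestigon’s implementation; the 1-D chain topology, node count, and learning schedule are illustrative assumptions. It shows the kind of single knob, here the number of nodes, that trades accuracy against runtime:

```python
import numpy as np

# Minimal self-organizing map (SOM) sketch -- illustrative only, not
# gestigon's code. A 1-D chain of nodes adapts to a 3-D point cloud,
# the kind of data a depth sensor produces. N_NODES is the sort of
# single parameter that trades accuracy against run-time performance.

N_NODES = 32            # fewer nodes -> faster but coarser fit
STEPS = 2000

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))    # stand-in for a depth point cloud
nodes = rng.normal(size=(N_NODES, 3))  # initial node positions

for step in range(STEPS):
    decay = 1.0 - step / STEPS                 # anneal learning over time
    lr = 0.5 * decay                           # learning rate
    sigma = max(1e-3, 2.0 * decay)             # neighborhood width on chain
    p = points[rng.integers(len(points))]      # one observed point
    bmu = np.argmin(np.linalg.norm(nodes - p, axis=1))  # best-matching unit
    chain_dist = np.abs(np.arange(N_NODES) - bmu)       # distance along chain
    influence = np.exp(-(chain_dist / sigma) ** 2)
    nodes += lr * influence[:, None] * (p - nodes)      # pull nodes toward p
```

Because each update touches only a small, fixed set of nodes, the cost per frame scales directly with the node count, which is what makes this family of methods attractive for embedded targets.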

There’s no single common gesture language yet. Do you view this as an obstacle?

I’m sure that at some point there will be a standard. At the same time, we see that in the automotive space, the OEMs aren’t yet interested in standardizing. Instead, they want to ensure that their gesture interface differentiates them from the competition and provides a unique experience for their brand. As long as it’s intuitive and adds to the user experience, I believe gesture interfaces don’t need a single common language to be successful. In addition, since gestures may be context-dependent, a single standard only makes sense for certain applications.

Can you tell us a success story?

We’re working with several major automotive OEMs. Everyone in automotive is working on gestures and creating prototypes. We can’t tell you about all the great things we’re seeing in the labs and prototypes due to the very strict NDAs we have to adhere to, and because of automotive lead times, it’ll take some time before this technology reaches the market. We provide the middleware and the know-how to recognize gestures and track skeletons. What’s been very interesting to see is that some of our licensees have come up with very creative solutions based on our libraries: things that we didn’t envision at all. One example is a single button that behaves differently for the passenger and the driver, where the system automatically recognizes who touches it. This is just one small example of what’s possible. The automotive OEMs are covering a lot of different use cases with a single sensor, and we’re excited to provide them with the enabling technology.

Can you tell us more about the automotive opportunity?

Automotive is currently a key driver of the market for gesture control. There are a lot of new automotive concepts that use this technology: in the car, outside of the car, and in different ways of interacting with the car. There’s the steering wheel, the center console, the back seats, the area around the car, and so on. These techniques increase driver safety, which is always a big deal: you can keep your hands on the wheel while interacting, for instance. In addition, you can design a better-looking car interior by removing knobs. Design is another key factor in automotive, so that’s another big need we address. We’ve made a video that shows how we see our technology being applied.

Why the need for low power and high performance?

We view gesture control as just an interface to an electronics system. You should be able to plug it in just like you plug in a keyboard or mouse today, and plugging it in should add as little extra system load as possible. This modularity is a requirement in automotive too, where such added features may become paid options. The base model should be as low cost as possible, and not bear the overhead of a gesture system if the consumer doesn’t select it.

The powerful, scalable platform that videantis licenses to the market is one of the key reasons we teamed up. Its low cost, low power, and high performance make it a great match for our algorithms. The combined solution can be integrated into very small camera modules and SoCs. There’s the added benefit that such a tight integration also protects our software from illegal copying, which is harder to guarantee with CPU-based processing. In addition, we’re both in Germany, and we both have many customers in automotive, so from that angle teaming up made business sense too.

Where do you see gestigon 10 years from now?

In 10 years, our software should be integrated into cars as a standard way of interacting. The driver won’t know our technology is inside. Gesture control will be as easy as using keyless entry or a volume button today, and people won’t even think about the complicated technology under the hood. At home, your devices will seamlessly understand what you want to do, and you’ll be able to control them without needing to press any buttons or touch any screens. We’re excited about shaping the future of the user experience together with videantis, and about delivering better ways to control and interact with the many electronic devices around us.

This article is part of a series of interviews. Previously we interviewed our own employees Andreas Dehnhardt, VP Applications Engineering, and Tony Picard, VP Sales.