Why your DL accelerator should be replaced


Intelligence is quickly being added to all our electronic devices. Whether it’s our vehicles that automatically brake when things get dangerous, our phones’ cameras that ensure every picture we take looks great, or our datacenters that need to not just store and distribute our videos but also understand what’s in them, intelligent image processing using deep learning is everywhere.

But just adding a deep learning accelerator next to a chip’s host CPU subsystem doesn’t mean you have a chip that can handle all the required visual computing tasks. While we’ve seen some designs that overlooked this, many SoCs these days do have multiple compute engines for the different imaging-related processing duties.

Last year we wrote an article that gave six reasons why deep learning accelerators need vision processors next to them on the same chip. Here’s a quick recap of the rationale:

  • Feed the beast: images need preprocessing before they are ready for consumption by the DL engine
  • Algorithms use both deep learning and computer vision: hybrid solutions are still often used
  • Power and cost: deep learning is compute intensive; use it only when classical computer vision doesn’t meet your needs
  • Processing video: use deep learning to detect and recognize but use computer vision to track from frame to frame
  • No need to use AI when math works: techniques such as simultaneous localization and mapping use math instead of deep learning
  • Flexibility: hard-wired deep learning accelerators can’t run all nets
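
To make the "processing video" point above concrete, here is a minimal Python sketch, not taken from any real product. It assumes a toy setup: synthetic frames with a single bright blob, a `detect` function that stands in for an expensive deep learning detector, and a `track` function that stands in for a cheap classical tracker searching only a small window around the last known position. All function names and parameters are hypothetical.

```python
# Hypothetical sketch: run a costly detector on keyframes only and a cheap
# local-search tracker on the frames in between.

def make_frame(width, height, x, y):
    """Synthetic frame: a single bright 2x2 blob with top-left corner (x, y)."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(2):
        for dx in range(2):
            frame[y + dy][x + dx] = 255
    return frame


def detect(frame):
    """Stand-in for a deep learning detector: scans the full frame."""
    height, width = len(frame), len(frame[0])
    for y in range(height - 1):
        for x in range(width - 1):
            if all(frame[y + dy][x + dx] == 255
                   for dy in range(2) for dx in range(2)):
                return (x, y)
    return None


def track(frame, prev, radius=2):
    """Stand-in for a classical tracker: searches near the last position only."""
    height, width = len(frame), len(frame[0])
    px, py = prev
    best, best_score = prev, -1
    for y in range(max(0, py - radius), min(height - 1, py + radius + 1)):
        for x in range(max(0, px - radius), min(width - 1, px + radius + 1)):
            # Score a candidate by the brightness of its 2x2 patch.
            score = sum(frame[y + dy][x + dx]
                        for dy in range(2) for dx in range(2))
            if score > best_score:
                best, best_score = (x, y), score
    return best


def follow(frames, detect_every=5):
    """Detect on every `detect_every`-th frame, track on the others."""
    positions, detector_calls = [], 0
    pos = None
    for i, frame in enumerate(frames):
        if pos is None or i % detect_every == 0:
            pos = detect(frame)
            detector_calls += 1
        else:
            pos = track(frame, pos)
        positions.append(pos)
    return positions, detector_calls


# The blob moves one pixel to the right per frame.
frames = [make_frame(16, 8, x, 3) for x in range(10)]
positions, detector_calls = follow(frames)
```

On these ten synthetic frames the blob is followed in every frame, while the full-frame detector runs only twice (on frames 0 and 5). In a real system the gap between the two costs would be far larger, which is the reason for combining both techniques.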

But what if you had a unified vision processor that runs deep learning tasks just as efficiently as a hard-wired deep learning accelerator? In another article we recently wrote, we explain that although this may seem impossible, it’s not. Using a multitude of careful software and hardware optimizations, a processor can be realized that is both software programmable and more efficient than a deep learning accelerator.

So first we add a vision processor, and then we replace the deep learning accelerator with a unified DL/vision processor subsystem. Now, the image signal processing block that processes the raw images captured by the image sensor is also getting more complex and needs to include more intelligence, which means another unified DL/vision processor needs to be added. Often, the images are also compressed to limit storage needs and transmission bandwidth. If your visual computing subsystem is flexible enough, which our unified v-MP6000UDX architecture is, it can handle this task too.

We can now see that a unified multicore vision processing architecture that handles all intelligence and visual computing tasks makes a lot of sense. Besides being able to handle all processing tasks, such a single unified multicore system has several other benefits. For example, there’s only one software development tool chain for programmers to learn, code reuse is maximized, and data movement between different accelerator blocks is minimized. The hardware design, integration, and verification efforts are also greatly reduced since there’s only a single block that needs to be integrated and replicated.

Such an architecture is exactly what we’ve built at videantis. Our unified, scalable, programmable v-MP6000UDX visual computing subsystem is easier to integrate, easier to use, and more efficient. Sometimes keeping things simple simply pays off.

31/10/2019 / Marco Jacobs