We recently published “Computer Vision in Surround View Applications,” together with Embedded Vision Alliance member company AMD. The article details how to “stitch” together multiple images capturing varying viewpoints of a scene, across a variety of applications.

Introduction

The ability to “stitch” together (offline or in real time) multiple images taken simultaneously by multiple cameras and/or sequentially by a single camera, in both cases capturing varying viewpoints of a scene, is becoming an increasingly appealing (if not necessary) capability in an expanding variety of applications. High-quality results are a critical requirement, and a particular challenge in price-sensitive consumer and similar applications, whose cost-driven compromises affect optics, image sensors, and other components. And quality and cost aren’t the only factors that bear consideration in a design; power consumption, size and weight, latency, and other performance metrics are also critical.

Seamlessly combining multiple images capturing varying perspectives of a scene, whether taken simultaneously from multiple cameras or sequentially from a single camera, is a feature that first gained prominence with the so-called “panorama” mode supported in image sensor-equipped smartphones and tablets. Newer smartphones offer supplemental camera accessories capable of capturing a 360-degree view of a scene in a single exposure. The feature has also spread to a diversity of applications: semi- and fully autonomous vehicles, drones, standalone consumer cameras, professional multi-camera capture rigs, etc. And it’s now being used to deliver not only “surround” still images but also high frame rate, high resolution and otherwise “rich” video. The ramping popularity of various AR (augmented reality) and VR (virtual reality) platforms for content playback has further accelerated consumer awareness and demand.

Early, rudimentary “stitching” techniques produced sub-par results, compelling developers to adopt more advanced computational photography and other computer vision algorithms. The computer vision functions showcased in the following sections implement seamless “stitching” of multiple images: aligning features between images, and balancing exposure, color, and other characteristics of each image. Dewarping to eliminate perspective and lens distortions is critical to a high-quality result, as is calibration to adjust for misalignment between cameras (and to correct for alignment shifts over time and use).
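As an illustrative sketch (not the article’s own implementation), OpenCV’s high-level Stitcher API bundles the steps just described: it detects and matches features across the input views, estimates the warps that align them, compensates exposure differences, and blends along seams. The file names here are hypothetical placeholders.

```python
import cv2

# Load overlapping views of the same scene (hypothetical file names).
images = [cv2.imread(name) for name in ("left.jpg", "center.jpg", "right.jpg")]

# The Stitcher pipeline detects and matches features, estimates per-image
# warps, compensates exposure differences, and blends along seams.
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
else:
    print(f"Stitching failed with status code {status}")
```

Dewarping can likewise be sketched with OpenCV’s fisheye module, assuming camera intrinsics K and distortion coefficients D obtained from a prior calibration; the values below are placeholders, not real calibration data.

```python
import cv2
import numpy as np

img = cv2.imread("fisheye_view.jpg")  # hypothetical wide-angle frame
h, w = img.shape[:2]

# Placeholder intrinsics and fisheye distortion coefficients; in practice
# these come from a calibration procedure (e.g., cv2.fisheye.calibrate).
K = np.array([[400.0,   0.0, w / 2.0],
              [  0.0, 400.0, h / 2.0],
              [  0.0,   0.0,     1.0]])
D = np.array([-0.05, 0.01, 0.0, 0.0])

# Build the undistortion remap tables once, then reuse them per frame.
map1, map2 = cv2.fisheye.initUndistortRectifyMap(
    K, D, np.eye(3), K, (w, h), cv2.CV_16SC2)
undistorted = cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)
cv2.imwrite("dewarped.jpg", undistorted)
```

Precomputing the remap tables and reusing them per frame is what makes this approach viable for real-time video pipelines, where per-frame undistortion math would otherwise dominate the budget.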

Continue reading the full article on the Embedded Vision Alliance website